system-level dependability issues

The need to enforce correct system behavior has grown considerably in recent years. Technological progress (e.g., technology scaling and noise margin reduction) is producing devices that are more susceptible to faults, in particular to soft errors: transient faults mainly caused by environmental phenomena such as radiation or atomic particle impact. Furthermore, fault incidence has increased considerably not only in harsh environments such as space or the high atmosphere, but also at ground level [Nor1996]. This situation, particularly hazardous in safety-critical systems, is also becoming serious for common applications, given the pervasiveness of embedded systems in everyday life (e.g., personal mobile devices, mobile phones, or household appliances) and in systems whose misbehavior may threaten people or the environment (e.g., automotive cruise control units or medical testing systems). As a result, dependability is now a main driver in embedded system design, on par with classical parameters such as performance and power.

In this scenario, the research activity proposes an enhancement of the system-level synthesis flow that introduces reliability awareness with respect to transient faults. The approach allows the designer to specify, for each portion of the application, the reliability requirement to be fulfilled (fault detection vs. fault tolerance). It then explores the solution space, taking into account not only the classical metrics (e.g., performance or power) but also the reliability-related ones, to implement a cost-effective, reliable system.
The methodology aims at selecting the appropriate technique (or set of techniques) to apply, possibly also exploiting fault management features provided by the target architecture. It relies on classical techniques, combined with an innovative strategy for applying them that reduces overheads.
We propose extending the system-level synthesis flow with a preliminary step that explores the adoption of hardening techniques: given the initial task graph and the user’s reliability requirements, it introduces redundancies and mapping constraints on the available resources. If the architecture’s resources offer fault detection/tolerance mechanisms, these are taken into account and exploited. The resulting reliability-aware task graph is then implemented by means of a classical mapping and scheduling approach to obtain the hardened implementation.
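
As a rough illustration of the hardening step, the following Python sketch transforms a task graph according to per-task requirements. It is not the framework's actual implementation: the names `Requirement`, `TaskGraph`, and `harden` are hypothetical, and the two rules it applies (duplication with comparison for detection, triplication with majority voting for tolerance) are assumed here as representative classical techniques. Exploiting native fault management features of the processing units is left out for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum


class Requirement(Enum):
    NONE = "none"
    DETECTION = "detection"
    TOLERANCE = "tolerance"


@dataclass
class TaskGraph:
    tasks: set = field(default_factory=set)
    edges: set = field(default_factory=set)       # (source, destination) pairs
    separate: list = field(default_factory=list)  # replica groups that must map to distinct units


def harden(graph, requirements):
    """Return a reliability-aware copy of `graph`.

    Assumed rules (illustrative only): DETECTION duplicates the task and adds
    a checker; TOLERANCE triplicates it and adds a majority voter.
    """
    copies = {Requirement.NONE: 1, Requirement.DETECTION: 2, Requirement.TOLERANCE: 3}
    out = TaskGraph()
    entry = {}  # original task -> nodes that receive its inputs
    exit_ = {}  # original task -> node that produces its output
    for task in sorted(graph.tasks):
        req = requirements.get(task, Requirement.NONE)
        if req is Requirement.NONE:
            out.tasks.add(task)
            entry[task], exit_[task] = [task], task
            continue
        replicas = [f"{task}#{i}" for i in range(copies[req])]
        merger = f"{task}#check" if req is Requirement.DETECTION else f"{task}#vote"
        out.tasks.update(replicas)
        out.tasks.add(merger)
        out.edges.update((replica, merger) for replica in replicas)
        out.separate.append(replicas)  # mapping constraint: distinct processing units
        entry[task], exit_[task] = replicas, merger
    # Reconnect the original dependencies through checkers and voters.
    for src, dst in graph.edges:
        out.edges.update((exit_[src], node) for node in entry[dst])
    return out


g = TaskGraph(tasks={"A", "B", "C"}, edges={("A", "B"), ("B", "C")})
h = harden(g, {"B": Requirement.DETECTION, "C": Requirement.TOLERANCE})
```

The resulting graph `h`, together with the constraints in `h.separate`, is what a classical mapping and scheduling stage would then take as input. Tasks with no requirement pass through unchanged, so an application without reliability requirements yields the original graph.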

The major contributions of the proposed methodology and companion framework are:

  • allowing the designer to express different fault management requirements for different parts of the application, to limit
    the overall implementation cost;
  • availability of a customizable and easily extensible set of hardware and software techniques for achieving the
    desired reliability;
  • application of the reliability-oriented techniques not only to single tasks but also to groups of them, to reduce overheads
    without affecting coverage;
  • exploitation of the fault management features, if available, of the processing units within the given platform;
  • transparency of the reliability-related stage with respect to the classical design flow, so that, should the designer not require any reliability feature, a traditional system-level synthesis process is performed.
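
To make the overhead argument for group-level hardening concrete, the sketch below counts the nodes needed to harden a linear chain of tasks with duplication with comparison. The model is an assumption made for illustration (one checker after every group of consecutive tasks, every task duplicated), and `dwc_node_count` is a hypothetical helper, not part of the framework.

```python
def dwc_node_count(chain_length, group_size):
    """Nodes needed to harden a linear chain with duplication with comparison,
    placing one checker after every `group_size` consecutive tasks (assumed model)."""
    if chain_length < 1 or group_size < 1:
        raise ValueError("chain_length and group_size must be positive")
    replicas = 2 * chain_length                # every task is duplicated
    checkers = -(-chain_length // group_size)  # ceiling division
    return replicas + checkers


per_task = dwc_node_count(6, 1)   # one checker per task: 12 + 6 = 18 nodes
per_group = dwc_node_count(6, 3)  # one checker per group of three: 12 + 2 = 14 nodes
```

Under this model every task is still executed twice and compared before the group's results are released, so grouping removes checker nodes rather than redundancy. The trade-off is that an error is flagged only at the end of the group.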

Recent publications:

  • [TC2013]
    C. Bolchini, A. Miele, “Reliability-driven System-level Synthesis for Mixed-Critical Embedded Systems,” in IEEE Trans. on Computers, Vol. 62, No. 12, Dec. 2013, doi: http://dx.doi.org/10.1109/TC.2012.226
  • [DFT2011]
    C. Bolchini, A. Miele, “An Application-Level Dependability Analysis Framework for Embedded Systems,” in Proc. IEEE Intl Symp. Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT, pp. 171-178, Oct 2011, doi: http://dx.doi.org/10.1109/DFT.2011.25
  • [GLVLSI2011]
    C. Bolchini, A. Miele, C. Pilato, “Combined architecture and hardening techniques exploration for reliable embedded system design,” in Proc. ACM Great Lakes Symposium on VLSI, pp. 301-306, May 2011, doi: http://doi.acm.org/10.1145/1973009.1973069

Dependable Embedded Systems — Design & Analysis methodologies and tools