Paper available
The paper entitled “Proposal of an Adaptive Fault Tolerance Mechanism to Tolerate Intermittent Faults in RAM”, written by J.-Carlos Baraza-Calvo, Joaquín Gracia-Morán, Luis-J. Saiz-Adalid, Daniel Gil-Tomás y Pedro-J. Gil-Vicente is available at Electronics journal.
Abstract: Due to transistor shrinking, intermittent faults are a major concern in current digital systems. This work presents an adaptive fault tolerance mechanism based on error correction codes (ECC), able to modify its behavior when the error conditions change without increasing the redundancy. As a case example, we have designed a mechanism that can detect intermittent faults and swap from an initial generic ECC to a specific ECC capable of tolerating one intermittent fault. We have inserted the mechanism in the memory system of a 32-bit RISC processor and validated it by using VHDL simulation-based fault injection. We have used two (39, 32) codes: a single error correction–double error detection (SEC–DED) and a code developed by our research group, called EPB3932, capable of correcting single errors and double and triple adjacent errors that include a bit previously tagged as error-prone. The results of injecting transient, intermittent, and combinations of intermittent and transient faults show that the proposed mechanism works properly. As an example, the percentage of failures and latent errors is 0% when injecting a triple adjacent fault after an intermittent stuck-at fault. We have synthesized the adaptive fault tolerance mechanism proposed in two types of Field Programmable Gate Arrays (FPGAs): non-reconfigurable and partially reconfigurable. In both cases, the overhead introduced is affordable in terms of hardware, time and power consumption.
Paper accepted at IEEE Transactions on Dependable and Secure Computing
The paper entitled “A Multi-criteria Analysis of Benchmark Results With Expert Support for Security Tools” written by Miquel Martínez, Juan-Carlos Ruiz, Nuno Antunes, David de Andrés and Marco Vieira has been accepted at IEEE Transactions on Dependable and Secure Computing journal.
Abstract. The benchmarking of security tools is endeavored to determine which tools are more suitable to detect system vulnerabilities or intrusions. The analysis process is usually oversimplified by employing just a single metric out of the large set of those available. Accordingly, the decision may be biased by not considering relevant information provided by neglected metrics. This paper proposes a novel approach to take into account several metrics, different scenarios, and the advice of multiple experts. The proposal relies on experts quantifying the relative importance of each pair of metrics towards the requirements of a given scenario. Their judgments are aggregated using group decision making techniques, and pondered according to the familiarity of experts with the metrics and scenario, to compute a set of weights accounting for the relative importance of each metric. Then, weight-based multi-criteria-decision-making techniques can be used to rank the benchmarked tools. The usefulness of this approach is showed by analyzing two different sets of vulnerability and intrusion detection tools from the perspective of multiple/single metrics and different scenarios.
Paper accepted at Electronics Journal
The paper entitled “Proposal of an Adaptive Fault Tolerance Mechanism to Tolerate Intermittent Faults in RAM”, written by J.-Carlos Baraza-Calvo, Joaquín Gracia-Morán, Luis-J. Saiz-Adalid, Daniel Gil-Tomás y Pedro-J. Gil-Vicente has been accepted at Electronics journal.
Abstract: Due to transistor shrinking, intermittent faults are a major concern in current digital systems. This work presents an adaptive fault tolerance mechanism based on error correction codes (ECC), able to modify its behavior when the error conditions change without increasing the redundancy. As a case example, we have designed a mechanism that can detect intermittent faults and swap from an initial generic ECC to a specific ECC capable of tolerating one intermittent fault. We have inserted the mechanism in the memory system of a 32-bit RISC processor and validated it by using VHDL simulation-based fault injection. We have used two (39, 32) codes: a single error correction–double error detection (SEC–DED) and a code developed by our research group, called EPB3932, capable of correcting single errors and double and triple adjacent errors that include a bit previously tagged as error-prone. The results of injecting transient, intermittent, and combinations of intermittent and transient faults show that the proposed mechanism works properly. As an example, the percentage of failures and latent errors is 0% when injecting a triple adjacent fault after an intermittent stuck-at fault. We have synthesized the adaptive fault tolerance mechanism proposed in two types of Field Programmable Gate Arrays (FPGAs): non-reconfigurable and partially reconfigurable. In both cases, the overhead introduced is affordable in terms of hardware, time and power consumption.
Paper accepted at Electronics journal
The paper entitled “Reducing the Overhead of BCH Codes: New Double Error Correction Codes”, authored by Luis-J. Saiz-Adalid, Joaquín Gracia-Morán, Daniel Gil-Tomás, J.-Carlos Baraza-Calvo and Pedro-J. Gil-Vicente has been published at Electronics journal.
Abstract
The Bose-Chaudhuri-Hocquenghem (BCH) codes are a well-known class of powerful error correction cyclic codes. BCH codes can correct multiple errors with minimal redundancy. Primitive BCH codes only exist for some word lengths, which do not frequently match those employed in digital systems. This paper focuses on double error correction (DEC) codes for word lengths that are in powers of two (8, 16, 32, and 64), which are commonly used in memories. We also focus on hardware implementations of the encoder and decoder circuits for very fast operations. This work proposes new low redundancy and reduced overhead (LRRO) DEC codes, with the same redundancy as the equivalent BCH DEC codes, but whose encoder, and decoder circuits present a lower overhead (in terms of propagation delay, silicon area usage and power consumption). We used a methodology to search parity check matrices, based on error patterns, in order to design the new codes. We implemented and synthesized them, and compared their results with those obtained for the BCH codes. Our implementation of the decoder circuits achieved reductions between 2.8% and 8.7% in the propagation delay, between 1.3% and 3.0% in the silicon area, and between 15.7% and 26.9% in the power consumption. Therefore, we propose LRRO codes as an alternative for protecting information against multiple errors.
Paper available at IEEE Access
The paper entitled “Ultrafast Codes for Multiple Adjacent Error Correction and Double Error Detection”, authored by Luis-J. Saiz-Adalid, Joaquín Gracia-Morán, Daniel Gil-Tomás, J.-Carlos Baraza-Calvo and Pedro-J. Gil-Vicente is available at IEEE Access.
Paper accepted at IEEE Access
The paper entitled “Ultrafast Codes for Multiple Adjacent Error Correction and Double Error Detection”, authored by Luis-J. Saiz-Adalid, Joaquín Gracia-Morán, Daniel Gil-Tomás, J.-Carlos Baraza-Calvo and Pedro-J. Gil-Vicente has been accepted at IEEE Access.
Abstract
Reliable computer systems employ error control codes (ECCs) to protect information from errors. For example, memories are frequently protected using single error correction-double error detection (SEC-DED) codes. ECCs are traditionally designed to minimize the number of redundant bits, as they are added to each word in the whole memory. Nevertheless, using an ECC introduces encoding and decoding latencies, silicon area usage and power consumption. In other computer units, these parameters should be optimized, and redundancy would be less important. For example, protecting registers against errors remains a major concern for deep sub-micron systems due to technology scaling. In this case, an important requirement for register protection is to keep encoding and decoding latencies as short as possible. Ultrafast error control codes achieve very low delays, independently of the word length, increasing the redundancy. This paper summarizes previous works on Ultrafast codes (SEC and SEC-DED), and proposes new codes combining double error detection and adjacent error correction. We have implemented, synthesized and compared different Ultrafast codes with other state-of-the-art fast codes. The results show the validity of the approach, achieving low latencies and a good balance with silicon area and power consumption.
Paper available at Electronics Journal
The paper entitled “Fault Modeling of Graphene Nanoribbon FET Logic Circuits”, written by D. Gil-Tomàs, J. Gracia-Morán, L.J. Saiz-Adalid and P.J. Gil-Vicente, and published by Electronics Journal, is now available here.
Paper accepted at Electronics Journal
The paper entitled “Fault Modeling of Graphene Nanoribbon FET Logic Circuits”, written by D. Gil-Tomàs, J. Gracia-Morán, L.J. Saiz-Adalid and P.J. Gil-Vicente has been accepted for publication at Electronics Journal.
Abstract:
Due to the increasing defect rates in highly scaled complementary metal–oxide–semiconductor (CMOS) devices, and the emergence of alternative nanotechnology devices, reliability challenges are of growing importance. Understanding and controlling the fault mechanisms associated with new materials and structures for both transistors and interconnection is a key issue in novel nanodevices. The graphene nanoribbon field-effect transistor (GNR FET) has revealed itself as a promising technology to design emerging research logic circuits, because of its outstanding potential speed and power properties. This work presents a study of fault causes, mechanisms, and models at the device level, as well as their impact on logic circuits based on GNR FETs. From a literature review of fault causes and mechanisms, fault propagation was analyzed, and fault models were derived for device and logic circuit levels. This study may be helpful for the prevention of faults in the design process of graphene nanodevices. In addition, it can help in the design and evaluation of defect- and fault-tolerant nanoarchitectures based on graphene circuits. Results are compared with other emerging devices, such as carbon nanotube (CNT) FET and nanowire (NW) FET.
Paper published at Computiong journal
The paper entitled “Simulating the effects of logic faults in implementation-level VITAL-compliant models” authored by Ilya Tuzov, David de Andrés and Juan Carlos Ruiz has been publised in Computing journal.
Cite as: I. Tuzov, D. de Andrés, J.-C. Ruiz, “Simulating the effects of logic faults in implementation-level VITAL-compliant models”, Computing, Vol. 101, No. 2, 2019, pp. 77–96, ISSN: 0010-485X, DOI: https://doi.org/10.1007/s00607-018-0651-4.
Abstract:
Simulation-based fault injection is a well-known technique to assess the dependability of hardware designs specified using hardware description languages (HDL). Although logic faults are usually introduced in models defined at the register transfer level (RTL), most accurate results can be obtained by considering implementation-level ones, which reflect the actual structure and timing of the circuit. These models consist of a list of interconnected technology-specific components (macrocells), provided by vendors and annotated with post-place-and-route delays. Macrocells described in the very high speed integrated circuit HDL (VHDL) should also comply with the VHDL initiative towards application specific integrated circuit libraries (VITAL) standard to be interoperable across standard simulators. However, the rigid architecture imposed by VITAL makes that fault injection procedures applied at RTL cannot be used straightforwardly. This work identifies a set of generic operations on VITAL-compliant macrocells that are later used to define how to accurately simulate the effects of common logic fault models. The generality of this proposal is supported by the definition of a platform-specific fault procedure based on these operations. Three embedded processors, implemented using the Xilinx’s toolchain and SIMPRIM library of macrocells, are considered as a case study, which exposes the gap existing between the robustness assessment at both RTL and implementation-level.
Paper accepted at Computing journal
The paper entitled “Simulating the effects of logic faults in implementation-level VITAL-compliant models”, written by Ilya Tuzov, David de Andrés and Juan-Carlos Ruiz has been accepted at Computing journal (Springer). This paper is an extension of the paper entitled “Accurately simulating the effects of faults in VHDL models described at the implementation-level”, awarded as best paper in the EDCC 2017.
Abstract:
Simulation-based fault injection (SBFI) is a well-known technique to assess the dependability of hardware designs specified using Hardware Description Languages (HDL). Although logic faults are usually introduced in models defined at the Register Transfer Level (RTL), most accurate results can be obtained by considering implementation-level ones, which reflect the actual structure and timing of the circuit. These models consist of a list of interconnected technology-specific components (macrocells), provided by vendors and annotated with post-place-and-route delays. Macrocells described in the Very High Speed Integrated Circuit HDL (VHDL) should also comply with the VHDL Initiative Towards Application Specific Integrated Circuit Libraries (VITAL) standard to be interoperable across standard simulators. However, the rigid architecture imposed by VITAL makes that fault injection procedures applied at RTL cannot be used straightforwardly. This work identifies a set of generic operations on VITAL-compliant macrocells that are later used to define how to accurately simulate the effects of common logic fault models. The generality of this proposal is supported by the definition of a platform-specific fault procedure based on these operations. Three embedded processors, implemented using the Xilinx’s toolchain and SIMPRIM library of macrocells, are considered as a case study, which exposes the gap existing between the robustness assessment at both RTL and implementation-level.