Methods and systems for the automatic real-time self-repair of embedded computing devices using Competitive Runtime Reconfiguration (CRR), enabling said devices with a superior level of fault tolerance
Reliable embedded computing devices have become an essential part of modern living and play a crucial role in mission-critical endeavors in which human life or costly material assets are at risk. Embedded systems control very specific tasks within a variety of electronic devices, such as street lights, factory controllers and even the systems that control nuclear power plants. To recover from failure or malfunction, these embedded systems require a degree of built-in fault tolerance. In the past, fault tolerance was achieved through spare parts. However, because of the need for reduced size and weight, part redundancy is no longer the most viable option, and adaptive regeneration has replaced spare parts as the preferred implementation of fault tolerance. One such adaptive approach is Competitive Runtime Reconfiguration (CRR). During operation of an embedded device, several logic functions compete for selection based on their “fitness” for the task at hand, and the selection targets the most fault-free option to carry out the task.
Unfortunately, existing fault-tolerant approaches still have numerous limitations. For example, many can handle either permanent failures or transient faults, but not both. In addition, they are complex and costly to implement because the different phases of fault handling are treated separately. Another factor that contributes to the high cost of existing methods is that devices often need to be taken completely offline in order to be repaired.
UCF scientists have developed a novel approach to fault handling that deals with both transient and permanent faults without the use of bulky spare parts. Their approach uses CRR and integrates fault identification, diagnosis and recovery, making it possible to repair devices that are still partially online.
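The idea of logic functions competing for selection on the basis of fitness can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not the inventors' method: the text does not say how fitness is computed, so the sketch assumes that redundant configurations are compared pairwise on the same inputs, with agreement raising fitness and disagreement lowering it. The names `make_config`, `compete` and `select`, the XOR target function and the stuck-at fault model are all hypothetical.

```python
from itertools import combinations


def make_config(stuck_bit=None):
    """Build a hypothetical 2-input XOR configuration.

    If stuck_bit is 0 or 1, the output is stuck at that value,
    which simulates a permanent fault (an assumed fault model).
    """
    def config(a, b):
        return (a ^ b) if stuck_bit is None else stuck_bit
    return config


def compete(configs, inputs, reward=1, penalty=1):
    """Score every configuration by pairwise output agreement.

    Assumption: no fault-free reference is available, so a
    configuration is judged only against its competitors.
    """
    fitness = [0] * len(configs)
    for a, b in inputs:
        outputs = [cfg(a, b) for cfg in configs]
        for i, j in combinations(range(len(configs)), 2):
            delta = reward if outputs[i] == outputs[j] else -penalty
            fitness[i] += delta
            fitness[j] += delta
    return fitness


def select(fitness):
    """Return the index of the fittest configuration (first one on ties)."""
    return max(range(len(fitness)), key=fitness.__getitem__)


# Three healthy configurations and one with a stuck-at-0 output.
configs = [make_config(), make_config(stuck_bit=0), make_config(), make_config()]
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

fitness = compete(configs, inputs)
best = select(fitness)
print("fitness:", fitness)
print("selected configuration:", best)
```

In this toy run the faulty configuration disagrees with its peers on half of the inputs, so it ends up with the lowest fitness and is never selected. The sketch only covers detection and selection; repairing or regenerating the losing configuration, which the approach above also integrates, is left out.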
- Provides a definitive means to enhance reliability in embedded systems, at a marginal cost
- Automatic self-repair of logic devices is performed in situ and in real time while the device remains partially online
- A single approach is used to address all of the multiple phases of fault handling (detection, isolation, diagnosis and recovery)
- Both transient and permanent faults are handled by the same process
- The technology can be integrated into future nanodevices
- Embedded electronics
- Field-Programmable Gate Arrays (FPGAs)
- Field-Programmable Analog Arrays (FPAAs)
- Field-Programmable Transistor Arrays (FPTAs)
- Cellular base stations
- Avionics systems
- Process control
- Instrumentation equipment