© 2001 by British Computer Society
Evaluation of Fault-Tolerant Multiprocessor Systems for High Assurance Applications
1 CNUCE/CNR, Via V. Alfieri 1, 56010 Ghezzano, Pisa, Italy Email: grandoni@iei.pi.cnr.it
2 IEI/CNR, Via V. Alfieri 1, 56010 Ghezzano, Pisa, Italy
3 Dip. Sistemi e Informatica, Università di Firenze, Via Lombroso 6/17, 50134 Firenze, Italy
In designing high assurance systems, dependability goals are achieved by adopting several fault-tolerance techniques. Unfortunately, their combined effect on the system cannot, in the general case, be derived by straightforward composition of the stand-alone analyses of the individual components, because of the mutual dependence of their controlling parameters. In this paper, the overall system dependability induced by such an integrated fault-tolerance organization is assessed through a stochastic simulation approach. To this purpose, a few fault-tolerant multiprocessor architectures, based on the integrated use of standard error-processing structures with a recently proposed diagnostic mechanism called $\alpha$-count, are selected and evaluated. The diagnostic mechanism gets its input (error signals) from the error-processing mechanism, whose behaviour is in turn influenced by how quickly and how correctly $\alpha$-count identifies permanently or intermittently faulty processors. The choice of the basic fault-tolerance mechanisms to adopt, as well as of the reference system architecture, has been driven by the characteristics of the envisaged target applications: mainly, stringent dependability requirements, to be traded off against adequate levels of performance and cost. The analysis focuses on performability, which is an appropriate measure for evaluating whether one design is better than another from both the dependability and the performance points of view.
Received 3 November, 2000. Revised 30 April, 2001.