© 2004 by British Computer Society
Coding Techniques for Fault-Tolerant Parallel Prefix Computations in Abelian Groups
Department of Electrical and Computer Engineering, and Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, 148 CSL, 1308 West Main Street, Urbana, IL 61801-2307, USA
This paper presents coding techniques that add fault tolerance to a parallel prefix computation performed on a binary tree of processing nodes. More specifically, we discuss how a parallel prefix computation in an arbitrary Abelian group can be protected using group homomorphisms. The proposed approach is general enough to handle a variety of group operations of interest and allows for designs ranging from simple parity schemes to full replication. Error-detecting and error-correcting mechanisms are used solely at the leaf nodes and can capture faults at any node or link within the binary tree architecture on which the parallel prefix computation is performed. Furthermore, by tracking the propagation of errors in the binary tree, our method can identify a processing node that has permanently failed, using only information from simple error-detecting mechanisms at the leaf nodes.
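As a rough illustration of the homomorphism idea, and not the construction developed in the paper, the sketch below protects prefix sums in the Abelian group (Z, +) with a parity channel in (Z_7, +). It relies on the fact that the map phi(x) = x mod 7 is a group homomorphism, so phi applied to each prefix must equal the corresponding prefix of the images phi(x_i). The names `scan`, `phi`, `checked_prefix`, the modulus 7, and the fault model (an additive error injected after one round) are all assumptions made for this example. A recursive-doubling scan stands in for the binary tree of processing nodes, and the parity channel is assumed fault-free.

```python
P = 7  # modulus of the parity group Z_7 (an arbitrary choice for this sketch)


def scan(values, op, fault=None):
    """Inclusive prefix scan by recursive doubling.

    fault, if given, is a hypothetical (round, index, error) triple: after the
    given round, `error` is combined into the value held at `index`, which
    emulates a single faulty processing node.
    """
    out = list(values)
    step, rnd = 1, 0
    while step < len(out):
        prev = list(out)  # every position reads the previous round's values
        for i in range(step, len(out)):
            out[i] = op(prev[i - step], prev[i])
        if fault is not None and fault[0] == rnd:
            _, idx, err = fault
            out[idx] = op(out[idx], err)
        step *= 2
        rnd += 1
    return out


def phi(x):
    """Homomorphism from (Z, +) onto (Z_P, +)."""
    return x % P


def add(a, b):
    return a + b


def add_mod(a, b):
    return (a + b) % P


def checked_prefix(values, fault=None):
    """Run the main scan and the parity scan, then compare at the leaves only."""
    main = scan(values, add, fault)                  # possibly faulty
    check = scan([phi(v) for v in values], add_mod)  # assumed fault-free
    flags = [phi(m) != c for m, c in zip(main, check)]
    return main, flags


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6]
    print(checked_prefix(data))
    print(checked_prefix(data, fault=(0, 2, 1)))
```

With the fault `(0, 2, 1)`, the leaf checks raise flags at positions 2, 4, and 6, which are exactly the outputs that depend on the corrupted intermediate value. The pattern of flagged leaves is what makes it plausible to trace an error back to its source. In this toy scheme, an additive error that is a multiple of 7 goes undetected; a larger parity group, or full replication, narrows that gap at higher cost.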
Received 15 February 2003. Revised 26 November 2003.