The Computer Journal Advance Access published online on July 22, 2009
The Computer Journal, doi:10.1093/comjnl/bxp067
Multi-criteria Scheduling of Precedence Task Graphs on Heterogeneous Platforms
ENS Lyon, Université de Lyon, LIP Laboratory UMR 5668, ENS Lyon, CNRS, INRIA, UCBL, Lyon, France
* Corresponding author: Anne.Benoit{at}ens-lyon.fr
Received 13 January 2009; revised 20 April 2009
Latency, fault tolerance and reliability are important requirements for several applications that are time critical in nature: such applications require guarantees in terms of latency, even when processors are subject to failures. In this paper, we propose a fault-tolerant scheduling heuristic for mapping precedence task graphs on heterogeneous systems. Our approach is based on an active replication scheme, capable of supporting
arbitrary fail-silent/fail-stop processor failures, and hence valid results will be provided even if
processors fail. First we focus on a bi-criteria approach, where we aim at minimizing the latency given a fixed number of failures supported in the system, or the other way round. Next we derive a more complex algorithm in which we not only minimize latency and support a fixed number of failures, but also improve the overall reliability. Major achievements include low complexity of the new algorithms, and a drastic reduction of the number of additional communications induced by the replication mechanism. Experimental results demonstrate that our heuristics, despite their lower complexity, outperform their direct competitor, the fault-tolerance based active replication scheduling algorithm FTBAR.
Key Words: scheduling fault tolerance heterogeneous platforms task graphs