© 1995 by British Computer Society
Analysis of probabilistic error checking procedures on storage systems

1 Institute of Information Engineering, National Cheng Kung University, University Road, Tainan, Taiwan. Email: irchen{at}iie.ncku.edu.tw
2 Department of Computer Science, Michigan State University, East Lansing, MI 48824-1027, USA
Conventionally, error checking on storage systems is performed on-the-fly (with probability 1) each time the storage system is accessed, in order to improve the system's reliability. However, such a procedure may needlessly degrade performance because of the extra processing time needed to execute the error checking code. In this paper, we consider fault-tolerant storage systems designed to provide continuous service to customers over a mission period, where the design goal is either (1) to maximize the cumulative number of requests that the storage system can service without failure over the mission period, or (2) to service at least a given number of requests without failure over the mission period with system reliability maximized. We develop a Markov reward model to identify the design conditions under which probabilistic error checking procedures satisfy this design goal better than conventional on-the-fly error checking procedures. The result helps determine the best time interval between successive executions of the error checking procedure to meet the design goal, and is useful for designing adaptive systems that can temporarily trade off reliability for performance to meet design goal (2) in response to dynamic workload changes in the environment.
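The trade-off described in the abstract can be illustrated with a deliberately simplified toy calculation. It is not the Markov reward model developed in the paper; it is only a sketch under assumptions introduced here for illustration: every request takes a fixed service time, a check (performed with probability p) adds a fixed checking time, a fault strikes each request independently with probability q, a checked request always recovers from its fault, and an unchecked fault ends the mission. All function and parameter names are hypothetical.

```python
def per_request_survival(p, q):
    """Probability that one request does not cause system failure.

    Assumption: a fault (probability q) is fatal only if the request
    was not checked (probability 1 - p).
    """
    return 1.0 - q * (1.0 - p)


def requests_in_mission(p, mission_time, service_time, check_time):
    """Number of requests that fit in the mission period, using the
    expected time per request (service time plus p times check time)."""
    return int(mission_time / (service_time + p * check_time))


def expected_requests(p, q, mission_time, service_time, check_time):
    """Toy version of goal (1): expected number of requests serviced
    before the first failure, capped by the mission period."""
    n = requests_in_mission(p, mission_time, service_time, check_time)
    r = per_request_survival(p, q)
    if r == 1.0:
        return float(n)  # no failure is possible, all n requests count
    # Sum of r**k for k = 1..n (geometric series).
    return r * (1.0 - r ** n) / (1.0 - r)


def reliability(m, p, q, mission_time, service_time, check_time):
    """Toy version of goal (2): probability of servicing at least m
    requests without failure within the mission period."""
    n = requests_in_mission(p, mission_time, service_time, check_time)
    if m > n:
        return 0.0  # m requests do not fit in the mission period
    return per_request_survival(p, q) ** m


def best_check_probability(q, mission_time, service_time, check_time):
    """Grid search over p in {0.0, 0.1, ..., 1.0} for goal (1)."""
    grid = [i / 10 for i in range(11)]
    return max(
        grid,
        key=lambda p: expected_requests(
            p, q, mission_time, service_time, check_time
        ),
    )
```

In this toy setting, with a mission time of 1000, a service time of 1 and a check time of 1, always checking (p = 1) is preferable when faults are frequent (for example q = 0.01), while skipping checks yields more serviced requests when faults are rare (for example q = 0.00001). This mirrors, in a much cruder form, the kind of design condition the paper sets out to identify.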