VECPAR'06 - Seventh International Meeting on High Performance Computing for Computational Science
Using Failure Injection Mechanisms to Experiment and Evaluate a Grid Failure Detector
Sébastien Monnet (Irisa / University of Rennes I)
Marin Bertier (Irisa / INSA of Rennes)
Computing grids are large-scale, highly distributed, and often hierarchical platforms. At such scales, failures are no longer exceptions but part of normal behavior. When designing software for grids, developers have to take failures into account. It is crucial to run experiments at a large scale, under various volatility conditions, in order to measure the impact of failures on the whole system. This paper presents an experimental tool that allows the user to inject failures during a practical evaluation of fault-tolerant systems. We illustrate the usefulness of our tool through an evaluation of a hierarchical grid failure detector.
User experience in deploying grids and testbeds, Grid and cluster management, Performance evaluation
Universidade Federal do Rio de Janeiro - Coordenação dos Programas de Pós-graduação de Engenharia; Instituto Nacional de Matemática Pura e Aplicada. Rio de Janeiro, Brazil, July 10-13, 2006