Técnicas de tolerancia a fallos en sistemas de ficheros paralelos para clusters

1 June, 2016

Abstract

This work introduces a fault-tolerance model for the files of a parallel file system.
The main contributions of this PhD Thesis are the following:
A fault-tolerance model for parallel file systems that allows different fault-tolerance mechanisms to be employed at the file level.
A model based on distribution patterns that offers a flexible and simple way to describe fault tolerance.
An analysis of the main properties of the distribution schemes resulting from the associated distribution patterns, as defined in the proposed fault-tolerance model.
The algorithms needed to dynamically add, remove, or modify the file-based fault-tolerance model.
The introduction of distribution schemes based on external redundancy, which allow fault-tolerance support to be added to and removed from a file dynamically.
A POSIX extension to define, add, modify, and remove the distribution schemes for files. The same functionality is also provided for MPI-IO through hints.
The proposed model has been evaluated through an implementation in the Expand parallel file system.
This evaluation shows that the overhead introduced by fault-tolerant files is low, and that the model offers parallel file system users a simple and practical solution.

Project
https://arcos.inf.uc3m.es/~xpn

BibTeX
@phdthesis{54,
author={Alejandro Calderón Mateos},
title={Técnicas de tolerancia a fallos en sistemas de ficheros paralelos para clusters},
year={2005},
school={Universidad Carlos III de Madrid}
}