Sistema de ficheros paralelo escalable para entornos cluster (Scalable parallel file system for cluster environments)

1 June, 2016

Abstract

Nowadays, applications used in high-performance computing (HPC) environments, such as scientific simulations and data-mining applications, manage large amounts of information and need huge computing and memory resources.

Cluster architectures are the most common solution for HPC applications. There are two kinds of cluster architecture: those built by aggregating heterogeneous components, and large supercomputers built from homogeneous components. The main problem of heterogeneous clusters is that they are built from different hardware and software technologies, and no parallel file system adapts to all of the diverse technologies available on these architectures. Large homogeneous clusters, in turn, have an I/O imbalance problem: they have a large number of compute nodes compared with a small number of I/O nodes. This imbalance turns the I/O system into a bottleneck for HPC applications.

The most common approach to removing heterogeneity in clusters is to adapt the nodes by integrating technology that makes them compatible with new systems. For large clusters, the traditional solutions are to use parallel file systems and to change the infrastructure of the storage system, for example by increasing the number of I/O nodes. In both cases, the solutions carry high economic and time costs for adapting and configuring the I/O infrastructure.

This thesis proposes solutions to the problems presented above. The goals are the following:
Providing uniform data access using standard I/O technologies, in order to build storage systems in heterogeneous environments.
Balancing the I/O load effectively and eliminating the overhead of storage systems in large-scale environments.

To achieve these objectives, we designed the following solutions:
A parallel file system platform based on standard technologies for building storage systems for heterogeneous clusters, which also gives applications a homogeneous data access platform.
An I/O architecture that extends the memory hierarchy scheme to large-cluster environments, increasing the number of I/O nodes in the cluster to improve parallelism and reduce I/O accesses to the storage system.

This document details the proposed solutions and presents their evaluation.

BibTeX
@phdthesis{63,
author={Luis Miguel Sánchez García},
title={Sistema de ficheros paralelo escalable para entornos cluster},
year={2009},
school={Universidad Carlos III de Madrid}
}