High Performance Data Access in Large-Scale Distributed Systems

1 June, 2016


Many scientific projects, such as those carried out in physics, chemistry, or pharmacology, require supercomputing resources. Most of them also generate vast amounts of data; for example, an experiment lasting only a few minutes in a particle accelerator can produce several terabytes of data.

In recent years, high-performance computing environments have evolved towards large-scale distributed systems such as Grids, Clouds, and volunteer computing environments. Managing large volumes of data in these environments poses an additional, major challenge, since the data must travel from one site to another over the Internet.

In this work, a novel generic I/O architecture for large-scale distributed systems used for high-performance and high-throughput computing will be proposed. This solution is based on applying parallel I/O techniques to remote data access. Novel replication and data search schemes will also be proposed; combined with the above techniques, these schemes will improve the performance of applications executing in such environments. In addition, it is proposed to develop simulation tools that allow testing these and other ideas without the need for real platforms, given their technical and logistic limitations. An initial prototype of this solution has been evaluated, and the results show a noteworthy improvement in data access performance compared to existing solutions.
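The core idea of applying parallel I/O techniques to remote data access can be illustrated with a small sketch: a file is striped round-robin across several data servers, and a client fetches all stripes concurrently and reassembles them. Everything here (the in-memory "servers", stripe size, and function names) is a hypothetical stand-in for illustration, not the architecture actually developed in this work.

```python
# Sketch of striped parallel remote reads (hypothetical setup for
# illustration only; real servers would be reached over the network).
from concurrent.futures import ThreadPoolExecutor

STRIPE_SIZE = 4  # bytes per stripe (tiny, for illustration)

# Stand-ins for remote data servers: stripe i of a file is stored on
# server i % num_servers (round-robin striping).
servers = [
    {0: b"para", 2: b" rem", 4: b"acce"},  # server 0 holds even stripes
    {1: b"llel", 3: b"ote ", 5: b"ss"},    # server 1 holds odd stripes
]

def read_stripe(index):
    """Fetch one stripe from the server that owns it."""
    return servers[index % len(servers)][index]

def parallel_read(num_stripes):
    """Fetch all stripes concurrently and reassemble the file."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        futures = [pool.submit(read_stripe, i) for i in range(num_stripes)]
        return b"".join(f.result() for f in futures)

data = parallel_read(6)
print(data)  # reassembled file contents
```

Because the stripes live on different servers, the fetches proceed in parallel, so the aggregate bandwidth grows with the number of servers; this is the property the proposed architecture exploits for remote data access.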


@phdthesis{bergua2016,
  author = {Borja Bergua Guerra},
  title  = {High Performance Data Access in Large-Scale Distributed Systems},
  school = {Universidad Carlos III de Madrid},
  year   = {2016}
}