High Performance Computing on Large-Scale Platforms Research

  • Autotuning the performance of the software I/O stack.
  • Design, implementation, and evaluation of a decentralized file cache hierarchy for large-scale architectures.
  • Multiple-level data staging for large-scale HPC architectures: write-back and prefetching.
  • Study decentralized I/O coordination strategies for improving resource utilization in large-scale HPC architectures.
  • Leverage data locality for optimal communication in collective I/O.
  • View-based non-contiguous I/O optimizations.
  • Inspector-executor collective I/O.

Publications

Parallel File Systems Research

  • Generic data layouts for parallel files based on multi-dimensional data distribution algorithms from parallelizing compilers.
  • File-system-level generalized file views.
  • File-system-level collective I/O (two-phase I/O and server-directed I/O).
  • File-system-level integration and study of cooperative caching and collective I/O.
  • Elastic distributed file system partitions.
  • Fault-tolerant models for parallel file systems.
  • Study decentralized parallel I/O scheduling in parallel file systems.
  • Novel MPI-IO implementation for Clusterfile and GPFS.

Publications

Machine Learning Research

  • Autotuning of parallel I/O access performance based on machine learning models for performance prediction.
  • Leverage machine learning prediction models in the design of large-scale distributed data management.
  • Time-series analysis: apply autoregressive and exponential smoothing models for on-line history-based workload prediction.
  • Social network analysis: employ structural and dynamic network analysis for understanding data locality, data sharing and data popularity patterns.
  • Dynamic clustering algorithms for improving content locality.

Publications

Cloud Computing Research

  • Predictive data grouping algorithms for optimizing content locality and server load balance.
  • Elastic data placement algorithms for optimizing server utilization.
  • Multi-model algorithms for on-line workload scalability.

Publications

Peer-to-Peer Systems Research

  • Leverage collaborative classifications and multiple clustering for improving content locality.
  • Node-level and cluster-level self-organization for adapting to locality dynamics.
  • Efficient parallel lookup algorithm for high data recall, high tolerance to node failure, and avoidance of redundant communication.

Publications