HIDDRA : Highly Independent Data Distribution and Retrieval Architecture

Data distribution is a common problem in observation missions. Data gathered by the satellite has to be distributed among different operational centers under different policies: delivering it as fast as possible, creating redundant copies, notifying interested scientists about newly available products, etc. All these policies have to be fulfilled while taking into account features such as:

  • Performance: Transmitting data does not require high computational resources. Small machines with simple architectures (and therefore cheap machines) must be able to send information across the Internet using a distribution system.

  • Scalability: A data distribution system must deal with hundreds of clients. Adding a new center to distribute content must not be a problem in terms of management or scalability. The system must be able to grow regardless of the number of users.

  • Availability: A distribution system cannot stop working; availability is essential. Information must be available at any moment, regardless of where it is stored or how it is retrieved.

  • Complexity of management/installation: Installing a high-performance distribution system must be easy. Other distribution/retrieval systems such as GridFTP need other systems (e.g. Globus Toolkit) to be installed and working. A good system must be simple enough to be installed and working in a matter of minutes.

  • User satisfaction: In observation missions the final recipient of the information is a member of the scientific community. This person wants to work easily with the maximum amount of data. For this reason, it is important to provide automatic retrieval technologies based on subscriptions. With this technology, users only need to communicate their interests to automatically receive useful information. Other systems require the user to manually download files from FTP servers, which becomes a problem when the amount of information is very large.
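The subscription idea in the last item can be illustrated with a minimal sketch. It assumes that interests are expressed as glob-style patterns over product names; the `Subscription` class, the `recipients` function, and the product names are hypothetical and chosen only for illustration, not taken from HIDDRA itself.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Subscription:
    """Hypothetical record of one user's declared interests."""
    user: str
    patterns: list  # glob-style patterns over product names (assumed format)


def recipients(product_name, subscriptions):
    """Return the users whose interests match a newly published product."""
    return [
        s.user
        for s in subscriptions
        if any(fnmatchcase(product_name, p) for p in s.patterns)
    ]


# Illustrative subscriptions; the naming scheme is invented for this example.
subs = [
    Subscription("alice", ["L1B_*.nc"]),
    Subscription("bob", ["L2_SST_*", "L1B_*.nc"]),
    Subscription("carol", ["L2_OC_*"]),
]

notified = recipients("L1B_20240101.nc", subs)
```

When a new product appears, the distribution system would evaluate it against the stored interests and push it (or a notification) to the matching users, so nobody has to poll a server manually.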

The aforementioned features are desirable in any distribution system, but observation missions must also deal with other specific problems:

  • Heterogeneity: When different centers are involved in the distribution, flexibility is a desired feature. Every center can be managed using different policies. For example, one observation center uses FTP to distribute data because its infrastructure was designed around FTP during a past mission, but three other centers use HTTP and another one uses HTTPS. A distribution system has to be flexible enough to deal transparently with heterogeneous protocols. Heterogeneity can also be found at the data level; therefore, a generic system not tied to a specific set of protocols or kinds of information is desirable.

  • Redundancy: Data is fundamental. Redundancy increases availability and, in addition, makes information reliable. What happens if we try to download information from a server that is momentarily not working? In a redundant system, other servers would be used transparently, and the failed server would be reported.

  • Communities of users: Small scientific communities are the final recipients of the information. These users will probably be interested in the same files and will want the information to be available as soon as it appears. This means that sets of files have to be downloadable by several users at the same time without problems.
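The heterogeneity and redundancy requirements above can be sketched together: a client tries each replica of a product in turn, picks a transfer routine by URL scheme, and reports the servers that failed. This is only an illustrative sketch under stated assumptions. The function `fetch_with_failover`, the stub fetchers, and the `example` host names are hypothetical; in a real deployment each fetcher would wrap an actual FTP, HTTP, or HTTPS client.

```python
from urllib.parse import urlparse


def fetch_with_failover(urls, fetchers):
    """Try each replica in order; return (data, failed) on first success.

    `fetchers` maps a URL scheme to a callable taking a URL and returning
    bytes. `failed` lists (url, reason) pairs for replicas that did not work.
    """
    failed = []
    for url in urls:
        scheme = urlparse(url).scheme
        fetch = fetchers.get(scheme)
        if fetch is None:
            failed.append((url, f"no fetcher for scheme {scheme!r}"))
            continue
        try:
            return fetch(url), failed
        except OSError as exc:  # covers connection errors and timeouts
            failed.append((url, str(exc)))
    raise RuntimeError(f"all replicas failed: {failed}")


# Stub fetchers so the sketch runs without network access (assumptions).
def broken_ftp(url):
    raise ConnectionError("server down")


def fake_http(url):
    return b"product bytes"


fetchers = {"ftp": broken_ftp, "http": fake_http, "https": fake_http}
replicas = [
    "ftp://center-a.example/product.nc",
    "http://center-b.example/product.nc",
]

data, failed = fetch_with_failover(replicas, fetchers)
```

Because the caller only sees the returned data plus a list of failed replicas, the switch to another server stays transparent while the fallen server is still reported. Supporting a new protocol means registering one more entry in `fetchers`, without touching the failover logic.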