
Update (MM/DD/YYYY): 12/05/2003

Grid Middleware "Gfarm" for Integrating Storage Distributed Worldwide to Be Released to the Public

- Awarded the “Distributed Infrastructure Prize” at the Bandwidth Challenge of the international conference SC2003 -

Key Points

  • “Gfarm”, an open-source Grid middleware that integrates the storage installed in clusters distributed around the world and enables massive data processing, is being released to the public on the web starting today.
  • “Gfarm” enables safe and simple resource sharing among multiple PC clusters using the Grid Security Infrastructure (GSI), and achieves high reliability and efficient data processing through replica management technology.
  • A fast shared file system with a capacity of 70 terabytes (TB) was built on a Grid environment that linked 6 organizations in Japan and the United States and comprised 236 PCs.
  • In recognition of its high performance and excellent reliability, the work was awarded the “Distributed Infrastructure Prize” at the Bandwidth Challenge of SC2003.


Synopsis

The Grid Technology Research Center (GTRC) of the National Institute of Advanced Industrial Science and Technology (AIST), an Independent Administrative Institution, has been working with other research organizations on the R&D of Grid Datafarm, a Grid computing technique for analyzing extremely large data sets through large-scale coordination of work across multiple organizations. The Grid middleware “Gfarm”, which implements the Grid Datafarm concept, was successfully demonstrated at the international conference SC2003.

“Gfarm version 1.0” was officially released to the public on the web, free of charge, on November 25, 2003.

“Gfarm” is open-source software that combines many networked storage systems distributed around the world so that they can be handled as if they were a single storage system. Because they form one aggregate file system, users can process enormous amounts of data without worrying about where the data are actually stored. Owing to the Grid Security Infrastructure, resources under different management systems can be shared safely with a single sign-on.
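The idea of a single file system built from many separate disks can be pictured as a catalog that maps each logical path name to the physical copies that hold its data. The sketch below is a minimal illustration under that assumption only; `MetadataCatalog`, `Replica`, the host names, and the spool paths are hypothetical and are not Gfarm's actual interfaces or on-disk layout.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One physical copy of a file: which host holds it and where on that host's disk."""
    host: str
    physical_path: str


@dataclass
class MetadataCatalog:
    """Hypothetical catalog mapping logical path names to physical replicas.

    Users only see the logical path; the catalog hides where the bytes live.
    """
    entries: dict[str, list[Replica]] = field(default_factory=dict)

    def register(self, logical_path: str, host: str, physical_path: str) -> None:
        # Record one more physical copy under the same logical name.
        self.entries.setdefault(logical_path, []).append(Replica(host, physical_path))

    def lookup(self, logical_path: str) -> list[Replica]:
        # Resolve a logical name to every known copy; unknown names raise KeyError.
        return list(self.entries[logical_path])

    def listdir(self, prefix: str) -> list[str]:
        # One location-independent directory view spanning all hosts.
        return sorted(p for p in self.entries if p.startswith(prefix))


if __name__ == "__main__":
    catalog = MetadataCatalog()
    # Hypothetical hosts and spool paths, for illustration only.
    catalog.register("/astro/night1.dat", "aist/n03", "/spool/7f3a")
    catalog.register("/astro/night1.dat", "titech/n11", "/spool/91c2")
    catalog.register("/qcd/config042.dat", "tsukuba/n05", "/spool/0b17")
    print(catalog.listdir("/"))
    print([r.host for r in catalog.lookup("/astro/night1.dat")])
```

The user-facing names stay the same no matter how many copies exist or which machines hold them; only the catalog knows the physical placement.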

With “Gfarm”, massive data can be processed using resources distributed around the world. Because processing takes place where the data are stored (locality of data access), data processing capability grows with the system: the larger the number of processors, the higher the performance. Replicas of the same data are placed at multiple sites, so users automatically use the copy nearest to them without being aware of it. Even if part of the system fails or network service is interrupted, high reliability is maintained by referring to any available replica.
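The replica behavior described above (use the nearest copy, fall back to another when a node or link fails) can be sketched as a nearest-first read with failover. This is an assumed illustration, not Gfarm's replica-selection algorithm: the `site/node` host naming, the three-level `distance` metric, and the `fetch` callback are all hypothetical.

```python
from typing import Callable, Sequence


def distance(client: str, host: str) -> int:
    """Hypothetical proximity metric over 'site/node' host names.

    0 = same node (local disk), 1 = same site (cluster LAN), 2 = another site (WAN).
    """
    if client == host:
        return 0
    if client.split("/")[0] == host.split("/")[0]:
        return 1
    return 2


def read_with_failover(
    client: str,
    replica_hosts: Sequence[str],
    fetch: Callable[[str], bytes],
) -> bytes:
    """Try replicas nearest-first and fall back to the next one if a read fails."""
    errors: list[str] = []
    # sorted() is stable, so hosts at equal distance keep their catalog order.
    for host in sorted(replica_hosts, key=lambda h: distance(client, h)):
        try:
            return fetch(host)
        except OSError as exc:  # node down, network interrupted, etc.
            errors.append(f"{host}: {exc}")
    raise OSError("no replica reachable: " + "; ".join(errors))


if __name__ == "__main__":
    down = {"aist/n01"}  # simulate one failed node

    def fake_fetch(host: str) -> bytes:
        if host in down:
            raise OSError("connection refused")
        return f"data from {host}".encode()

    replicas = ["indiana/n07", "aist/n01", "titech/n03"]
    # From aist/n02 the same-site copy is tried first; it fails,
    # so the read falls through to the next-nearest replica.
    print(read_with_failover("aist/n02", replicas, fake_fetch).decode())
```

In this sketch the caller never chooses a replica explicitly, which mirrors the point that users benefit from nearby copies and from failover without being aware of it.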

In the present experiment, conducted at the international conference SC2003, held from November 15 to 21, 2003 in Phoenix, Arizona, US, a fast, massive shared file system with a disk capacity of 70 TB (1 TB = 10^12 bytes; 70 TB is roughly 15,000 DVDs) was built with Gfarm on PC clusters totaling 236 PCs at 6 organizations in Japan and the US.
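The DVD comparison can be checked with a quick calculation. The 4.7 GB per disc figure is an assumption (a single-layer DVD, decimal gigabytes); the text does not state which disc capacity it used.

```python
TB = 10**12            # decimal terabyte, as defined in the text
DVD_BYTES = 4.7e9      # assumption: single-layer DVD capacity

capacity = 70 * TB
dvds = capacity / DVD_BYTES
print(f"{dvds:,.0f} DVDs")  # prints "14,894 DVDs", i.e. roughly 15,000
```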

A demonstration analyzing 1.8 TB of data on this fast, massive shared file system verified its high performance. In recognition of its excellent reliability and performance, the work was awarded the “Distributed Infrastructure Prize” at the Bandwidth Challenge of SC2003. This success was achieved on a Grid environment built through collaboration among 6 organizations in Japan and the US: AIST, the High Energy Accelerator Research Organization (KEK), Tokyo Institute of Technology (Titech), the University of Tsukuba (U.Tsukuba), Asia-Pacific Advanced Network, Tokyo XP (APAN Tokyo XP), and Indiana University (IndU), with network support from the Tsukuba Wide Area Network (Tsukuba WAN), the Asia-Pacific Advanced Network (APAN), the Super Science Information Network (SuperSINET), and the Ministry of Agriculture, Forestry and Fisheries Information Network (MAFFIN).


Fig. 1. Grid environment created in this experiment
(NII = National Institute of Informatics, Titech = Tokyo Institute of Technology, U.Tsukuba = University of Tsukuba, KEK = High Energy Accelerator Research Organization)


Details of Experiment and Grid Environment

The Gfarm demonstration system was built on a Grid environment created in collaboration among 6 organizations in Japan and the United States. In the experiment, massive scientific data, consisting of astronomical observation data and quantum chromodynamics simulation data, were stored on a PC cluster in the SC2003 conference hall and processed using multiple PC clusters. The PC cluster at each organization replicated the data, and the analysis was carried out in parallel on multiple PC clusters, each working from one of the replicas. In total, 1.8 TB of data was processed across the PC clusters.
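One way to picture parallel analysis over replicated data is to send each file's work to a site that already holds a copy, so no data needs to cross the wide area network during analysis. The sketch below assumes a simple least-loaded, locality-only assignment; it is not Gfarm's actual scheduler, and the file names, site names, and the trivial `analyze_site` stand-in are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_by_locality(replicas: dict[str, list[str]]) -> dict[str, list[str]]:
    """Give each file to the least-loaded site that already holds a replica.

    `replicas` maps a (hypothetical) file name to the sites storing a copy.
    Ties are broken by site name so the plan is deterministic.
    """
    load: dict[str, list[str]] = {}
    for name in sorted(replicas):
        site = min(replicas[name], key=lambda s: (len(load.get(s, [])), s))
        load.setdefault(site, []).append(name)
    return load


def analyze_site(site: str, files: list[str]) -> int:
    # Stand-in for the real analysis: here we only count the files processed.
    return len(files)


def run(replicas: dict[str, list[str]]) -> int:
    plan = assign_by_locality(replicas)
    # One task per site, all running concurrently; results are summed.
    with ThreadPoolExecutor(max_workers=max(1, len(plan))) as pool:
        return sum(pool.map(analyze_site, plan.keys(), plan.values()))


if __name__ == "__main__":
    replicas = {
        "astro-001": ["aist", "titech"],
        "astro-002": ["aist", "titech"],
        "qcd-001": ["tsukuba", "aist"],
        "qcd-002": ["tsukuba"],
    }
    print(assign_by_locality(replicas))
    print(run(replicas), "files analysed")
```

Keeping computation next to the data is what lets throughput grow with the number of sites in this model; a real scheduler would also weigh file sizes and node speed, which this sketch ignores.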

In the present Data Grid experiment, the following high-speed wide area networks were used: Tsukuba WAN and SuperSINET within Japan, APAN/TransPAC and SuperSINET between Japan and the US, and Abilene within the US. SCinet provided the connection within the conference hall.

Achieving high transfer performance for real applications requires not only an advanced network but also a high-speed file system that can feed data at the necessary rate. In Gfarm, data distributed over the network are accessed in parallel by the remote systems that hold the relevant data. In the present experimental environment, the aggregate file access rate reached 13 GB/s, equivalent to reading or writing a DVD in 0.36 s. This is what made the high network transfer performance possible.
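The 0.36 s figure follows from dividing one DVD's capacity by the aggregate access rate. The sketch below also estimates an average per-node rate and an idealized linear-scaling model; the 4.7 GB disc size, the assumption that all 236 PCs contributed equally, and the `aggregate_rate` helper are illustrative assumptions, not measurements from the experiment.

```python
DVD_BYTES = 4.7e9       # assumption: single-layer DVD capacity (decimal GB)
AGGREGATE_RATE = 13e9   # 13 GB/s aggregate file access rate reported in the text
NODES = 236             # assumption: every PC in the testbed contributed equally


def aggregate_rate(nodes: int, per_node: float) -> float:
    """Idealized model: each node reads its own local disk, so rates add up.

    Ignores metadata overhead and any shared bottleneck (hypothetical helper).
    """
    return nodes * per_node


seconds_per_dvd = DVD_BYTES / AGGREGATE_RATE
per_node_rate = AGGREGATE_RATE / NODES

print(f"one DVD in {seconds_per_dvd:.2f} s")              # one DVD in 0.36 s
print(f"about {per_node_rate / 1e6:.0f} MB/s per node")   # about 55 MB/s per node
```

Under this idealized model, doubling the number of nodes doubles `aggregate_rate`, which is the intuition behind the statement that performance rises with the number of processors when access stays local.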

In the present experiment, the work was divided among the participating institutions as follows: AIST developed the Grid Datafarm software and coordinated the environment for the demonstration; U.Tsukuba prepared the quantum chromodynamics simulation data; and the other partners took part in discussions on the demonstration, provided computer, network, and disk resources, and helped build the environment.
