Posted: January 9, 2008

United we compute: FermiGrid continues to yield results

(Nanowerk News) Before FermiGrid, the computing resources at the high-energy physics laboratory Fermilab, in Illinois, U.S., were individually packaged for the dedicated use of particular experiments.
By late 2004, all this began to change. The birth of FermiGrid, an initiative aiming to unite all of Fermilab’s computing resources into a single grid infrastructure, changed the way that computing was done at the lab, improving efficiency and making better use of these resources along the way.
Faster physics from FermiGrid
FermiGrid was initially deployed for small Fermilab experiments and the Compact Muon Solenoid experiment; other large experiments—D-Zero and the Collider Detector at Fermilab—soon followed suit.
The ability to access FermiGrid resources has already paid off. As one example, researchers working on the D-Zero experiment recently decided to reprocess vast amounts of data in response to a detector upgrade. Thanks to FermiGrid, they were able to process about 900 million events between February and May of 2007, around 233 million of which used FermiGrid resources nominally allocated to CMS.
FermiGrid also plays a major role in the Collider Detector at Fermilab, which primarily uses grid resources to generate Monte Carlo data.
The Feynman Computing Center
The Feynman Computing Center, foreground, houses the Fermilab WLCG Tier-1 center as well as computers contributing to Open Science Grid and FermiGrid. (Image: Reidar Hahn)
Community, laboratory, community
FermiGrid’s contributions continue to grow, reaching beyond physics and Fermilab to play a part in projects including the Laser Interferometer Gravitational Wave Observatory and the nanohub computational nanotechnology group.
“FermiGrid operates as a universal donor of opportunistic resources to Open Science Grid,” says Keith Chadwick, associate head of FermiGrid facilities. “Since January 2007, we have given away roughly ten thousand CPU-hours per month to OSG’s virtual organizations.”
Eleven virtual organizations are hosted at Fermilab and take advantage of FermiGrid resources. FermiGrid also contributes to the Worldwide LHC Computing Grid, which will be used for the upcoming operations of the Large Hadron Collider.
Of course, providing such a facility does not come without challenges. The FermiGrid team must react to changing demands without advance warning and must keep the core services up and running with minimal outages. The effort averages two-and-a-half full-time employees per month, and it is increasing.
So what’s next for FermiGrid?
Although some system redundancy is built in, the FermiGrid team is now working to configure the core grid services so that they can be hosted on two independent platforms, either of which could carry the load if the other went down.
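The redundancy goal described above — two independent platforms, either of which can carry the load alone — can be sketched in a few lines. This is a hypothetical illustration, not FermiGrid's actual implementation; the host names `grid-a` and `grid-b` and the `dispatch` helper are invented for the example.

```python
# Hypothetical sketch of redundant core services: a client tries each
# independent host in turn, so requests succeed as long as one host is up.

class ServiceHost:
    def __init__(self, name):
        self.name = name
        self.up = True  # toggled to False to simulate an outage

    def handle(self, request):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"

def dispatch(request, hosts):
    """Try each host in order; fail only if every host is down."""
    for host in hosts:
        try:
            return host.handle(request)
        except ConnectionError:
            continue  # fall through to the next independent platform
    raise RuntimeError("all service hosts are down")

hosts = [ServiceHost("grid-a"), ServiceHost("grid-b")]
print(dispatch("auth-query", hosts))  # served by grid-a
hosts[0].up = False                   # simulate grid-a going down
print(dispatch("auth-query", hosts))  # transparently served by grid-b
```

The key property is that failover needs no coordination: because each platform can serve the full load, the client simply moves on to the surviving host.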
“FermiGrid is well-positioned to continue to provide a high performance, robust and highly available grid facility to the Fermilab scientific community and the broader OSG consortium,” says Eileen Berman, head of FermiGrid facilities.
Six production analysis clusters submit and receive jobs through the FermiGrid gateway. As of September 2007, 2081 worker nodes (~9000 job slots) are available, and an additional 400 worker nodes (~1800 job slots) will soon be deployed.
Source: ISGTW (Marcia Teckenbrock)