NRI Designs Third-Fastest Supercomputer

Sunday, November 9 2003

Houston: Virginia Tech scientists led by Dr Srinidhi Varadarajan, a 30-year-old Indian American assistant professor of computer science, have amazed the computing industry by putting together the world's third-fastest supercomputer in a record three months and at a record-low cost of $5.2 million, using off-the-shelf components.

Most other machines of its class cost upward of $40 million and take years to assemble. Japan's Earth Simulator, the number one supercomputer, is said to have cost at least $350 million.

(Photo: Dr Srinidhi Varadarajan)

The Terascale Cluster project is bringing Virginia Tech to the forefront of the supercomputing arena. The supercomputer, made from 1,100 dual-processor Power Mac G5s and nicknamed "Big Mac" by some, ranks third among the world's 500 fastest supercomputers, many of which handle with ease one trillion calculations per second.

The Top 500 is a twice-yearly listing, started in 1993, that provides a "Who's Who" of hot computers, spotting and tracking trends in high-performance computing. The Top 500 project will officially announce the ranking later this month at the Supercomputing Conference in Phoenix.

According to Dr Varadarajan, "this is arguably the cheapest supercomputer and is definitely the most powerful home-built supercomputer." Theoretically, Big Mac could handle a potential 17 teraflops, or 17 trillion operations per second. That still falls short of the No. 1 machine, Japan's Earth Simulator, whose 5,000-plus processors keep it on top with 35.8 teraflops, with the potential of another five teraflops.

PTI


NRI scientists at the Virginia Polytechnic Institute have created a groundbreaking supercomputer cluster, one of the fastest machines in the world, in a "virtual flash".

You can build a very high performance machine for a fifth to a tenth of what supercomputers now cost. Virginia Tech's idea was to develop a supercomputer of national prominence based upon a homegrown cluster.

This low-cost computer is shaking up the esoteric world of high performance computing, where the fastest machines have traditionally cost from $100 million to $250 million and taken several years to build.

The Apple-based supercomputer, which is powered by 2,200 IBM microprocessors, was able to compute at 7.41 trillion operations a second, a speed surpassed by only three other ultra-fast computers.

The fastest computers on the current Top 500 list are the Japanese Earth Simulator; a Los Alamos National Laboratory machine dedicated to weapons design; and another weapons-oriented cluster of Intel Pentium 4 microprocessors at the Lawrence Livermore National Laboratory.

The Japanese computer was measured at 35.8 trillion operations a second last year, but American computer experts estimate that it cost as much as $250 million. By contrast, the fastest cluster machine, the Lawrence Livermore system consisting of 2,304 Intel Xeon processors, is capable of 7.63 trillion operations a second, at a price estimated at $10 million to $15 million.
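The price and speed figures quoted above can be put on a common footing as cost per trillion operations a second. The following is only a rough sketch using the numbers as reported in these articles. The function and variable names are illustrative, and the "as much as $250 million" estimate for the Japanese machine is treated as a single point value, which is an assumption.

```python
# Figures as quoted in the articles above; costs are in millions of US dollars.
# Assumption: "as much as $250 million" is treated as a single point estimate.
SYSTEMS = {
    "Virginia Tech cluster": (7.41, (5.2, 5.2)),
    "Lawrence Livermore cluster": (7.63, (10.0, 15.0)),
    "Earth Simulator": (35.8, (250.0, 250.0)),
}


def cost_per_teraflop(tflops, cost_range):
    """Return (low, high) cost in $ millions per trillion operations a second."""
    low, high = cost_range
    return low / tflops, high / tflops


def report(systems):
    """Build one summary line per system from the quoted figures."""
    lines = []
    for name, (tflops, cost_range) in systems.items():
        low, high = cost_per_teraflop(tflops, cost_range)
        if low == high:
            lines.append(f"{name}: about ${low:.2f} million per teraflop")
        else:
            lines.append(
                f"{name}: about ${low:.2f} to ${high:.2f} million per teraflop"
            )
    return lines


if __name__ == "__main__":
    for line in report(SYSTEMS):
        print(line)
```

On these quoted estimates, the Virginia Tech machine works out to roughly $0.7 million per trillion operations a second, against roughly $7 million for the Earth Simulator. That is in line with the "fifth to a tenth of the cost" claim.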


Varadarajan Leads VT Project to Create Groundbreaking Supercomputer Cluster

Virginia Tech, teaming with Apple Computer, Cisco, Liebert, and Mellanox Technologies, is creating a world-class supercomputing cluster. We are ambitiously designing a large 64-bit InfiniBand cluster using existing, off-the-shelf industry components. The supercomputer will belong to the university, significantly enhancing our research capabilities.

Srinidhi Varadarajan, assistant professor of computer science, has been the primary architect of the facility. CS department head Dennis Kafura and associate head Cal Ribbens also have been heavily involved with the development of this project.

“Virginia Tech’s idea was to develop a supercomputer of national prominence based upon a homegrown cluster,” says Hassan Aref, dean of the College of Engineering and a former chief scientist at the San Diego Supercomputer Center.

Aref and Earving Blythe, vice president for information technology, have provided primary leadership and support for the effort.

“This terascale computer will be one of the most powerful supercomputers in the world. It will give CS students and faculty an incredible opportunity to carry out research at the cutting edge of high performance computing systems and applications,” says Ribbens.

The new facility will be located at the university’s Computing Center. Plans call for a future installation to be housed in a building dedicated to the Institute for Critical Technology and Applied Science (ICTAS) at Virginia Tech. ICTAS is a new venture of the university that allows organized research units to cluster together on synergistic research.

The Virginia Tech team of engineers, computer scientists, and officials selected Apple’s new Power Mac, the G5, as the framework for the cluster. For months, the university worked with Apple to purchase and adapt the new machines, the world’s fastest personal computers, as they rolled off the manufacturing line in August.

As they waited for the machines, the team identified Mellanox, the leading provider of the InfiniBand semiconductor technology, to supply the primary communications fabric, drivers, cards, and switches for the project. The university asked Cisco Systems to join the enterprising effort. Cisco’s Gigabit Ethernet switches were the choice for the secondary communications fabric to interconnect the cluster. Cisco provided a significant educational discount to support the project.

The supercomputer needed a cooling system, and its designers worked with Liebert, a division of Emerson Network Power, known for its comprehensive range of protection systems for sensitive electronics. Given the heat load of the system, normal air conditioning units were insufficient. Liebert was able to provide its new high-density, rack-mounted cooling system within the budget and time constraints of the project. The company also custom-designed the computer racks and power distribution equipment.

Weekly conference calls between the various players were organized in order to build the supercomputer at a record pace. Geographically, the operation was international in scope, with experts as far away as Israel and Japan taking part in the project.

This collaborative effort represents a “groundbreaking project,” Aref says. The people working on this project “pulled off miracles, raising glass ceilings and opening locked doors.”

Varadarajan and Jason Lockhart, director of the College of Engineering’s High Performance Computing and Technology Innovation, initiated the venture at Virginia Tech. Varadarajan is an expert in reliability, a key issue in successfully exploiting terascale computing.

Component failures are endemic to any large-scale computational resource. While previous generations of supercomputers engineered reliability into the system hardware, today’s high performance computing environments are based on inexpensive clusters of commodity components, with no systemic solution for the reliability of the total machine.

Virginia Tech has the first comprehensive solution to the problem of transparent fault tolerance, which enables large-scale supercomputers to mask hardware, operating system and software failures – a decades-old problem. The solution is a software program called Déjà vu, designed by Varadarajan, who also integrated the software with Apple’s G5s. This work will enable the terascale computing facility to operate as the first reliable supercomputing facility, according to Varadarajan, a National Science Foundation Faculty Early Career Development Program (CAREER) Award recipient.

Virginia Tech researchers are already active in a number of areas that will benefit from the new supercomputing facilities, says Kevin Shinpaugh, director of Research and Cluster Computing for the university. These include: nanoscale electronics, quantum chemistry, computational chemistry, aerodynamics through multidisciplinary design optimization, molecular statics, computational acoustics, and the molecular modeling of proteins.

Terascale computing is motivated by the needs of problems too large to be solved by any individual computer. The majority of these problems arise in the context of computational science. Until recently, progress in science and engineering has relied on a combination of theory and experiment. In recent decades, however, a third paradigm has emerged, namely computational science. The idea of computational science is to use computers to simulate the behavior of natural or human-engineered systems, rather than to observe the system or build a physical model of it.

“Virginia Tech will have one of the top-ranked supercomputing facilities in the world, supporting significant ‘big science’ research. It is anticipated that Virginia Tech will realize at least a five-to-one return on this investment in terms of annual research grant and contract activity,” says Glenda Scales, assistant dean of computing and distance learning at Virginia Tech.

To help keep the ambitious job on schedule, “we used an assembly line of volunteer students to unpack computers and perform many of the routine but time-consuming functions.” Patricia Arvin, associate vice president of information systems and computing, also credits the many disparate parts of the university, from electrical services to purchasing to facilities planners, for the success of this project. Dozens of computer science majors have been part of the assembly team.

"Mellanox embraces Virginia Tech's decision to deploy one of the top supercomputers in the world based completely on off-the-shelf industry standard components," said Eyal Waldman, CEO of Mellanox Technologies. "As evidenced by Virginia Tech’s cluster, the combination of industry standard servers, Linux and InfiniBand creates a new standard in clustering and is changing the way compute power is deployed."