Friday, November 10, 2023

The first (and fastest) exascale supercomputer


© Mark Ollig


The Department of Energy’s Oak Ridge National Lab (ORNL) in Oak Ridge, TN, operates the Hewlett Packard Enterprise (HPE) Frontier exascale supercomputer or OLCF-5.

The OLCF-5 (Oak Ridge Leadership Computing Facility) supercomputer is the fifth in a series developed by OLCF; thus, the five refers to its generation number.

The $600 million supercomputer was developed for ORNL, manufactured by HPE, and built with the collaboration of ORNL, Cray Inc. (a subsidiary of HPE), and Advanced Micro Devices, Inc., aka AMD.

The OLCF-5 is located at the Oak Ridge Leadership Computing Facility in Oak Ridge, TN, and is sponsored by the United States Department of Energy.

In May 2022, the OLCF-5 supercomputer achieved a groundbreaking feat by becoming the fastest computer on the planet.

It reached 1.102 exaflops, or 1.102 quintillion floating-point operations per second, crossing over into the exascale processing range.

This record is the system's Rmax, the maximum performance achieved on the High-Performance Linpack (HPL) benchmark, the standard test for ranking supercomputers.

Exascale computing is a type of supercomputing that can perform at least one exaflops, meaning 10^18 or one quintillion floating-point calculations per second.

The number one quintillion has 18 zeros and is also known as one million trillion.

A stack of one quintillion pennies would weigh about 5.5 quadrillion pounds and be approximately 944 billion miles high, assuming each penny weighs 2.5 grams and is 1.52 millimeters thick.
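
The penny arithmetic can be checked in a few lines. This is a rough sketch, not a definitive calculation: the mass (2.5 grams) and thickness (1.52 millimeters) are assumed values for a modern U.S. cent, and the function name penny_stack is made up for illustration.

```python
# Assumed dimensions of a modern U.S. cent; the column does not state them.
PENNY_MASS_G = 2.5
PENNY_THICKNESS_MM = 1.52

GRAMS_PER_POUND = 453.59237
METERS_PER_MILE = 1609.344


def penny_stack(count):
    """Return (weight in pounds, height in miles) for a stack of pennies."""
    weight_lb = count * PENNY_MASS_G / GRAMS_PER_POUND
    height_mi = count * PENNY_THICKNESS_MM / 1000 / METERS_PER_MILE
    return weight_lb, height_mi


weight_lb, height_mi = penny_stack(10**18)  # one quintillion pennies
print(f"weight: {weight_lb:.2e} pounds")
print(f"height: {height_mi:.2e} miles")
```

Different assumptions about a penny's mass or thickness will shift both totals in proportion.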

Indeed, one quintillion is quite a number.

But I digress.

Exascale computing has the potential to revolutionize scientific research by enabling unprecedented accuracy and precision in tackling complex challenges. It empowers researchers and scientists to solve previously unsolvable problems in fields including climate science, materials science, energy research, celestial research, and artificial intelligence.

The OLCF-5 occupies an area of 372 square meters (4,004 square feet) and houses 74 computer cabinets, some of which weigh as much as 8,000 pounds.

These cabinets are located in a climate-controlled environment to maintain stable temperature and humidity levels.

The OLCF-5 supercomputer features 9,472 AMD Epyc 7453s “Trento” central processing units (CPUs), each with 64 cores (processing units) operating at 2 GHz, for a total of 606,208 cores.

It also has 37,888 Radeon Instinct MI250X graphics processing units (GPUs), with an impressive 8,335,360 cores among them.

Each computing node of the OLCF-5 supercomputer is fitted with one 64-core AMD Trento CPU, 512 gigabytes of Double Data Rate 4 Synchronous Dynamic Random-Access Memory (DDR4 SDRAM), and four AMD Radeon Instinct GPUs.
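
The processor totals follow from the per-node layout. The sketch below multiplies them out; the figure of 220 cores per GPU is an assumption, inferred by dividing 8,335,360 by 37,888, and the variable names are illustrative.

```python
NODES = 9_472              # one CPU per node, so nodes equal the CPU count
CPU_CORES_PER_NODE = 64
GPUS_PER_NODE = 4
GPU_CORES_PER_GPU = 220    # assumption: inferred from 8,335,360 / 37,888

cpu_cores = NODES * CPU_CORES_PER_NODE
gpus = NODES * GPUS_PER_NODE
gpu_cores = gpus * GPU_CORES_PER_GPU

print(f"CPU cores: {cpu_cores:,}")
print(f"GPUs:      {gpus:,}")
print(f"GPU cores: {gpu_cores:,}")
```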

The supercomputing system is built on a seven-nanometer process node using 16.6 billion transistors.

A new 2.5-mile-long dedicated power line had to be installed from a nearby electrical substation to the room housing the supercomputer.

The OLCF-5 consumes 21 megawatts of power and has a peak power consumption of 40 megawatts.
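
Those power figures can be set against the 1.102-exaflops result to get a rough sense of energy efficiency. This is a back-of-the-envelope sketch that assumes the 21-megawatt figure applies while the machine runs at that speed, which the numbers here do not confirm.

```python
RMAX_FLOPS = 1.102e18      # measured performance, operations per second
TYPICAL_POWER_W = 21e6     # 21 megawatts
PEAK_POWER_W = 40e6        # 40 megawatts

# Operations per second delivered for each watt drawn (in gigaflops).
gflops_per_watt = RMAX_FLOPS / TYPICAL_POWER_W / 1e9

# Energy used over 24 hours at each power level (in megawatt-hours).
typical_mwh_per_day = TYPICAL_POWER_W / 1e6 * 24
peak_mwh_per_day = PEAK_POWER_W / 1e6 * 24

print(f"efficiency: {gflops_per_watt:.1f} gigaflops per watt")
print(f"energy per day at 21 MW: {typical_mwh_per_day:.0f} MWh")
print(f"energy per day at 40 MW: {peak_mwh_per_day:.0f} MWh")
```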

The supercomputer has a storage system that can read data at 75 terabytes per second and write data at 35 terabytes per second. It also has a flash storage system that can process 15 billion input/output operations per second.

In addition, the OLCF-5 supercomputer has a large file system called the Orion Lustre file system that can store up to 700 petabytes (PB) of data.

A petabyte of data equals approximately 1,000 terabytes or 1,000,000 gigabytes of data storage.

Counted in binary units, as 2^50 bytes (strictly speaking, a pebibyte), 1 PB is exactly 1,125,899,906,842,624 bytes.
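
The two ways of counting a petabyte differ by about 12.6 percent, as this short sketch shows. The constant names are illustrative.

```python
PB_DECIMAL = 10**15   # 1,000 terabytes of 10^12 bytes each
PB_BINARY = 2**50     # 1,024 x 1,024 x 1,024 x 1,024 x 1,024 bytes

print(f"decimal petabyte: {PB_DECIMAL:,} bytes")
print(f"binary petabyte:  {PB_BINARY:,} bytes")

# How much larger the binary figure is than the decimal one.
percent_larger = (PB_BINARY / PB_DECIMAL - 1) * 100
print(f"difference: {percent_larger:.1f} percent")
```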

Putting it into perspective, it would take about 486 billion 1.44 MB 3.5-inch floppy disks to hold 700 petabytes of data, which is equivalent to storing roughly 350 trillion pages of standard typed text at about 2 KB per page.

A popular 1980s storage method would require a mind-boggling 1.94 trillion 5.25-inch double-sided 360 KB floppy disks to store 700 petabytes.

It would require approximately 700,000 one-terabyte hard drives to store 700 petabytes of data.
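
Each of these counts comes from dividing 700 petabytes by the capacity of one disk or drive. The sketch below uses decimal units throughout (1 KB = 1,000 bytes), which is an assumption; binary units shift the results by a few percent. The helper name units_needed is made up for illustration.

```python
CAPACITY_BYTES = 700 * 10**15   # 700 petabytes, decimal

# Assumed capacity of one unit of each medium, in bytes (decimal units).
MEDIA_BYTES = {
    "3.5-inch 1.44 MB floppy": 1_440_000,
    "5.25-inch 360 KB floppy": 360_000,
    "1 TB hard drive": 10**12,
}


def units_needed(total_bytes, unit_bytes):
    """Whole units required to hold total_bytes (rounds up)."""
    return -(-total_bytes // unit_bytes)


counts = {name: units_needed(CAPACITY_BYTES, size)
          for name, size in MEDIA_BYTES.items()}

for name, count in counts.items():
    print(f"{name}: {count:,}")
```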

But I digress.

The OLCF-5 supercomputer plays a pivotal role in scientific research, improving efficiency and transforming research and data analysis.

With the help of supercomputers, medical researchers can analyze vast amounts of data, enabling them to quickly identify patterns, diagnose conditions, develop more effective treatments, and advance medical research in ways that were previously unimaginable.

The OLCF-5’s exascale processing power can shed light on the underlying causes of diseases, paving the way for future personalized medicine and medical solutions.

Supercomputers, with their colossal processing power, will become valuable contributors to improving artificial intelligence.

Scientists and researchers worldwide remotely access OLCF-5 through the self-service portal of the Oak Ridge Leadership Computing Facility and the Department of Energy’s high-speed computer network ESnet (Energy Sciences Network).

Cerebras Systems Inc., an artificial intelligence company, recently claimed that the Condor Galaxy-1 supercomputer, owned by G42, a technology holding company based in Abu Dhabi, UAE, has achieved a processing speed of 4 exaflops.

Currently, no publicly available Rmax benchmark test results can independently verify and support this claim.

The HPE Frontier exascale supercomputer OLCF-5 is today the first and fastest exascale supercomputer in the world, with a theoretical peak performance capacity of 1.5 exaflops.

Credit: Carlos Jones/ORNL, U.S. Dept. of Energy



