Computer Technology Increasing Congestion on High Performance Computing Systems Similar to Seattle Traffic

Category Computer Science

tldr #

Increasing traffic congestion in Seattle is an analogy for the growing congestion in high-performance computing (HPC) systems caused by complex workloads such as AI model training. Traditional HPC network topologies are optimized for physics simulations, whereas modern AI workloads have less predictable communication patterns. Scientists at Pacific Northwest National Laboratory are applying graph theory to find solutions to these HPC bottlenecks.

content #

Increasing traffic congestion in the Seattle area is a good analogy for a similar increase in congestion on high-performance computing (HPC) systems, according to scientists at Pacific Northwest National Laboratory (PNNL). More complex workloads, such as training artificial intelligence (AI) models, are to blame for the HPC bottlenecks, the scientists say in a paper published in The Next Wave, the National Security Agency's review of emerging technologies.

Advances in artificial intelligence are contributing to the growing congestion in HPC networks, as AI model training requires significant data transfer between servers.

"We can solve the congestion through how we create the network," said Sinan Aksoy, a senior data scientist and team leader at PNNL who specializes in the mathematical field of graph theory and complex networks.

In HPC systems, hundreds of individual computer servers, known as nodes, work as a single supercomputer. The arrangement of the nodes and links between them is the network topology.

HPC congestion occurs when the exchange of data between nodes funnels onto the same link, creating a bottleneck.
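The idea of traffic funneling onto one link can be made concrete with a small sketch. The six-node network below is invented for illustration and is not taken from the PNNL paper: two groups of three nodes are joined by a single link, and every message is assumed to follow a shortest path, which real HPC routing policies may not do. The function names `shortest_path` and `link_loads` are hypothetical.

```python
from collections import Counter, deque


def shortest_path(adj, src, dst):
    """Return one shortest path from src to dst using breadth-first search.

    Assumes the graph is connected, so dst is always reachable.
    """
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    # Walk back from dst to src through the parent pointers.
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def link_loads(adj, messages):
    """Count how many messages cross each undirected link."""
    loads = Counter()
    for src, dst in messages:
        path = shortest_path(adj, src, dst)
        for a, b in zip(path, path[1:]):
            loads[tuple(sorted((a, b)))] += 1
    return loads


# Hypothetical topology: nodes 0-2 form one group, nodes 3-5 another,
# and the link between node 2 and node 3 is the only connection between them.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}

# Every node in the first group sends one message to every node in the second.
messages = [(s, d) for s in (0, 1, 2) for d in (3, 4, 5)]

loads = link_loads(adj, messages)
bottleneck, load = loads.most_common(1)[0]
print(bottleneck, load)  # (2, 3) 9
```

All nine messages cross the link between nodes 2 and 3, while every other link that carries traffic carries three. That single heavily loaded link is the bottleneck in this toy network.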

The arrangement of nodes and the links between them is known as the network topology, which is the foundation for optimal HPC system operations.

HPC system bottlenecks are more common today than they were when the systems were designed, as Aksoy and his colleagues Roberto Gioiosa, a computer scientist in the HPC group at PNNL, and Stephen Young, a mathematician in the math group at PNNL, explain in The Next Wave. That's because people use HPC systems differently today than they did when the systems were developed.

"This is an artifact of life changing," said Gioiosa. "We didn't have Facebook 20 years ago, we didn't have this big data, we didn't have big AI models, we didn't have ChatGPT."

HPC system bottlenecks occur when too much data is transferred over the same link, creating congestion.

Big tech expands

Starting in the 1990s, the computer technology industry began to blossom. New companies reshaped the Seattle area's economy and changed where people live and work. The resulting traffic patterns became less predictable, less structured, and more congested, especially along the east-west axis, where traffic is constrained to two bridges across Lake Washington.

Traditional HPC network topologies resemble the Seattle area road network, according to the researchers at PNNL. The topologies are optimized for physics simulations of things such as the interactions between molecules or regional climate systems, not modern AI workloads.

Big data and AI models were virtually nonexistent when HPC systems were first designed.

In physics simulations, the calculations on one server inform the calculations on adjacent servers. As a result, network topologies optimize the exchange of data among neighboring servers.

For example, in a physics simulation of a regional climate system, one server might simulate the climate over Seattle and another the climate over the waters of the Puget Sound to the west of Seattle.

"The Puget Sound climate model is not going to affect what's going on in New York City–I mean, it is eventually–but really it needs to talk to the Seattle model, so I might as well hook the Puget Sound computer and the Seattle computer right next to each other," said Young, a mathematician in PNNL's computational math group.

Though the same principles apply, HPC network topologies are optimized for physics simulations, not modern AI workloads.

The communication patterns in data analytics and AI applications are irregular and unpredictable. Calculations on one server may inform calculations on a computer across the room. Running those workloads on traditional HPC networks is akin to driving around the great Seattle area sprawl without being able to use I-5 or I-90.
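A rough sketch can show why a topology built for neighbor-to-neighbor exchange struggles with long-distance traffic. It assumes a ring of eight nodes, a deliberately simple stand-in for a real HPC topology, and compares two hypothetical traffic patterns with the same number of messages: each node talking to its neighbor, and each node talking to the node directly across the ring. The "across" pattern is only a proxy for the irregular communication described above, and the function name `ring_link_loads` is illustrative.

```python
from collections import Counter


def ring_link_loads(num_nodes, messages):
    """Count messages per ring link, where link k joins node k and node (k + 1) % num_nodes.

    Each message takes the shorter direction around the ring; ties go clockwise.
    """
    loads = Counter()
    for src, dst in messages:
        clockwise = (dst - src) % num_nodes
        if clockwise <= num_nodes - clockwise:
            # Travel clockwise: leaving node n uses link n.
            for step in range(clockwise):
                loads[(src + step) % num_nodes] += 1
        else:
            # Travel counterclockwise: arriving at node n uses link n.
            for step in range(1, num_nodes - clockwise + 1):
                loads[(src - step) % num_nodes] += 1
    return loads


NUM_NODES = 8

# Physics-style pattern: every node sends one message to its next neighbor.
neighbor_messages = [(i, (i + 1) % NUM_NODES) for i in range(NUM_NODES)]

# Long-distance pattern: every node sends one message to the node across the ring.
across_messages = [(i, (i + NUM_NODES // 2) % NUM_NODES) for i in range(NUM_NODES)]

neighbor_loads = ring_link_loads(NUM_NODES, neighbor_messages)
across_loads = ring_link_loads(NUM_NODES, across_messages)

print(max(neighbor_loads.values()))  # 1
print(max(across_loads.values()))    # 4
```

With the same eight messages, the busiest link carries one message under neighbor traffic and four under cross-ring traffic, because every long-distance message occupies several links along the way.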
