'Nobody knows yet': Donut design could create trillion-transistor computing monster: Analysts discuss unusual interconnection as Cerebras CEO acknowledges we don't know what happens when multiple WSEs connect

Tri-Labs, a group of three major US research institutions (Lawrence Livermore National Laboratory (LLNL), Sandia National Laboratories (SNL), and Los Alamos National Laboratory (LANL)), has been working with AI company Cerebras on a series of scientific problems, including breaking the molecular dynamics (MD) time scale barrier.

There is a paper explaining this particular challenge, which you can read here, but essentially it concerns the problem of performing molecular dynamics simulations on a longer time scale than would normally be possible.

The barriers here are twofold: computational power and communication latency between different nodes of an HPC system. Traditionally, to compensate for the lack of computational power, scientists assign more work to each node and increase the simulation size with the node count. Unfortunately, the slow communication between nodes caused by high latency further exacerbates the time scale problem.
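The latency argument can be illustrated with a toy model. This is only a sketch: the function names, the per-atom cost, and the latency figure are hypothetical values chosen for illustration, not measurements from the paper or from any real HPC system.

```python
# Toy model of one MD time step on a cluster (all constants are hypothetical).
# Each step costs compute time for the atoms a node owns plus one fixed
# communication latency to exchange data with neighbouring nodes.

def step_time_us(total_atoms, nodes, per_atom_us=0.01, latency_us=2.0):
    """Wall-clock time of a single time step, in microseconds."""
    compute_us = (total_atoms / nodes) * per_atom_us  # shrinks as nodes are added
    return compute_us + latency_us                    # latency does not shrink


def steps_per_second(total_atoms, nodes, **kwargs):
    """Time steps completed per second of wall-clock time."""
    return 1_000_000 / step_time_us(total_atoms, nodes, **kwargs)


if __name__ == "__main__":
    atoms = 800_000  # fixed problem size, roughly the scale quoted in the article
    for nodes in (1, 100, 10_000, 1_000_000):
        rate = steps_per_second(atoms, nodes)
        print(f"{nodes:>9} nodes: {rate:,.0f} steps/s")
```

In this model, adding nodes to a fixed-size simulation stops helping once the per-step latency dominates: the rate can never exceed one step per latency period. Growing the simulation with the node count keeps each node busy, but it does nothing to lengthen the simulated time that can be reached.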

Like a donut

MD simulations are crucial for several scientific fields as they bridge the gap between quantum electronic structure methods and continuum mechanics methods. However, these simulations run into time scale limitations, because they must account both for atomic vibrations, which take place on very short time scales, and for other phenomena that unfold over much longer periods.

The authors of the paper attempted to overcome the time scale barrier by employing a more efficient computational system, specifically Cerebras' Wafer-Scale Engine (WSE).

As The Next Platform explains: “The specific simulation consisted of beaming radiation into three different crystal lattices made of tungsten, copper and tantalum. In these particular simulations, which were for 801,792 atoms in each lattice, the idea is to bombard the lattices with radiation and see what happens.”

Running the simulations on Frontier, the world's fastest supercomputer, based at Oak Ridge National Laboratory in Tennessee, and on Quartz at LLNL, scientists were only able to witness nanoseconds of what was happening in the lattices as they were bombarded with radiation. Using the WSE, they were able to observe tens of milliseconds of what happened.

For testing, Tri-Labs used Cerebras' Wafer-Scale Engine 2 (WSE-2), rather than the newer, more powerful WSE-3 released earlier this year, but as detailed above, the results were impressive. As the paper reports, “By dedicating one processor core to each simulated atom, we demonstrate a 179x improvement in time steps per second compared to Frontier's GPU-based Exascale platform, along with a large improvement in time steps per unit of energy. Reducing each year of runtime to two days unlocks currently inaccessible time scales of slow microstructure transformation processes that are critical to understanding material behavior and function.”
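The “year to two days” figure follows directly from the quoted 179x speedup. The short check below is a sketch that assumes the same number of time steps is run on both systems and that the speedup holds for the whole run; the function name is hypothetical.

```python
# Back-of-the-envelope check of the "one year becomes two days" claim.
# Assumes an identical number of time steps and a constant 179x speedup.

DAYS_PER_YEAR = 365
SPEEDUP = 179  # time steps per second relative to Frontier, as quoted from the paper


def accelerated_runtime_days(baseline_days, speedup=SPEEDUP):
    """Runtime in days after applying a constant throughput speedup."""
    return baseline_days / speedup


if __name__ == "__main__":
    days = accelerated_runtime_days(DAYS_PER_YEAR)
    print(f"{DAYS_PER_YEAR} days / {SPEEDUP} = {days:.2f} days")  # about 2.04 days
```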

The Next Platform's Timothy Prickett Morgan asked Cerebras CEO and co-founder Andrew Feldman what happens when you connect multiple wafer-scale engines and try to run the same simulation, and was told “nobody knows yet.”

Prickett Morgan went on to note: “The proprietary interconnect in the WSE-2 systems could scale to 192 devices, and with the WSE-3, that number increased by more than an order of magnitude to 2,048 devices,” adding that he “strongly suspects” that “the same scaling principles apply to WSEs as they do to GPUs and CPUs.”

However, he went on to suggest that there might be some way to physically join the WSEs together and make a “stovepipe of squares of interconnected WSEs,” potentially creating a donut design with power running on the inside and cooling on the outside. As Prickett Morgan concludes: “This type of setup couldn't be worse than using InfiniBand or Ethernet to interconnect CPUs or GPUs.”
