
'No one knows yet': Donut design could create quadrillion-transistor compute monster — analysts discuss unusual interconnection as Cerebras CEO acknowledges that we don't know what happens when multiple WSEs are connected


Tri-Labs (comprising three major US research institutions: the Lawrence Livermore National Laboratory (LLNL), Sandia National Laboratories (SNL), and Los Alamos National Laboratory (LANL)) has been working with AI firm Cerebras on a number of scientific problems, including breaking the molecular dynamics (MD) timescale barrier.

There’s a paper explaining this particular challenge, but essentially it refers to the problem of running molecular dynamics simulations over a longer timescale than would normally be possible.

The barriers here are twofold: computational power and communication latency between the nodes of an HPC system. Traditionally, to compensate for limited computational power, scientists assign more work to each node and scale up the simulation size with the node count. Unfortunately, the slow inter-node communication caused by high latency further exacerbates the timescale problem.

Like a donut

MD simulations are crucial to several scientific fields as they bridge the gap between quantum electronic methods and continuum mechanics methods. However, these simulations encounter timescale limitations, as they have to account for atomic vibrations, which take place over very short timescales, and other phenomena that occur over much longer periods.
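To put that timescale gap in perspective, here is a back-of-envelope sketch. The ~1 femtosecond timestep is a typical MD value and an assumption of ours, not a figure from the article:

```python
# Rough illustration of the MD timescale barrier.
# Assumption (not from the article): a typical MD timestep of ~1 femtosecond,
# short enough to resolve atomic vibrations.
TIMESTEP_S = 1e-15  # one femtosecond, in seconds

def steps_needed(simulated_seconds):
    """Number of MD timesteps required to cover a simulated duration."""
    return simulated_seconds / TIMESTEP_S

# One nanosecond of simulated time already takes about a million steps...
print(f"{steps_needed(1e-9):.0e} steps per simulated nanosecond")
# ...while tens of milliseconds takes on the order of ten trillion steps,
# which is why longer timescales demand far higher timestep throughput.
print(f"{steps_needed(10e-3):.0e} steps per simulated 10 ms")
```

Because the timestep is pinned to the fastest physics in the system, the only route to longer simulated timescales is pushing more timesteps through per second, which is exactly the metric the paper reports.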

The authors of the paper sought to overcome the timescale barrier by employing a more efficient computational system, specifically Cerebras' Wafer-Scale Engine.

As The Next Platform explains, “The specific simulation was to beam radiation into three different crystal lattices made of tungsten, copper, and tantalum. In these particular simulations, which were for 801,792 atoms in each lattice, the idea is to bombard the lattices with radiation and see what happens.”

Running the simulations on Frontier, the world’s fastest supercomputer, based at the Oak Ridge National Laboratory in Tennessee, and on Quartz at LLNL, scientists were only able to witness nanoseconds of what was happening to the lattices as they were bombarded with radiation. Using the WSE, they could watch tens of milliseconds.

For the tests, Tri-Labs used the Cerebras Wafer-Scale Engine 2 (WSE-2) rather than the newer, more powerful WSE-3 launched earlier this year, but as detailed above, the results were impressive. As the paper reports, “By dedicating a processor core for each simulated atom, we demonstrate a 179-fold improvement in timesteps per second versus the Frontier GPU-based Exascale platform, along with a large improvement in timesteps per unit energy. Reducing every year of runtime to two days unlocks currently inaccessible timescales of slow microstructure transformation processes that are critical for understanding material behavior and function.”
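The "year of runtime to two days" framing follows directly from the quoted speedup; a quick sanity check of that arithmetic (ours, not the paper's):

```python
# Sanity-checking the quoted claim: a 179x improvement in timesteps per
# second compresses one year of runtime into roughly two days.
SPEEDUP = 179
DAYS_PER_YEAR = 365

accelerated_days = DAYS_PER_YEAR / SPEEDUP
print(f"365 days / 179 = {accelerated_days:.2f} days")  # ≈ 2.04 days
```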

The Next Platform’s Timothy Prickett Morgan asked Cerebras CEO and co-founder, Andrew Feldman, what happens when you connect multiple wafer scale engines together and try to run the same simulation and was told “no one knows yet”.

Prickett Morgan went on to note, “The proprietary interconnect in the WSE-2 systems could scale to 192 devices, and with the WSE-3, that number was boosted by more than an order of magnitude to 2,048 devices,” but he “strongly suspects that the same scaling principles apply to WSEs as apply to GPUs and CPUs.”

He went on to suggest, however, that there could be some way to lash WSEs together physically and make a “stovepipe of squares of interconnected WSEs,” potentially creating a donut design with power running on the inside and cooling on the outside. As Prickett Morgan concludes, “This kind of configuration could not be worse than using InfiniBand or Ethernet to interlink CPUs or GPUs.”
