
Want a PC with 8 (yes, 8) AMD Radeon RX 7900 XTX GPUs? Here’s one and OMG, you could even add Intel Arc GPUs


Comino made headlines with the launch of Grando, its water-cooled AMD-based workstation with eight Nvidia RTX 5090 GPUs. During an extensive email exchange with its CTO/co-founder and its commercial director, I learned that Grando is far more versatile than I'd come to expect.

Dig into its configurator and you'll notice that the system can be configured with up to eight RX 7900 XTX GPUs because, why not?

“Yes, we can pack 8x 7900XTX, with an increased lead time though. In fact, we can pack any 8 GPUs + EPYC in a single system”, Alexey Chistov, the CTO of Comino, told me when I queried further.

Indeed, while it doesn't currently offer Intel's promising Arc GPUs, it will if the market demands such solutions.

“We can design a waterblock for any GPU, it takes around a month” Chistov highlighted, “But we don't go for all possible GPUs, we choose specific models and brands. We only go for high-end GPUs to justify the extra price for liquid-cooling, because if it could properly work air-cooled - why bother? We try to stick with 1 or 2 different models per generation not to have multiple SKUs (stock keeping units) of waterblocks. You can have an RTX 4090, H200, L40S or any other GPU that we have a waterblock for in a single system if your workflow will benefit from such a combination.”

An RTX 5090 on its retail packaging on a desk (Image credit: Future)

The Rimac of HPC

So how can Comino achieve such flexibility? The company pitches itself as an engineering firm, its slogan proudly stating "Engineered, not just assembled". Think of Comino as the Rimac of HPC: obscenely powerful, agile and expensive. Like Rimac, it focuses on the apex of its line of business and on absolute performance.

Its flagship product, Grando, is liquid-cooled and was designed from the outset to accommodate up to eight GPUs, which means it will very likely remain future-proof across multiple Nvidia generations; more on that in a bit.

One of their main targets, Chistov told me, “is to always fit a single PCI slot, that's how we can populate all the PCIe slots on the motherboard and fit eight GPUs in a GRANDO Server. The chassis is also designed by the Comino team so everything works as ‘one’.” That’s how a triple-slot GPU like the RTX 5090 can be modified to fit into a single slot.

With that in mind, Comino is preparing a “solution capable of operating on the coolant temperature of 50C without throttling, so if you drop the coolant temperature to 20C and set the coolant flow to 3-4 l/m the waterblock can remove around 1800W of the heat from the 5090 chip with the chip temperature around 80-90C.”

That’s right: a single Comino GPU waterblock could remove 1800W of heat from a single "hypothetical 5090" generating that much heat, provided the coolant temperature at the inlet is around 20 degrees Celsius and the coolant flow is no less than 3-4 liters per minute.

Packing eight such "hypothetical GPUs" plus other components could push total system power draw to around 15 kW, and a system like that would indeed operate "normally" at full load, as long as the coolant held a constant 20C and the flow per waterblock stayed at no less than 3-4 liters per minute.
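Those numbers can be sanity-checked with back-of-the-envelope physics. A minimal sketch, assuming the coolant behaves like water and a hypothetical 600 W budget for non-GPU components (the per-GPU heat load and flow rate come from Chistov's figures above):

```python
# Back-of-the-envelope check of the cooling figures quoted above.
HEAT_PER_GPU_W = 1800          # hypothetical max heat removed per waterblock
NUM_GPUS = 8
OTHER_COMPONENTS_W = 600       # assumed: EPYC CPU, drives, pumps, PSU losses

SPECIFIC_HEAT_WATER = 4186     # J/(kg*K), coolant approximated as water
FLOW_L_PER_MIN = 3.5           # middle of the quoted 3-4 l/min range

gpu_heat_w = HEAT_PER_GPU_W * NUM_GPUS           # 14,400 W from GPUs alone
total_draw_w = gpu_heat_w + OTHER_COMPONENTS_W   # roughly the 15 kW quoted

# Coolant temperature rise per waterblock: Q = m_dot * c * delta_T
mass_flow_kg_s = FLOW_L_PER_MIN / 60             # ~1 kg per liter of water
delta_t_c = HEAT_PER_GPU_W / (mass_flow_kg_s * SPECIFIC_HEAT_WATER)

print(f"GPU heat load: {gpu_heat_w / 1000:.1f} kW")
print(f"Total system draw: ~{total_draw_w / 1000:.1f} kW")
print(f"Coolant rise per block: ~{delta_t_c:.1f} C")
```

At 3-4 l/min the coolant only warms by about 7C across a block, so a 20C inlet keeping the chip in the 80-90C range is plausible.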

Who will need that sort of performance?

So what sort of user splashes out on a multi-GPU system? Chistov again: “There is no benefit to adding an additional 5090 if you are a gamer, this won't affect performance, because games can't utilize multiple GPUs like they used to with SLI, or with DirectX at one point in time. There are several applications we are focused on for multi-GPU systems:

  • AI Inference: this is the most demanded workload. In such a scenario each GPU works "alone" and the reason to pack more GPUs per node is to decrease "cost per GPU" while scaling: save rack space, spend less money for non-GPU hardware, etc. Each GPU in a system is used to process AI requests, mostly generative AI, for example, Stable Diffusion, Midjourney, DALL-E
  • GPU Rendering: popular workload, but does not always scale well adding more GPUs, for example Octane and V-Ray (~15% less performance per GPU @ 8-GPUs) scale pretty well, but RedShift does not (~35-40% less performance per GPU @ 8-GPUs)
  • Life-Science: different types of scientific calculations, for example CryoSPARC or RELION.
  • Any GPU-bound workload in a virtualized environment. Using Hyper-V or other software you can create multiple Virtual Machines to run any task, for example a remote workstation, as StorageReview did with the Grando and the six RTX 4090 GPUs it had on review.
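The scaling figures in the rendering bullet translate into effective throughput. A rough sketch using the article's approximate per-GPU loss percentages (the renderer names are real; the arithmetic is purely illustrative):

```python
# Effective throughput of an 8-GPU node given a per-GPU efficiency loss,
# expressed in single-GPU equivalents.
def effective_gpus(num_gpus, per_gpu_loss):
    """Throughput in single-GPU equivalents at the given per-GPU loss."""
    return num_gpus * (1 - per_gpu_loss)

octane_vray = effective_gpus(8, 0.15)    # ~15% loss per GPU at 8 GPUs
redshift_lo = effective_gpus(8, 0.40)    # worst case of the ~35-40% range
redshift_hi = effective_gpus(8, 0.35)    # best case of the ~35-40% range

print(f"Octane/V-Ray: ~{octane_vray:.1f} GPU-equivalents")
print(f"Redshift:     ~{redshift_lo:.1f}-{redshift_hi:.1f} GPU-equivalents")
```

In other words, eight GPUs behave like roughly 6.8 cards under Octane or V-Ray, but only about 5 under Redshift, which is why the economics depend heavily on the workload.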

Specifically for the RTX 5090, the most important improvement for AI workloads is the jump in memory capacity from 24GB to 32GB (roughly a third more), which means Nvidia’s new flagship is better suited for inference, as you can fit a far bigger AI model in memory. Then there’s the far higher memory bandwidth, which helps as well.
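A quick rule of thumb shows why capacity matters for inference: model weights occupy parameter count times bytes per parameter, plus some working overhead. A sketch, where the 15% overhead figure is an assumption for illustration rather than a measured value:

```python
# Rough estimate of the largest model (in billions of parameters) that
# fits in a given amount of VRAM for inference. The overhead fraction
# (activations, KV cache, runtime buffers) is an assumed figure.
def max_params_billions(vram_gb, bytes_per_param, overhead_frac=0.15):
    usable_bytes = vram_gb * 1e9 * (1 - overhead_frac)
    return usable_bytes / bytes_per_param / 1e9

# RTX 4090 (24 GB) vs RTX 5090 (32 GB), fp16 weights at 2 bytes each
print(f"24 GB card: ~{max_params_billions(24, 2):.1f}B params")
print(f"32 GB card: ~{max_params_billions(32, 2):.1f}B params")
```

Under these assumptions the extra 8GB buys headroom for models a few billion parameters larger per card, before any quantization tricks.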

In his review of the RTX 5090, TechRadar’s John Loeffler calls it the supercar of graphics cards and asks whether it is simply too powerful, suggesting it is an absolute glutton for wattage.

“It's overkill”, he quips, “especially if you only want it for gaming, since monitors that can truly handle the frames this GPU can put out are likely years away.”
