We recently got a glimpse of what billions of dollars' worth of AI GPUs looks like when Elon Musk shared a short video tour of Cortex, Tesla's AI training supercomputer currently under construction at the company's Giga Texas plant.
More recently, Musk took to his social media platform to announce that Colossus, a new training cluster of 100,000 H100 GPUs, is now up and running.
Musk claims that Colossus is “the most powerful AI training system in the world” and that it was built “from start to finish” in just 122 days. That’s quite an achievement. The servers for the xAI cluster were reportedly provided by Dell and Supermicro, and the cost of the project is estimated to be between $3 and $4 billion.
“This weekend, the @xAI team spun up our Colossus 100k H100 training cluster. From start to finish, it took 122 days. Colossus is the most powerful AI training system in the world. And it will double in size to 200k (50k H200) in a few months. Awesome…” (Elon Musk, September 2, 2024)
Where does Colossus get its name?
Tom's Hardware notes: “While all of these clusters are formally operational and even training AI models, it’s not entirely clear how many are actually online today. First, it takes some time to debug and optimize the configuration of those superclusters. Second, X needs to make sure they get enough power, and while Elon Musk’s company has been using 14 diesel generators to power its Memphis supercomputer, they still weren’t enough to power all 100,000 H100 GPUs.”
The Colossus system is about to double in capacity, with plans to add an additional 100,000 GPUs: 50,000 H100 units and 50,000 of Nvidia’s next-generation H200 chips. The supercluster will primarily be used to train xAI’s Grok-3, the company’s latest and most advanced AI model. We haven’t seen any mention of storage for the new system yet, but it’ll need to be massive.
However, the name of the new supercomputer has raised more than a few eyebrows, as it shares its name with a 1970 science fiction film (based on a 1966 novel by D. F. Jones) about a supercomputer that becomes conscious after being given control of the US nuclear arsenal. Unsurprisingly, things go horribly wrong for humanity.
Both the novel and the film explore themes that remain relevant today, such as AI autonomy, the dangers of ceding control to machines, and the ethical implications of artificial intelligence. It’s possible that Musk wasn’t aware of this when the name for his new AI training system was chosen, and that it was simply selected to emphasize the sheer scale of the supercluster. Then again, given Musk’s track record, it wouldn’t be surprising if the reference was entirely intentional; he knows exactly what he’s doing.