Meta recently shared details about the company's AI training infrastructure, revealing that it currently relies on nearly 50,000 Nvidia H100 GPUs to train its open-source Llama 3 LLM.
Like many major tech companies involved in AI, Meta wants to reduce its dependence on Nvidia hardware and has taken another step in that direction.
Meta already has its own AI inference accelerator, the Meta Training and Inference Accelerator (MTIA), which is designed for the social media giant's internal AI workloads, especially those that improve experiences across its various products. The company has now shared information about its second-generation MTIA, which significantly improves on its predecessor.
This revamped version of MTIA, which handles inference but not training, doubles the compute and memory bandwidth of its predecessor while remaining tightly coupled to Meta's workloads. It is designed to efficiently serve the ranking and recommendation models that generate suggestions for users. The new chip architecture aims to strike a balance between compute power, memory bandwidth, and memory capacity to meet the particular demands of these models, and its enlarged SRAM capacity enables high performance even at low batch sizes.
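To see why more on-chip SRAM pays off at small batches, consider a back-of-the-envelope roofline estimate: at low batch sizes the arithmetic intensity of a matrix multiply drops, the layer becomes bound by how fast its weights can be streamed, and serving those weights from SRAM rather than off-chip DRAM raises the attainable throughput. The Python sketch below illustrates the effect; every constant in it (peak compute, bandwidths, layer shape) is a hypothetical placeholder, not a published MTIA figure.

```python
# Back-of-the-envelope roofline for a single fp16 layer of a
# recommendation model. Every constant here is a hypothetical
# placeholder, not a published MTIA v2 specification.
PEAK_FLOPS = 100e12   # assumed dense fp16 compute peak, FLOP/s
DRAM_BW = 200e9       # assumed off-chip (LPDDR5) bandwidth, bytes/s
SRAM_BW = 2000e9      # assumed on-chip SRAM bandwidth, bytes/s

def attainable_flops(batch, m=4096, n=4096, mem_bw=DRAM_BW):
    """Roofline estimate for a (batch x m) @ (m x n) fp16 matmul."""
    flops = 2 * batch * m * n         # multiply-accumulates in the matmul
    weight_bytes = m * n * 2          # fp16 weight traffic dominates at low batch
    intensity = flops / weight_bytes  # FLOPs per byte moved
    return min(PEAK_FLOPS, intensity * mem_bw)

for b in (1, 4, 16, 64):
    dram = attainable_flops(b, mem_bw=DRAM_BW) / 1e12
    sram = attainable_flops(b, mem_bw=SRAM_BW) / 1e12
    print(f"batch={b:>2}: DRAM-bound {dram:6.2f} TFLOP/s, "
          f"SRAM-resident {sram:6.2f} TFLOP/s")
```

At batch 1 the estimate is entirely bandwidth-bound, which is precisely the regime that latency-sensitive recommendation serving often lives in.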
The latest accelerator consists of an 8×8 grid of processing elements (PEs) that delivers 3.5x the dense compute performance of MTIA v1 and sparse compute performance that is reportedly seven times better. The gains come from architectural optimizations around the sparse compute pipeline and the way data is fed into the PEs. Other key upgrades include triple the local PE storage, double the on-chip SRAM along with a 3.5x increase in its bandwidth, and double the LPDDR5 capacity.
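For quick reference, the stated generational multipliers can be collected in one place. Only the ratios below come from the announcement; the snippet simply derives the PE count from the 8×8 grid and notes that the 7x sparse figure is 2x the dense gain, in line with the optimized sparse pipeline.

```python
# Stated MTIA v2 improvements over v1; only these multipliers come
# from Meta's announcement. No absolute v1 baselines are assumed.
improvements = {
    "dense compute":    3.5,
    "sparse compute":   7.0,
    "PE local storage": 3.0,
    "on-chip SRAM":     2.0,
    "SRAM bandwidth":   3.5,
    "LPDDR5 capacity":  2.0,
}

pe_count = 8 * 8  # the 8x8 processing-element grid
print(f"processing elements: {pe_count}")
for metric, factor in improvements.items():
    print(f"{metric:<16} {factor:.1f}x over MTIA v1")

# Sparse compute improves 7 / 3.5 = 2x more than dense compute,
# the extra factor attributed to the sparse compute pipeline.
```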
Software stack

Along with the hardware, Meta is also co-engineering the software stack with the silicon to deliver an optimized end-to-end inference solution. The company says it has developed a robust rack-based system that supports up to 72 accelerators, with each chip clocked at 1.35 GHz and running at 90 W.
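Those two figures imply a ballpark accelerator power budget per rack, sketched below. This counts only the accelerators themselves; host CPUs, memory, and networking overhead are not included.

```python
# Rough rack-level accelerator power from the figures in the article:
# up to 72 accelerators per rack-based system, each running at 90 W.
ACCELERATORS_PER_RACK = 72
WATTS_PER_ACCELERATOR = 90

rack_watts = ACCELERATORS_PER_RACK * WATTS_PER_ACCELERATOR
print(f"Accelerator power per rack: {rack_watts / 1000:.2f} kW")  # 6.48 kW
```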
Among other developments, Meta says it has also improved the network fabric between accelerators, significantly increasing the bandwidth and scalability of the system. Triton-MTIA, a backend compiler built to generate high-performance code for MTIA hardware, further optimizes the software stack.
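Triton is the open-source, Python-embedded kernel language originally developed at OpenAI, and Triton-MTIA is a compiler backend for it: kernels are written against the portable Triton frontend regardless of target. The sketch below is a generic Triton vector-add kernel showing what that frontend looks like; it is not Meta-internal code, and how Triton-MTIA lowers such kernels to MTIA hardware is not publicly documented.

```python
import torch
import triton
import triton.language as tl

# A generic Triton kernel: elementwise vector addition. This is
# standard Triton frontend code, not Meta-internal MTIA code.
@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # one program per block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

In principle, the same kernel source can be compiled by Triton's GPU backends or by a backend like Triton-MTIA; that portability is the appeal of compiling a high-level kernel language rather than hand-writing device-specific code.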
The new MTIA won't free Meta from Nvidia GPUs on its own, but it is another step on the company's roadmap toward a future less reliant on them.