Joerg Hiller
Aug 29, 2024 07:18
NVIDIA’s Blackwell architecture sets new benchmarks in MLPerf Inference v4.1, showcasing significant performance improvements in LLM inference.
NVIDIA’s new Blackwell architecture has set unprecedented benchmarks in the latest MLPerf Inference v4.1, according to the NVIDIA Technical Blog. The platform, introduced at NVIDIA GTC 2024, features a superchip built from 208 billion transistors and uses the TSMC 4NP process tailored for NVIDIA, making it the largest GPU ever built.
NVIDIA Blackwell Shines in MLPerf Inference Debut
In its inaugural round of MLPerf Inference submissions, NVIDIA’s Blackwell architecture delivered outstanding results on the Llama 2 70B LLM benchmark, achieving up to 4x higher tokens per second per GPU compared to the previous-generation H100 GPU. This performance leap was enabled by the new second-generation Transformer Engine, which leverages Blackwell Tensor Core technology and TensorRT-LLM innovations.
According to the MLPerf results, Blackwell’s FP4 Transformer Engine executed roughly 50% of the workload in FP4, reaching a delivered math throughput of 5.2 petaflops. The Blackwell-based submissions were made in the closed division, meaning the models were unmodified yet still met the benchmark’s high accuracy requirements.
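For intuition, FP4 here refers to a 4-bit floating-point format (E2M1) used with per-block scaling. The following is a minimal Python sketch of that quantization idea; it is a conceptual illustration only, not NVIDIA’s Transformer Engine implementation, and the block size and scaling rule are assumptions chosen for the example.

```python
import numpy as np

# Magnitudes representable in the FP4 E2M1 format (1 sign, 2 exponent, 1 mantissa bit).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(x: np.ndarray, block_size: int = 32) -> np.ndarray:
    """Block-wise FP4 quantize/dequantize: scale each block so its largest
    magnitude maps to 6.0 (the FP4 maximum), snap each value to the nearest
    grid point, then rescale to produce the dequantized approximation."""
    blocks = x.reshape(-1, block_size)
    scale = np.abs(blocks).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scale[scale == 0] = 1.0                      # guard against all-zero blocks
    scaled = blocks / scale
    # Nearest representable FP4 magnitude for each element.
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    snapped = np.sign(scaled) * FP4_GRID[idx]
    return (snapped * scale).reshape(x.shape)

weights = np.random.randn(4, 64).astype(np.float32)
approx = quantize_fp4(weights)
print("mean abs quantization error:", np.abs(weights - approx).mean())
```

Running part of the workload in a 4-bit format like this halves the memory traffic and doubles the peak math rate relative to FP8, which is what makes the 5.2-petaflop delivered figure attainable.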
NVIDIA H200 Tensor Core GPU’s Outstanding Performance
The NVIDIA H200 GPU, an upgrade to the Hopper architecture, also delivered exceptional results across all benchmarks. Equipped with HBM3e memory, the H200 showed significant improvements in memory capacity and bandwidth, benefiting memory-sensitive applications.
For instance, the H200 achieved notable performance gains on the Llama 2 70B benchmark, with a 14% improvement over the previous round purely through software enhancements in TensorRT-LLM. Additionally, the H200’s performance rose by a further 12% when its thermal design power (TDP) was raised to 1,000 watts.
Jetson AGX Orin’s Large Leap in Edge AI
NVIDIA’s Jetson AGX Orin demonstrated impressive performance improvements in generative AI at the edge, achieving up to 6.2x more throughput and 2.4x better latency on the GPT-J 6B-parameter LLM benchmark. This was made possible by numerous software optimizations, including the use of INT4 Activation-aware Weight Quantization (AWQ) and in-flight batching; both techniques are sketched below.
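To make the AWQ idea concrete: activation-aware weight quantization scales up the weight channels that see the largest activations so they lose less precision when rounded to 4 bits, then folds the inverse scale back out of the weights. Below is a hypothetical Python sketch of that scheme; the group size, scaling exponent, and calibration statistics are assumptions for illustration, not the TensorRT-LLM implementation.

```python
import numpy as np

def awq_int4_quantize(w, act_scale, alpha=0.5, group_size=64):
    """AWQ-style INT4 weight quantization (conceptual sketch).

    w:         [out_features, in_features] weight matrix
    act_scale: [in_features] mean |activation| per input channel,
               gathered from calibration data (assumed given here)
    """
    # Scale up salient input channels (those with large activations) so they
    # lose less precision when rounded to 4 bits; the inverse scale would be
    # folded into the preceding activations at inference time.
    s = np.power(act_scale, alpha)
    w_scaled = w * s                               # broadcast over input channels

    # Symmetric per-group INT4 quantization: integer values in [-8, 7].
    out, inp = w_scaled.shape
    groups = w_scaled.reshape(out, inp // group_size, group_size)
    gmax = np.abs(groups).max(axis=-1, keepdims=True)
    gmax[gmax == 0] = 1.0                          # guard against all-zero groups
    scale = gmax / 7.0
    q = np.clip(np.round(groups / scale), -8, 7)

    # Dequantize and undo the channel scaling to get the effective weights.
    return (q * scale).reshape(out, inp) / s

w = np.random.randn(16, 128).astype(np.float32)
act_scale = np.abs(np.random.randn(128)).astype(np.float32) + 0.1
w_approx = awq_int4_quantize(w, act_scale)
print("max abs error:", np.abs(w - w_approx).max())
```

Storing weights as 4-bit integers is particularly valuable on a memory-constrained edge module like Jetson AGX Orin, since it shrinks both the model footprint and the bandwidth needed per generated token.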
The Jetson AGX Orin platform is uniquely positioned to run complex models like GPT-J, vision transformers, and Stable Diffusion at the edge, providing real-time, actionable insights from sensor data such as images and video.
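In-flight batching, the other optimization named above, replaces static batching’s wait-for-all behavior: as soon as one request in the batch finishes generating, its slot is backfilled with a waiting request, keeping the GPU saturated. The toy Python loop below sketches that scheduling idea under simplified assumptions (one token per request per step, a fixed number of slots); it is not TensorRT-LLM’s actual scheduler.

```python
from collections import deque

def inflight_batching(requests, max_batch=4):
    """Toy in-flight batching loop: each step generates one token per active
    request and immediately backfills freed slots from the waiting queue."""
    waiting = deque(requests)          # (request_id, tokens_to_generate)
    active = {}                        # request_id -> tokens remaining
    step = 0
    while waiting or active:
        # Backfill: admit new requests the moment a slot frees up, instead of
        # waiting for the entire batch to drain as static batching would.
        while waiting and len(active) < max_batch:
            rid, n = waiting.popleft()
            active[rid] = n
        # One decode step: every active request produces one token.
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:       # finished; its slot frees this step
                del active[rid]
                print(f"step {step}: request {rid} completed")
        step += 1

inflight_batching([("a", 2), ("b", 5), ("c", 3), ("d", 1), ("e", 4)])
```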
Conclusion
In summary, NVIDIA’s Blackwell architecture has set new standards in MLPerf Inference v4.1, achieving up to 4x the performance of its predecessor, the H100. The H200 GPU continues to deliver top-tier performance across multiple benchmarks, while Jetson AGX Orin showcases significant advances in edge AI.
NVIDIA’s continuous innovation across the technology stack keeps it at the forefront of AI inference performance, from large-scale data centers to low-power edge devices.
Image source: Shutterstock