
Microsoft Azure has launched the NDv6 GB300 VM sequence, a brand new digital machine (VM) providing constructed across the NVIDIA GB300 NVL72 platform and Blackwell Extremely graphics processing models (GPUs), now deployed at supercomputing scale.
The newest Azure cluster incorporates greater than 4,600 NVIDIA GB300 NVL72 methods, which Microsoft describes as the primary large-scale manufacturing deployment of this sort.
These clusters are meant for OpenAI’s most computationally intensive AI inference workloads, utilizing NVIDIA’s next-generation InfiniBand networking for interconnects.
Every NDv6 GB300 digital machine runs on a rack-scale Nvidia GB300 NVL72 system, comprising 72 Blackwell Extremely GPUs and 36 Grace CPUs.
This association gives every VM with 37 terabytes (TB) of reminiscence and 1.44 exaflops of FP4 Tensor Core functionality. The system is meant for AI fashions with excessive computational and reminiscence necessities, together with large-scale reasoning fashions and generative AI workloads.
Microsoft Azure AI infrastructure company vp Nidhi Chappell mentioned: “Delivering the business’s first at-scale Nvidia GB300 NVL72 manufacturing cluster for frontier AI is an achievement that goes past highly effective silicon — it displays Microsoft Azure and Nvidia’s shared dedication to optimise all elements of the fashionable AI information centre.
“Our collaboration helps guarantee clients like OpenAI can deploy next-generation infrastructure at unprecedented scale and velocity.”
The cluster makes use of a two-level community design. Inside every rack, fifth-generation Nvidia NVLink Switches allow 130TB per second of bandwidth between the GPUs, making a unified reminiscence house.
For communication throughout racks, the Nvidia Quantum-X800 InfiniBand platform provides 800 gigabits (Gbs) per second of bandwidth to every GPU.
The community additionally options adaptive routing, congestion administration by way of telemetry, and implements Nvidia’s Scalable Hierarchical Aggregation and Discount Protocol (SHARP) v4 for accelerated distributed operations.
Nvidia claimed that the Blackwell Extremely platform has proven as much as 5 occasions greater throughput per GPU on the 671-billion-parameter DeepSeek-R1 mannequin in comparison with the earlier Hopper era, as measured within the MLPerf Inference v5.1 benchmarks.
The GB300 NVL72 additionally outperformed earlier methods on newer benchmarks, equivalent to Llama 3.1 405B, in response to Nvidia.
Microsoft’s implementation required modifications at a number of layers of its information centre stack, together with liquid cooling options, energy distribution methods and orchestration software program.
In September 2025, Microsoft built-in Anthropic’s Claude fashions into Copilot Studio, enhancing its assist for OpenAI’s giant language fashions.

