AI infra
An overview of building an NDR InfiniBand network for a GPU cluster:
Network Architecture:
Use a two-tier, multi-rail networking topology (a rail-mapping sketch follows this section)
Leaf-spine architecture, with the leaf switches deployed as ToR (Top of Rack) switches
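A common way to realize the multi-rail design is rail-optimized cabling, where NIC i of every server attaches to a leaf switch dedicated to rail i, so same-index GPUs across servers are always one leaf hop apart. The sketch below illustrates the mapping; the rail-optimized assumption, naming scheme, pod size, and port counts are illustrative rather than taken from a specific reference design.

```python
# Sketch of rail-optimized cabling: NIC i of every server attaches to a leaf
# switch belonging to rail i. All names and constants are illustrative.

RAILS = 8                 # one rail per GPU/NIC in an HGX H100 server
SERVERS_PER_POD = 64      # pod size assumed later in this overview
DOWNLINKS_PER_LEAF = 32   # half of a 64-port NDR switch, keeping 1:1 uplinks

def leaf_port_for(server_id: int, nic_id: int) -> tuple[str, int]:
    """Map (server, NIC) to a (leaf switch, downlink port) in the NIC's rail."""
    pod = server_id // SERVERS_PER_POD
    pos = server_id % SERVERS_PER_POD
    leaf_idx = pos // DOWNLINKS_PER_LEAF   # which leaf inside the rail
    port = pos % DOWNLINKS_PER_LEAF        # downlink port on that leaf
    return f"pod{pod}-rail{nic_id}-leaf{leaf_idx}", port

# Example: server 70, NIC 3 lands on 'pod1-rail3-leaf0', port 6
print(leaf_port_for(70, 3))
```

The payoff is that rail-aware collectives can keep most traffic within a single rail, touching the spine layer only when crossing rails or pods.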
Key Components:
GPU Servers: NVIDIA HGX H100 servers, each with 8 x H100 GPUs and 8 x 400G NDR NICs (e.g., NVIDIA ConnectX-7)
Leaf Switches: 400G NDR InfiniBand switches deployed as ToR (e.g., 64-port NVIDIA Quantum-2)
Spine Switches: 400G NDR InfiniBand switches forming the second tier (fixed or higher-port-count modular chassis)
Connectivity:
Each GPU server connects to the leaf (ToR) switches with 8 x 400G links, one per rail
Leaf switches connect to spine switches with 400G uplinks
Non-blocking fabric: 1:1 subscription ratio between server-facing ports and uplink ports (a quick check is sketched after this list)
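A quick way to confirm the 1:1 claim is to compare the aggregate bandwidth of a leaf's server-facing downlinks against its uplinks to the spines. A minimal sketch, with port counts assumed for a 64-port 400G switch:

```python
# Downlink:uplink bandwidth ratio for a leaf switch; 1.0 means non-blocking.
# Port counts below are assumptions for a 64-port 400G NDR switch.

LINK_GBPS = 400

def subscription_ratio(downlinks: int, uplinks: int, link_gbps: int = LINK_GBPS) -> float:
    """Return server-facing bandwidth divided by uplink bandwidth."""
    return (downlinks * link_gbps) / (uplinks * link_gbps)

print(subscription_ratio(32, 32))  # 1.0 -> non-blocking (1:1)
print(subscription_ratio(48, 16))  # 3.0 -> 3:1 oversubscribed
```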
Scale:
Can scale up to 1024 servers / 8192 GPUs in a two-tier network
Organized into "server pods" of 64 servers each (switch-count math is sketched below)
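For the figures above, a rough switch-count estimate can be derived directly, assuming 64-port 400G NDR switches, 8 rails, and a non-blocking two-tier fat tree per rail; all constants are illustrative:

```python
# Rough switch counts for 1024 servers at 1:1, with 64-port NDR switches and
# one fat-tree plane per rail. All constants are assumptions for illustration.

RADIX = 64       # ports per NDR switch
RAILS = 8        # NICs per server, one rail each
SERVERS = 1024

down_per_leaf = RADIX // 2                          # 32 server ports, 32 uplinks
leaves_per_rail = SERVERS // down_per_leaf          # 32 leaf switches per rail
uplinks_per_rail = leaves_per_rail * (RADIX // 2)   # 1024 uplinks per rail
spines_per_rail = uplinks_per_rail // RADIX         # 16 spine switches per rail

print(leaves_per_rail * RAILS, spines_per_rail * RAILS)  # 256 leaves, 128 spines
```

For reference, a two-tier fat tree of radix-64 switches terminates at most 64*64/2 = 2048 end ports per rail, so the 1024-server figure leaves headroom.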
Cabling:
Use MPO fiber optic cables between switches
Use 400G NDR optical transceivers in the OSFP form factor (twin-port OSFP on the switch side; OSFP or QSFP112 on the adapters); link counts are estimated below
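Cable and transceiver quantities follow directly from the design: one 400G link per server NIC down to the leaves plus the leaf-to-spine uplinks. The sketch below reuses the 1024-server, 1:1 figures from earlier and ignores spares and the management network:

```python
# Link count for the fabric: server-to-leaf links plus leaf-to-spine uplinks.
# Counts reuse the assumed 1024-server, non-blocking design sketched earlier.

SERVERS, NICS_PER_SERVER = 1024, 8
LEAVES, UPLINKS_PER_LEAF = 256, 32

server_links = SERVERS * NICS_PER_SERVER   # 8192 server-to-leaf links
uplink_links = LEAVES * UPLINKS_PER_LEAF   # 8192 leaf-to-spine links
print(server_links, uplink_links, server_links + uplink_links)  # 8192 8192 16384
# Each optical link needs fiber plus a transceiver at both ends; short in-rack
# runs can substitute DAC/ACC copper.
```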
Software:
Run a Subnet Manager (OpenSM or NVIDIA UFM) to discover the fabric and program switch routing tables
RDMA is native to InfiniBand, so no RoCE/PFC/ECN tuning is required as it would be on an Ethernet fabric; link-level flow control is credit-based
Enable adaptive routing and, where supported, SHARP in-network aggregation for collective operations (a post-configuration sanity check is sketched below)
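Once the subnet manager is running, a few standard diagnostics confirm the fabric is healthy. The sketch below assumes the usual InfiniBand tools (sminfo, ibstat, iblinkinfo) from MLNX_OFED / infiniband-diags are installed on an admin host; the output parsing is purely illustrative:

```python
# Post-configuration sanity checks using standard InfiniBand diagnostics.
# Assumes sminfo, ibstat and iblinkinfo are on PATH; parsing is illustrative.

import subprocess

def run(cmd: list[str]) -> str:
    """Run a diagnostic command and return its stdout (raises on failure)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

if __name__ == "__main__":
    print(run(["sminfo"]))       # confirm a subnet manager is active
    print(run(["ibstat"]))       # local HCA ports should be Active / LinkUp
    links = run(["iblinkinfo"])  # dump every link in the fabric
    # Flag any link that is up but not running at full 4X width.
    for line in links.splitlines():
        if "LinkUp" in line and "4X" not in line:
            print("possibly degraded link:", line.strip())
```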
Management:
Implement a separate management network
Use fabric management software (e.g., NVIDIA UFM) for monitoring and analytics
Testing:
Validate performance with tools such as the perftest suite (ib_write_bw, ib_read_bw) and nccl-tests (e.g., all_reduce_perf); example invocations are sketched below
Test congestion control and overall cluster throughput
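For concreteness, one common way to drive these tests is shown below: ib_write_bw from the perftest suite for raw point-to-point bandwidth and all_reduce_perf from nccl-tests for collective throughput. The peer address, the mlx5_0 device name, and the single-node launch are placeholders to adapt; multi-node NCCL runs are usually wrapped in mpirun or srun:

```python
# Illustrative wrappers around common validation tools; adjust device names,
# peer addresses, and launchers to the actual cluster.

import subprocess

def link_bandwidth(server: str, device: str = "mlx5_0") -> None:
    """Client side of a point-to-point RDMA write test against `server`,
    which must already be running `ib_write_bw -d <device> --report_gbits`."""
    subprocess.run(["ib_write_bw", "-d", device, "--report_gbits", server],
                   check=True)

def allreduce_sweep(gpus_per_node: int = 8) -> None:
    """Single-node NCCL all-reduce sweep from 8 B to 8 GB message sizes."""
    subprocess.run(["all_reduce_perf", "-b", "8", "-e", "8G", "-f", "2",
                    "-g", str(gpus_per_node)], check=True)

if __name__ == "__main__":
    link_bandwidth("10.0.0.2")   # hypothetical peer address
    allreduce_sweep()
```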
Power and Cooling:
Plan for high power density racks (often 40+ kW per rack with HGX H100 servers; a rough budget is sketched below)
Ensure adequate cooling for GPU servers and switches
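A rough power budget makes the density point concrete. The per-server and per-switch figures below are assumptions (an 8-GPU HGX H100 server peaks around 10 kW, and the switch allowance varies by model), with cooling expressed via 1 kW ≈ 3412 BTU/hr:

```python
# Rack power and cooling estimate; all per-device figures are assumptions.

SERVER_KW = 10.0          # assumed peak draw per 8-GPU HGX H100 server
SWITCH_KW = 1.5           # assumed allowance per NDR switch in the rack
SERVERS_PER_RACK = 4
SWITCHES_PER_RACK = 2

rack_kw = SERVERS_PER_RACK * SERVER_KW + SWITCHES_PER_RACK * SWITCH_KW
print(f"{rack_kw:.1f} kW per rack, ~{rack_kw * 3412:,.0f} BTU/hr of heat to remove")
```

At roughly 43 kW per rack, this is well beyond typical air-cooled densities, which is why such deployments often rely on rear-door heat exchangers or liquid cooling.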
The key is to create a high-bandwidth, low-latency fabric optimized for GPU-to-GPU communication using NDR InfiniBand technology. Careful planning of the network topology, cabling, and configuration is essential for maximum performance.
Would you like me to elaborate on any specific aspect of building this network?