| Spec | CloudMatrix 384 | GB300 NVL72 | GB200 NVL72 | GB200 Superchip | HGX B300 | HGX B200 | HGX H200 (4x) | HGX H200 (8x) | HGX H100 (4x) | HGX H100 (8x) |
|---|---|---|---|---|---|---|---|---|---|---|
| Form Factor | 384 Ascend 910C NPUs, 192 Kunpeng CPUs | 72 Blackwell Ultra GPUs, 36 Grace CPUs | 72 Blackwell GPUs, 36 Grace CPUs | 2 Blackwell GPUs, 1 Grace CPU | 8x Blackwell Ultra SXM | 8x Blackwell SXM | 4x H200 SXM | 8x H200 SXM | 4x H100 SXM | 8x H100 SXM |
| FP4 Tensor Core | | 1400 PFLOPS | 1440 PFLOPS | 40 PFLOPS | 144 PFLOPS | 144 PFLOPS | | | | |
| FP8/FP6 Tensor Core | | 720 PFLOPS | 720 PFLOPS | 20 PFLOPS | 72 PFLOPS | 72 PFLOPS | 16 PFLOPS | 32 PFLOPS | 16 PFLOPS | 32 PFLOPS |
| INT8 Tensor Core | | 23 PFLOPS | 720 PFLOPS | 20 PFLOPS | 72 PFLOPS | 72 PFLOPS | 16 PFLOPS | 32 PFLOPS | 16 PFLOPS | 32 PFLOPS |
| FP16/BF16 Tensor Core | | 360 PFLOPS | 360 PFLOPS | 10 PFLOPS | 36 PFLOPS | 36 PFLOPS | 8 PFLOPS | 16 PFLOPS | 8 PFLOPS | 16 PFLOPS |
| TF32 Tensor Core | | 180 PFLOPS | 180 PFLOPS | 5 PFLOPS | 18 PFLOPS | 18 PFLOPS | 4 PFLOPS | 8 PFLOPS | 4 PFLOPS | 8 PFLOPS |
| FP32 | | 6 PFLOPS | 5760 TFLOPS | 160 TFLOPS | 600 TFLOPS | 600 TFLOPS | 270 TFLOPS | 540 TFLOPS | 270 TFLOPS | 540 TFLOPS |
| FP64 | | 100 TFLOPS | 2880 TFLOPS | 80 TFLOPS | 10 TFLOPS | 296 TFLOPS | 140 TFLOPS | 270 TFLOPS | 140 TFLOPS | 270 TFLOPS |
| FP64 Tensor Core | | 100 TFLOPS | 2880 TFLOPS | 80 TFLOPS | 10 TFLOPS | 296 TFLOPS | 270 TFLOPS | 540 TFLOPS | 270 TFLOPS | 540 TFLOPS |
| GPU Memory | 49.2 TB | Up to 21 TB | Up to 13.4 TB HBM3e | Up to 372 GB HBM3e | Up to 2.3 TB | 1.4 TB | 564 GB HBM3e | 1.1 TB HBM3e | 320 GB HBM3 | 640 GB HBM3 |
| GPU Memory Bandwidth (aggregate) | 1229 TB/s | Up to 576 TB/s | Up to 576 TB/s | 16 TB/s | | Up to 62 TB/s | 19 TB/s | 38 TB/s | 13 TB/s | 27 TB/s |
| Fast Memory | | Up to 40 TB | Up to 30 TB | Up to 1.4 TB | | | | | | |
| Switch Bus | Unified Bus | NVLink 5 Switch | NVLink 5 Switch | NVLink 5 Switch | NVLink 5 Switch | NVLink 5 Switch | N/A | NVLink 4 Switch | N/A | NVLink 4 Switch |
| GPU-to-GPU Bandwidth | >392 GB/s | 1.8 TB/s | 1.8 TB/s | 1.8 TB/s | 1.8 TB/s | 1.8 TB/s | N/A | 900 GB/s | N/A | 900 GB/s |
| Total Aggregate Bandwidth | >150 TB/s | 130 TB/s | 130 TB/s | 3.6 TB/s | 14.4 TB/s | 14.4 TB/s | 3.6 TB/s | 7.2 TB/s | 3.6 TB/s | 7.2 TB/s |
| Networking Bandwidth | 400 Gbps | | | | 1.6 TB/s | 0.8 TB/s | 0.4 TB/s | 0.8 TB/s | 0.4 TB/s | 0.8 TB/s |
| Attention Performance | | | | | 2X | 1X | | | | |
| CPU Core Count | | 2592 Arm Neoverse V2 cores | 2592 Arm Neoverse V2 cores | 72 Arm Neoverse V2 cores | | | | | | |
| CPU Memory | | Up to 18 TB SOCAMM with LPDDR5X | Up to 17 TB LPDDR5X | Up to 480 GB LPDDR5X | | | | | | |
| CPU Memory Bandwidth | | Up to 14.3 TB/s | Up to 18.4 TB/s | Up to 512 GB/s | | | | | | |
| Power Consumption | 559 kW | 135-150 kW | 132 kW | ~2.7 kW | | | | | | |
AI compute architecture specification comparison
First published 2025-08-19 17:32:49
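Because the rack-scale systems in the table bundle very different accelerator counts (384 NPUs vs. 72 GPUs), normalizing a few rows to per-accelerator values makes the comparison more concrete. Below is a minimal sketch using only numbers copied from the table (taking the upper end of the GB300 power range); the derived per-chip figures are simple divisions for illustration, not vendor-published per-chip specs.

```python
# Normalize rack-scale table figures to per-accelerator values.
# Inputs are copied from the table above: accelerator count (Form Factor row),
# HBM capacity (GPU Memory row), aggregate HBM bandwidth, and power.

systems = {
    # name: (accelerators, hbm_capacity_tb, hbm_bw_tb_s, power_kw)
    "CloudMatrix 384": (384, 49.2, 1229.0, 559.0),
    "GB300 NVL72":     (72, 21.0, 576.0, 150.0),  # 150 kW = top of 135-150 kW range
    "GB200 NVL72":     (72, 13.4, 576.0, 132.0),
}

for name, (n, cap_tb, bw_tb_s, power_kw) in systems.items():
    per_chip_cap_gb = cap_tb * 1000 / n     # HBM capacity per accelerator
    per_chip_bw_tb_s = bw_tb_s / n          # HBM bandwidth per accelerator
    bw_per_kw = bw_tb_s / power_kw          # system bandwidth per kW of power
    print(f"{name}: {per_chip_cap_gb:.0f} GB/chip, "
          f"{per_chip_bw_tb_s:.1f} TB/s/chip, {bw_per_kw:.2f} TB/s per kW")
```

The per-chip view inverts part of the headline comparison: CloudMatrix 384 reaches its large aggregate totals by scaling out to many more accelerators, while each individual chip has less memory and bandwidth than a Blackwell-class GPU.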