A six-part series investigating why Docker containers use 20-30 GB more memory than native execution on Grace Blackwell’s unified memory architecture.

Overview

When YouTube reviewers complained about DGX Spark performance, they blamed the hardware. I dug deeper and found the real culprit: Docker’s cgroups double-counting unified memory.
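To make the accounting gap concrete, here is a minimal Python sketch that captures the two numbers worth comparing: system-wide memory in use from `/proc/meminfo`, and the usage counter the container's cgroup reports. It is not the tooling used in the series; the helper names (`parse_meminfo`, `used_bytes`, `read_cgroup_usage`, `report`) are hypothetical, and the cgroup paths assume the standard cgroup v2 or v1 mount points.

```python
from pathlib import Path

# Standard locations of the memory usage counter (an assumption about the mount layout).
CGROUP_USAGE_FILES = (
    Path("/sys/fs/cgroup/memory.current"),                # cgroup v2
    Path("/sys/fs/cgroup/memory/memory.usage_in_bytes"),  # cgroup v1
)


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: bytes} dict."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if not parts:
            continue  # skip blank or malformed lines
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024  # the kernel reports these fields in KiB
        fields[name.strip()] = value
    return fields


def used_bytes(meminfo):
    """System-wide memory in use, estimated as MemTotal - MemAvailable."""
    return meminfo["MemTotal"] - meminfo["MemAvailable"]


def read_cgroup_usage():
    """Return the cgroup's usage counter in bytes, or None if no file is found."""
    for path in CGROUP_USAGE_FILES:
        if path.exists():
            return int(path.read_text().strip())
    return None


def report():
    """Build a one-line summary to record while a model is loaded."""
    meminfo_path = Path("/proc/meminfo")
    if not meminfo_path.exists():
        return "no /proc/meminfo found (not a Linux system?)"
    used = used_bytes(parse_meminfo(meminfo_path.read_text()))
    cgroup = read_cgroup_usage()
    cgroup_text = "n/a" if cgroup is None else f"{cgroup / 2**30:.1f} GiB"
    return f"system used: {used / 2**30:.1f} GiB, cgroup usage: {cgroup_text}"


if __name__ == "__main__":
    print(report())
```

Running the script once inside the container and once natively, with the same model loaded, gives a rough side-by-side view of the two counters. It only records them; it does not by itself show why they differ.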

Key Findings

The Series

  1. The Mystery - YouTubers blamed NVIDIA hardware without technical analysis
  2. MPI and Chroot Nightmare - Setting up proper test environments
  3. The Unified Memory Revelation - Why Docker’s cgroups double-count unified memory
  4. The Data: 60 Runs Don’t Lie - Comprehensive benchmark results across 3 models
  5. What I Learned (And What’s Next) - Conclusions and Phase 2 preview
  6. KV Cache Deep Dive - The 2x reduction mystery explained

Impact

This investigation demonstrates the importance of understanding the full stack - from hardware architecture to kernel subsystems to container runtimes. What looked like a hardware problem was actually a software architecture mismatch.