Who & Why

I’m Brandon Geraci, and I was tired of seeing YouTube tech reviewers blame NVIDIA’s DGX Spark hardware for being “slow” or “disappointing” without providing any technical analysis.

So I decided to actually investigate what was going on.

What We Found

After 60 comprehensive benchmarks across 3 different LLM models, we discovered:

  • Docker containers use 20-30 GB more memory than native execution on Grace Blackwell
  • KV cache is reduced by 40-63% in containers
  • Performance is identical - same throughput, no speed penalty
  • Root cause: Docker’s cgroups double-count unified memory

This isn’t a hardware problem. It’s a software stack mismatch.

The Project

GitHub Repository: benchmark-spark

All code, data, and analysis are open source. We ran:

  • 60 total benchmark runs
  • 3 models: DeepSeek-7B, Qwen-72B, GPT-OSS-120B
  • 2 environments: Native (chroot) vs Container (Docker)
  • 10 iterations per configuration
  • ~14 hours of comprehensive testing

Phase 2: What’s Next

We’re planning a follow-up investigation:

  • Factory reset the DGX Spark
  • Create 1:1 bare metal vs container configs
  • Deep dive into KV cache scaling
  • Test on discrete GPU systems for comparison

Get Involved

If you’re running Grace Blackwell or other unified memory systems:

  • Share your findings
  • Open issues on GitHub
  • Contribute data or analysis

Let’s understand this together - with engineering, not speculation.


Contact: GitHub
Project: benchmark-spark
Interactive Results: Results Dashboard