About This Investigation

Who & Why

I’m Brandon Geraci, and I was tired of seeing YouTube tech reviewers blame NVIDIA’s DGX Spark hardware for being “slow” or “disappointing” without providing any technical analysis.

So I decided to actually investigate what was going on.

What We Found

After 60 benchmark runs across three LLMs, we found:

Docker containers use 20-30 GB more memory than native execution on Grace Blackwell
KV cache is reduced by 40-63% in containers
Throughput is identical in both environments: containers carry no speed penalty
Root cause: Docker’s cgroups double-count unified memory

This isn’t a hardware problem. It’s a software stack mismatch.
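For anyone who wants to compute the same two figures from their own runs, here is a minimal sketch. The function names are hypothetical and not taken from the benchmark-spark repository, and the numbers in the usage lines are placeholders, not measurements from this project.

```python
def memory_overhead_gb(native_used_gb: float, container_used_gb: float) -> float:
    """Extra memory used by the container run compared with the native run, in GB."""
    return container_used_gb - native_used_gb


def kv_cache_reduction_pct(native_kv_gb: float, container_kv_gb: float) -> float:
    """Percentage by which the KV cache shrinks in the container run."""
    if native_kv_gb <= 0:
        raise ValueError("native_kv_gb must be positive")
    return (native_kv_gb - container_kv_gb) / native_kv_gb * 100.0


# Placeholder inputs for illustration only (NOT measured data).
overhead = memory_overhead_gb(native_used_gb=60.0, container_used_gb=85.0)
reduction = kv_cache_reduction_pct(native_kv_gb=80.0, container_kv_gb=40.0)
print(f"overhead: {overhead:.1f} GB, KV cache reduction: {reduction:.1f}%")
```

Feeding in paired native and container readings for the same model and settings keeps the comparison like-for-like.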

The Project

GitHub Repository: benchmark-spark

All code, data, and analysis are open source. We ran:

60 total benchmark runs
3 models: DeepSeek-7B, Qwen-72B, GPT-OSS-120B
2 environments: Native (chroot) vs Container (Docker)
10 iterations per configuration
~14 hours of comprehensive testing
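The run count follows directly from the configuration: 3 models × 2 environments × 10 iterations = 60. As a rough sketch, the matrix can be enumerated like this. The environment labels and the `build_run_matrix` helper are assumptions made for illustration, not the repository's actual scripts.

```python
from itertools import product

MODELS = ["DeepSeek-7B", "Qwen-72B", "GPT-OSS-120B"]
ENVIRONMENTS = ["native-chroot", "container-docker"]  # hypothetical labels
ITERATIONS = 10


def build_run_matrix(models=MODELS, environments=ENVIRONMENTS, iterations=ITERATIONS):
    """Return one dict per benchmark run: every model in every environment, repeated."""
    return [
        {"model": model, "environment": env, "iteration": i}
        for model, env in product(models, environments)
        for i in range(1, iterations + 1)
    ]


runs = build_run_matrix()
print(len(runs))  # 3 * 2 * 10 = 60
```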

Phase 2: What’s Next

We’re planning a follow-up investigation:

Factory reset the DGX Spark
Create matched (1:1) bare-metal and container configurations
Deep dive into KV cache scaling
Test on discrete GPU systems for comparison

Get Involved

If you’re running Grace Blackwell or other unified memory systems:

Share your findings
Open issues on GitHub
Contribute data or analysis

Let’s understand this together - with engineering, not speculation.

Contact: GitHub
Project: benchmark-spark
Interactive Results: Results Dashboard
