Peter Kukol asked a great question a few months back: how do you run big models like DeepSeek or Meta's Llama 4 on consumer hardware? Should you go big on CPU DRAM or not?
Long answer but…
Yes, that’s what most people are doing: buying a big CPU, lots of RAM, and a few big Nvidia GPUs (depending on how insane you are).
The main issue is that the full DeepSeek model has roughly 670B parameters, and all of those weights have to be resident to run, so it needs around 600GB of memory (plus space for context).
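To make the memory math concrete, here’s a quick back-of-envelope sketch (the numbers are illustrative; real footprints also include the KV cache, activations, and framework overhead):

```python
# Back-of-envelope memory math (illustrative, not official model specs).
def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate GB needed just to hold the weights."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

for label, params, bits in [
    ("~670B model @ 8-bit", 670, 8),
    ("~670B model @ 4-bit", 670, 4),
    (" 70B  model @ 4-bit", 70, 4),
    (" 30B  model @ 4-bit", 30, 4),
]:
    print(f"{label}: ~{weight_memory_gb(params, bits):.0f} GB")
```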
The latest approach is mixture of experts (MoE). At around 30B parameters or so, the configuration you’re talking about works way better.
For instance, one version of Qwen3 is a 30B-parameter model, but only 3B parameters are active at once. That makes it way more practical to run on a common GPU/CPU system.
And it looks like the right option above ~30B is MoE. (Basically, there’s a router network in front that decides which of N dense expert networks to run for each token.)
Llama 4 Scout, for instance, does this.
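If you want the gist of how MoE routing works, here’s a toy sketch (not any particular model’s actual architecture; the router, expert count, and dimensions are made up for illustration):

```python
import numpy as np

# Toy mixture-of-experts routing: a small "router" scores each expert per token,
# and only the top-k experts actually run, so the active parameter count per
# token is a fraction of the total.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

router_w = rng.normal(size=(d_model, n_experts))                          # router weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]  # toy "experts"

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_forward(token: np.ndarray) -> np.ndarray:
    scores = softmax(token @ router_w)               # probability per expert
    chosen = np.argsort(scores)[-top_k:]             # indices of the top-k experts
    weights = scores[chosen] / scores[chosen].sum()  # renormalized gate weights
    # Only the chosen experts are evaluated; the rest stay idle (and could even
    # live in slower memory, e.g. CPU RAM, in an offloading setup).
    return sum(w * (token @ experts[i]) for w, i in zip(weights, chosen))

print(moe_forward(rng.normal(size=d_model)).shape)  # (16,)
```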
The other approach is quantization. Studies show you lose about 1-3% accuracy at 4-bit quantization (it’s a huge topic, but in practice it’s a hybrid: some layers are 4-bit, others are 8-bit).
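Here’s a toy sketch of the basic idea, assuming a simple symmetric per-tensor scheme (real formats like GPTQ, AWQ, or GGUF k-quants use per-group scales and the mixed 4/8-bit layers mentioned above):

```python
import numpy as np

# Toy symmetric quantization: round weights to a small integer grid and keep
# one scale factor per tensor.
def quantize(w: np.ndarray, bits: int):
    qmax = 2 ** (bits - 1) - 1                              # e.g. 7 for signed 4-bit
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale                                          # int8 storage here for simplicity;
                                                             # real 4-bit packs two values per byte

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(256, 256)).astype(np.float32)
q4, s4 = quantize(w, bits=4)
err = np.abs(w - dequantize(q4, s4)).mean()
print(f"mean abs error at 4-bit: {err:.4f}")
```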
Finally, there is distillation, which can take a ~670B model down to a dense 70B model that does fit on a single big GPU (especially when quantized). Performance is also relatively good.
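And a toy sketch of the core idea behind distillation: train the small student to match the big teacher’s output distribution rather than just hard labels (real pipelines, like the DeepSeek-R1 distilled models, train on teacher-generated data at scale; the shapes and numbers here are made up):

```python
import numpy as np

# Toy knowledge-distillation loss: KL divergence between the teacher's and the
# student's softened token distributions.
def softmax(x, temperature=1.0):
    z = x / temperature
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return float(np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student))))

rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=32)                           # big model's logits
student_logits = teacher_logits + rng.normal(size=32)          # imperfect small model
print(f"KD loss: {distillation_loss(student_logits, teacher_logits):.3f}")
```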
Net net, you don’t need as much RAM if you’re mainly doing inference, and 24-80GB of VRAM is plenty for a lot of ordinary mortals.