Your best local LLM for low-VRAM (6GB)?

sp3ctre@feddit.org · 15 days ago

Rhaedas@fedia.io · 14 days ago

There have been a few videos on YouTube lately about running a particular Qwen mixture-of-experts model in a way that keeps only some of the expert sections on the GPU at a time and the rest in system RAM. This one was the first I watched (https://www.youtube.com/watch?v=8F_5pdcD3HY). I haven’t tried it, but I can see why it would work.
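Here's a rough back-of-the-envelope sketch in Python of why that split can help on a 6GB card. It assumes a mixture-of-experts model where most of the weights live in the expert layers. Those stay in system RAM, and only the remaining shared weights go on the GPU. The class and function names and every number (30B parameters, 90% of weights in experts, ~0.5 bytes per parameter for a 4-bit quant, 1 GB of headroom) are hypothetical placeholders. They are not figures from the video or from any specific Qwen release.

```python
# Hypothetical sketch: estimate how an MoE model's weights might split between
# VRAM and system RAM if the expert weights are kept in RAM.
# All numbers are illustrative assumptions, not specs of any real Qwen model.
from dataclasses import dataclass


@dataclass
class MoEModel:
    total_params: float      # total parameter count
    expert_fraction: float   # assumed share of parameters in expert layers
    bytes_per_param: float   # e.g. ~0.5 for 4-bit quantization (ignores overhead)


def split_memory_gb(model: MoEModel) -> tuple[float, float]:
    """Return (gpu_gb, ram_gb): shared weights on the GPU, expert weights in RAM."""
    total_bytes = model.total_params * model.bytes_per_param
    ram_bytes = total_bytes * model.expert_fraction
    gpu_bytes = total_bytes - ram_bytes
    return gpu_bytes / 1e9, ram_bytes / 1e9


def fits_in_vram(model: MoEModel, vram_gb: float, reserve_gb: float = 1.0) -> bool:
    """Check whether the GPU share fits, leaving assumed headroom for KV cache etc."""
    gpu_gb, _ = split_memory_gb(model)
    return gpu_gb + reserve_gb <= vram_gb


if __name__ == "__main__":
    # Hypothetical 30B-parameter MoE at ~4-bit with 90% of its weights in experts.
    m = MoEModel(total_params=30e9, expert_fraction=0.9, bytes_per_param=0.5)
    gpu, ram = split_memory_gb(m)
    print(f"GPU: {gpu:.1f} GB, RAM: {ram:.1f} GB, fits in 6 GB: {fits_in_vram(m, 6.0)}")
```

With those made-up numbers, only about 1.5 GB would need to sit in VRAM. In practice you'd still need room for the KV cache and activations, and generation speed would depend on how fast your RAM is, so treat this as a ballpark only.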