Hey guys,
What’s currently the best LLM for a low-VRAM machine with only 6 GB of VRAM? I’ve also got 32 GB of RAM.
I’m experimenting a little with SillyTavern and I’m curious which model gets the most out of my setup. Should be multilingual and suitable for “casual chatting”.
I know I will probably not get very far with this, but I’m still interested in how far we’ve already come.
(Using KoboldCPP if that matters).
~sp3ctre


On my MacBook Air M2, I’m currently running Qwen 3.5 4B with 8-bit quantisation. Even at its maximum context length, with multiple web-search RAG sources, and with the model being built for vision and reasoning, it only ever hits 4.3 GB of memory at most.
I run it through LM Studio, and since it’s a Mac, your mileage may vary in terms of how much memory it uses. In my experience, though, its output quality is a bit better than ChatGPT 4o, and it’s really solid for research purposes if that’s what you’re looking for.
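
For a rough sanity check on numbers like these, here is a minimal back-of-the-envelope sketch. It only counts the weights (parameters times bits per weight); the `overhead_gb` value is a made-up placeholder for context cache and runtime buffers, not something measured in LM Studio or KoboldCPP, so treat the result as a ballpark rather than a prediction.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_vram(params_billions: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 1.5) -> bool:
    """Rough check: weights plus an assumed overhead must fit in VRAM.

    overhead_gb is a hypothetical allowance for context cache and buffers;
    the real figure depends on context length and the runtime.
    """
    return weight_memory_gb(params_billions, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    # A 4B model at 8 bits per weight: about 4 GB for weights alone.
    print(weight_memory_gb(4, 8))
    # Same model checked against a 6 GB card with the assumed overhead.
    print(fits_in_vram(4, 8, vram_gb=6))
```

That lines up roughly with the ~4.3 GB figure above: about 4 GB for the weights, plus a little extra. Real quantised files usually come out slightly larger than this estimate because they store some per-block metadata alongside the weights.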