Another reason to self host your own AI

SuspiciousCarrot78@aussie.zone · edit-2 35 minutes ago

I have a RPI 4b and 3 lenovos (m93p, m710q, p330).

You can’t beat the RPI for power draw (~2W idle and ~7W under max load), but if you’re measuring utility per dollar you’d probably prefer the Lenovo M93P at $50 USD. Mine has an i7-4785T and 16GB DDR3 (2x8, IIRC?) with ethernet, USB etc. Bought 2023/4. I expect the base model is still that price now (mine’s upgraded). The only caveat is that it doesn’t have HDMI; it has DisplayPort out, but that’s solved with a $5 dongle or by just using SSH. The M73 would be a touch cheaper.

IIRC the TDP is 35W max and can be lowered / undervolted a touch (don’t update the BIOS - it blocks ThrottleStop).

I turned mine into a retro PC slash game server for the kids (luanti etc). But the siren call of doing truly impossible things with the RPI is too beguiling :)

E.g. running DietPi (headless) with all of my services (media stack, privacy, docs, search, images etc) takes about 300MB of RAM (or 650MB if I have to boot into Xfce).

300MB, 2-3W.

That shouldn’t be possible. I love it.

My next goal is to create an expert system / pseudo-LLM that sources answers from user-provided Markdown or PDF, ZIM files, and 4get search or Tavily.

The advantages here are that 1) it will be stupid fast, as there’s no neural network crap (outside of an optional Markov chain garnish); 2) it’s not stochastic (though an LLM can be an optional “plug-in module” - the Pi might actually run a 135M model at non-glacial speeds); 3) it still serves an OpenAI-compatible endpoint.
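To make the "not stochastic" part concrete, here's a rough sketch of the lookup core, assuming plain token-overlap scoring over paragraphs. The function names and sample documents are made up for illustration; this isn't the actual design, just the general shape of a deterministic retriever.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """docs maps a (hypothetical) file name to its text.
    Each blank-line-separated paragraph becomes one index entry."""
    index = []
    for name, text in docs.items():
        for para in text.split("\n\n"):
            para = para.strip()
            if para:
                index.append((name, para, Counter(tokenize(para))))
    return index

def answer(query, index):
    """Return (source, paragraph) with the highest token overlap, or None.
    Ties go to the earliest entry, so the same query always gives the same answer."""
    query_tokens = set(tokenize(query))
    best, best_score = None, 0
    for name, para, counts in index:
        score = sum(counts[t] for t in query_tokens)
        if score > best_score:
            best, best_score = (name, para), score
    return best

docs = {
    "pi.md": "The Pi idles at about 2 W.\n\nDietPi runs headless.",
    "gpu.md": "The P4 is a data centre card.",
}
index = build_index(docs)
print(answer("what card is the P4", index))
```

A real version would want stopword handling and better ranking, but nothing in the loop is random, which is the point.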

SuspiciousCarrot78@aussie.zone · edit-2 1 hour ago

Is that the right site or am I not seeing it? Your link points to this -

https://idlewatt.foundagent.net/ Lookup Categories Compare Vendors AI Data Watch Methodology Will this vendor sign a HIPAA BAA? A cited, date-stamped answer for 105 major SaaS tools — can you sign a Business Associate Agreement and store PHI? Built for digital-health teams during vendor procurement.

SuspiciousCarrot78@aussie.zone · 2 days ago

Respectfully, that’s not really how local LLMs work.

A GGUF model sitting on my hard drive has no ability to “send content back home” any more than a PDF or a JPEG does. If you’re running something like llama.cpp or Ollama entirely locally, the model weights are just data files.
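If you want to see the "just data" part for yourself, here's a small sketch that reads the fixed-size header at the start of a GGUF file. It assumes the GGUF v2+ layout (4-byte magic, uint32 version, uint64 tensor count, uint64 metadata key count, all little-endian); the file name in the usage comment is a placeholder.

```python
import struct

# Assumed GGUF v2+ header: magic, version, tensor count, metadata key/value count.
GGUF_HEADER = struct.Struct("<4sIQQ")

def parse_gguf_header(raw):
    """Decode the first 24 bytes of a GGUF file into a small dict."""
    magic, version, tensors, kv_count = GGUF_HEADER.unpack(raw[:GGUF_HEADER.size])
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": tensors, "metadata_keys": kv_count}

def read_gguf_header(path):
    """Open the file read-only and parse its header. No network involved."""
    with open(path, "rb") as f:
        return parse_gguf_header(f.read(GGUF_HEADER.size))

# Usage (hypothetical path):
# print(read_gguf_header("model.gguf"))
```

Everything after the header is more metadata and tensor data. Whatever talks to the network is the program that loads the file, not the file.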

The real privacy concerns are cloud APIs, telemetry in front-ends, browser extensions, analytics, update services, or accidentally exposing a service to the public internet.

“Self-hosted AI” isn’t one thing. There’s a huge difference between:

Running ChatGPT through an API
Running a commercial AI appliance
Running a local Qwen/Mistral/Llama model on your own hardware

Firewalling internet-facing services is good advice. Assuming every local model is secretly uploading prompts is not.

SuspiciousCarrot78@aussie.zone · edit-2 3 days ago

Hmm…it runs on a 1060…it’s a MoE not a dense. 24B is even lighter. Worth a shot.

https://www.youtube.com/watch?v=8F_5pdcD3HY

Else, if you’re looking for a coding model (??), something like Sara or Fara might suit

https://huggingface.co/microsoft/Fara-7B

SuspiciousCarrot78@aussie.zone · edit-2 3 days ago

I mean…that entirely depends on your use case - and I hate saying that. For me and what I do, the Qwen SLMs (esp. Qwen3-4B 2507 instruct and Qwen3.5-2B) are exceptional. But I’m not trying to do Claude at home.

Best bet? Spend $10 on OpenRouter and try different models. In a head to head with ChatGPT 5.4 mini (excellent for coding BTW), I’ve found Qwen 3.5 27B more than able to hold its own for coding tasks…IF you narrowly gate it/confine it. The last batch of Qwens really are something. Dunno about the 3.7 series.

Having said ALL that, I’m really tempted to go back in time and code myself a deterministic expert system, with a user-updatable knowledge cascade, tool calling and a minimal amount of Markov chain word garnish for flavour. I think we used to just call that “a program” lol.

Really tempted actually, because if 50% of llm use case is basically Super Google but not shit…well, I can make that myself. I just need to point my autism at it.

PS: this might help

https://www.youtube.com/watch?v=0AqpaFm11oI

SuspiciousCarrot78@aussie.zone · edit-2 4 days ago

Numbers about 3-4x. The P100 is near 800 GB/s. The 1080 is what… 192GB/s? Hell, even if it were double that, HBM2 simply has larger bandwidth. The 1080 was a gaming card; the P100 is a server / number cruncher.

SuspiciousCarrot78@aussie.zone · edit-2 4 days ago

Just for sake of completion

https://piwigo.org/

Pros

Mature project (around since the early 2000s)

Lightweight compared to Immich

Designed as a photo library first, not an AI platform

Albums, tags, metadata, permissions

Huge plugin ecosystem

Runs happily on modest hardware

Can manage very large collections

Doesn’t demand phone-app-centric workflows (though of course it has a phone to computer app / sync)

Cons

Feels more like a traditional photo archive than Google Photos

Mobile experience is functional rather than slick

No fancy AI search or face recognition by default (though can add easy enough)

UI is a bit “classic web”

SuspiciousCarrot78@aussie.zone · edit-2 4 days ago

Huh - cheaper than the P40s (though less VRAM) but larger bandwidth due to HBM2. Good looking out

SuspiciousCarrot78@aussie.zone · edit-2 5 days ago

Good tips - thanks!

PS: sad to report the 24GB Tesla P40s are now around $250 USD on eBay, so not quite as cheap as I remembered. P4s are still cheap, though frankly if you’re going that end of town, a 1080 is about on par, less fussy and probably cheaper - it just won’t fit in a uSFF.

SuspiciousCarrot78@aussie.zone · 5 days ago

You probably could. A Tesla P4 or P40 (old data centre cards) is more than up to the job. My Lenovo Tiny hosts a P4 (card cost $100 on eBay; the Lenovo itself was $200ish) and runs Qwen3.5-35B-A3B at about 20 tok/s. Smaller models are even faster.

https://www.youtube.com/watch?v=8F_5pdcD3HY

If you’re not bound by the one liter shoebox design, then the P40 is still a great and inexpensive card.

I think I mentioned elsewhere but right now I’m trying to figure out if I can use a magic packet from the Raspberry Pi to wake up the Lenovo as needed rather than leaving it on all the time.
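For what it's worth, the magic packet side is simple enough to sketch: 6 bytes of 0xFF followed by the target MAC repeated 16 times, sent as a UDP broadcast. The MAC address below is a placeholder, and port 9 and the broadcast address are the usual conventions rather than anything specific to this setup. Wake-on-LAN also has to be enabled in the Lenovo's BIOS and on its NIC for this to do anything.

```python
import socket

def build_magic_packet(mac):
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("expected a 6-byte MAC address")
    return b"\xff" * 6 + bytes.fromhex(hex_digits) * 16

def send_magic_packet(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the packet over UDP on the local network."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))

# Usage from the Pi (placeholder MAC, swap in the Lenovo's real one):
# send_magic_packet("00:11:22:33:44:55")
```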

SuspiciousCarrot78@aussie.zone · edit-2 5 days ago

Agree. I know the Pis are out of favour these days…but they are a cool little machine. I got mine running DietPi and a bunch o crap (the usuals - JF, arr stack, Pi-hole, Syncthing, yadda yadda) and running headless the footprint (power and memory wise) is tiny.

I joked about the 4xAA batteries thing but iirc, there is actually a Pi-HAT that creates a micro UPS that’ll run the pi for maybe three to five hours just on double A batteries.

Edit: yep

https://pimodules.com/product/ups-pico-hv4-0-advanced

or more sensibly

https://littlebirdelectronics.com.au/collections/raspberry-pi-power-hats/products/raspberry-pi-ups-hat

SuspiciousCarrot78@aussie.zone · edit-2 5 days ago

Agree. And re small models - very agree. In fact I made an ablated version of Qwen 3.5-2B for use with my Pi, before thinking a bit harder and realising I can probably code something bespoke that doesn’t need a stochastic parrot as a squawk box at all.

https://huggingface.co/BobbyLLM/polaris-heretic-Q4_K_M-GGUF

Still, as an SLM, it’s perfectly cromulent and does well with tool calling etc, which is what I wanted it for.

SuspiciousCarrot78@aussie.zone · 5 days ago

There’s an argument to be had regarding a MoE versus a small dense model. I guess it depends on what exactly you need doing with it. I would be tempted to run a smaller dense model (like a Qwen 3-14B or a Qwen 3.5 9B) as at a reasonable quant, it might fit mostly or entirely on the GPU, thereby giving you excellent speeds.
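A back-of-envelope way to check the "fits on the GPU" part: weight size is roughly parameters times bits per weight divided by 8, plus some headroom for KV cache and buffers. The 4.5 bits per weight, the 1.5 GiB overhead and the 8 GiB card below are all assumptions for illustration, not measurements.

```python
def weights_gib(params_billion, bits_per_weight):
    """Approximate size of the quantised weights in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

def fits(params_billion, bits_per_weight, vram_gib, overhead_gib=1.5):
    """True if weights plus an assumed overhead fit in the given VRAM."""
    return weights_gib(params_billion, bits_per_weight) + overhead_gib <= vram_gib

# Assumed ~4.5 bits/weight for a 4-bit-ish quant, on a hypothetical 8 GiB card.
for size in (9, 14):
    print(size, "B:", round(weights_gib(size, 4.5), 1), "GiB, fits:", fits(size, 4.5, 8))
```

With those assumptions the 9B sits comfortably on 8 GiB while the 14B would spill a bit to system RAM, which is the "mostly or entirely" distinction.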

PS: I’m actually in the process of designing an expert system (not an LLM) for pretty much the task you described. The intention is that you would still interact with it like a large language model, but the actual brains underneath it would be something more traditional.

SuspiciousCarrot78@aussie.zone · 5 days ago

Which trackers if you don’t mind saying? DM me if easier.

SuspiciousCarrot78@aussie.zone · edit-2 5 days ago

Another reason to self host your own AI

SuspiciousCarrot78@aussie.zone · edit-2 5 days ago

Yep. But that would be 100% CPU, 100% of the time? In real life, it’s probably closer to 2W idle and maybe 5-7W under typical load.

More interesting…I think that technically means you could make a “UPS” for it using what…4xAA batteries?

Oh man…that would be cool. Stupid but cool.

SuspiciousCarrot78@aussie.zone · 5 days ago

They were, I think. Or we were just younger.

SuspiciousCarrot78@aussie.zone · 5 days ago

Yeah, same. Though at 3-5W … it really is just a very rough guess. Lemme ShitGPT it. Oh, I was way off

A realistic Pi 4B-only estimate is about A$8–A$12 per year in electricity, assuming it is on 24/7 and used for Jellyfin streaming around 10–12 hours per week.

Pi 4B measurements are typically around 2.7–2.85 W at idle, about 5.1 W under moderate server load, and around 6.4 W under full CPU stress. Using Perth/WA’s Synergy Home Plan A1 energy charge of 32.3719 c/kWh, excluding the daily supply charge, that works out very cheaply because the device uses only about 25–36 kWh/year.

| Scenario | Assumed usage | Annual energy | Approx. annual cost |
| --- | --- | --- | --- |
| Mostly idle | 3 W 24/7 | 26.3 kWh | A$8.51/year |
| Idle + 12h/wk Jellyfin | 2.7 W idle, 5.1 W streaming | 25.1 kWh | A$8.14/year |
| Heavier Jellyfin/server use | 2.7 W idle, 6.4 W streaming | 26.0 kWh | A$8.40/year |
| Conservative wall-power estimate | 4 W idle, 6.4 W streaming | 36.5 kWh | A$11.83/year |

The bigger swing factor is storage, not the Pi. A USB SSD adds very little; a USB-powered 2.5" hard drive might add a few dollars per year; a powered 3.5" external drive left spinning 24/7 could push the total more into the A$15–A$30/year range.

So, for the Raspberry Pi 4B itself as a Jellyfin box: roughly A$10/year is a good mental estimate.
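The table figures above can be sanity-checked with a few lines of arithmetic. This sketch uses the quoted 32.3719 c/kWh energy charge and assumes 12 hours of streaming per week for every non-idle row, which is what reproduces the numbers; the function names are just for illustration.

```python
HOURS_PER_YEAR = 24 * 365
TARIFF_AUD_PER_KWH = 0.323719  # energy charge quoted above, excluding supply charge

def annual_kwh(idle_w, load_w=0.0, load_hours_per_week=0.0):
    """Yearly energy use for a box that idles except for some weekly load hours."""
    load_hours = load_hours_per_week * 52
    idle_hours = HOURS_PER_YEAR - load_hours
    return (idle_w * idle_hours + load_w * load_hours) / 1000

def annual_cost(kwh):
    """Yearly cost in AUD at the tariff above."""
    return kwh * TARIFF_AUD_PER_KWH

scenarios = {
    "Mostly idle": annual_kwh(3.0),
    "Idle + 12h/wk Jellyfin": annual_kwh(2.7, 5.1, 12),
    "Conservative wall-power": annual_kwh(4.0, 6.4, 12),
}
for name, kwh in scenarios.items():
    print(f"{name}: {kwh:.1f} kWh, A${annual_cost(kwh):.2f}/year")
```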

SuspiciousCarrot78@aussie.zone · 5 days ago

I remember it being a touch more …analog…back in the day. ATDT commands and all.

But yeah, Win 3.11 + Trumpet Winsock and Free Agent were the shit. rec.martial-arts was home back then (along with mIRC).

Lemmy reminds me a bit of the old Usenet fora.

SuspiciousCarrot78@aussie.zone · 5 days ago

Torrent cache? As in seedbox?

SuspiciousCarrot78@aussie.zone · edit-2 5 days ago

Used to last me 2-3 months… but my media library is more or less complete now, with little churn. Also, I don’t ever go above 1080p.

I need to check if Radarr / Sonarr works with straight torrents (it must do; I haven’t used them for ages / have been using 1337 manually, but I seem to recall torrents being a source).

SuspiciousCarrot78@aussie.zone · edit-2 15 days ago

Claude? No. Cucumbers? Yes!

SuspiciousCarrot78@aussie.zone · edit-2 18 days ago

"The cost of running LLMs is just too damn high"

SuspiciousCarrot78@aussie.zone · edit-2 19 days ago

Another reason to self host your own AI

Token Speed visualiser
