Topic

Private, Local & Self-hosted AI

Local models, private deployment patterns, self-hosted inference, and hybrid architectures.

10 stories (4 articles · 6 videos)

Start here

A few good first pieces before you browse the full feed.

More in this topic

37 minutes
Video

VMware Private AI Foundation Capabilities and Features Update from Broadcom

Tech Field Day. Shows private AI as layered infrastructure: controlled compute, isolated environments, Kubernetes, inference containers, model governance, self-service provisioning, GPU sharing and monitoring. That maps directly to the article's warning that privacy depends on deployment boundaries, logs, access and operations, not on the word "local."
Advanced
13 min read
Article

Fine-tuning in 2026: when LoRA beats RAG, and how to do it without a cluster

LoRA fine-tuning has become accessible — you can run real fine-tunes on a laptop or rent a GPU for an hour. The patterns that work, the cases where fine-tuning beats RAG, and a practical end-to-end workflow from data prep to deployment.

Evaluate the implementation pattern, failure modes, and guardrails before building.

Advanced
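
A rough sketch of why LoRA fine-tunes fit on small hardware: instead of updating a full weight matrix, LoRA trains two low-rank factors. The layer size (4096 x 4096) and rank (8) below are illustrative assumptions, not figures from the article.

```python
def full_matrix_params(d_in: int, d_out: int) -> int:
    """Parameters updated when fine-tuning one dense weight matrix directly."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two LoRA factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical layer shape and rank, chosen only for illustration.
D_IN, D_OUT, RANK = 4096, 4096, 8

full = full_matrix_params(D_IN, D_OUT)
lora = lora_params(D_IN, D_OUT, RANK)
fraction = lora / full  # share of the layer's parameters that LoRA trains

print(f"full: {full:,}  lora: {lora:,}  fraction: {fraction:.4%}")
```

Under these assumptions the adapter trains well under 1% of the layer's parameters, which is the intuition behind running fine-tunes on a laptop or a single rented GPU.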
32 minutes
Video

Fast LLM Serving with vLLM and PagedAttention

Anyscale. Walks through why naive LLM serving wastes 60–80% of GPU memory, how PagedAttention borrows OS-style paging to fix that, and why continuous batching produces the 24× throughput numbers the article uses in its math. After this, the article's "you'll be lucky to hit 50% utilisation" line stops feeling abstract.
Advanced
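
The memory-waste claim in the blurb above can be made concrete with back-of-envelope arithmetic. This is a simplified sketch, not vLLM's actual allocator: the model shape (32 layers, 32 KV heads, head dimension 128, fp16), the 2048-token reservation, the 500-token request, and the 16-token block size are all assumptions chosen for illustration.

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int) -> int:
    """KV-cache bytes for one token: a key and a value vector per head per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def naive_waste(actual_tokens: int, reserved_tokens: int) -> float:
    """Fraction of a contiguous, max-length reservation that is never used."""
    return 1 - actual_tokens / reserved_tokens


def paged_waste(actual_tokens: int, block_size: int) -> float:
    """Fraction wasted when memory is handed out in fixed-size blocks on demand."""
    blocks = -(-actual_tokens // block_size)  # ceiling division
    slots = blocks * block_size
    return (slots - actual_tokens) / slots


# Hypothetical model shape and request, for illustration only.
per_token = kv_bytes_per_token(n_layers=32, n_kv_heads=32, head_dim=128,
                               bytes_per_elem=2)
print(f"KV cache per token: {per_token / 2**20:.2f} MiB")
print(f"naive waste: {naive_waste(500, 2048):.1%}")
print(f"paged waste: {paged_waste(500, 16):.1%}")
```

With these numbers the contiguous reservation leaves roughly three quarters of its memory idle, while block-wise allocation wastes only the unused tail of the last block.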
59 minutes
Video

Developing an LLM: Building, Training, Finetuning

Sebastian Raschka. A slower walkthrough of where fine-tuning sits in the broader LLM training pipeline: instruction tuning, classification fine-tuning, parameter-efficient methods, and the trade-offs the article calls out before recommending LoRA. Good calibration before you start, especially if your team is debating whether fine-tuning is even the right step.
Advanced
157 minutes
Video

Fine Tuning LLM Models – Generative AI Course

freeCodeCamp.org. Long, theory-then-code course covering quantisation, LoRA, QLoRA, and full PEFT on Llama 2 and Gemma, all on hardware most developers actually have. It is the closest thing to a "shadow somebody who has done this" experience on YouTube, and it backs up the article's "you don't need a cluster" claim with concrete VRAM budgets.
Advanced
6 minutes
Video

LM Studio Tutorial: Run Large Language Models (LLM) on Your Laptop

Kevin Stratvert. Same workflow as Ollama but in a GUI: download LM Studio, pull a Llama or Gemma model, chat, drop a PDF in and ask questions about it. Good for readers who'd rather not live in the terminal — also useful for getting a feel for how a 1B–3B model actually performs against a heavier one.
Intermediate
14 minutes
Video

Learn Ollama in 15 Minutes - Run LLM Models Locally for FREE

Tech With Tim. A tight, no-nonsense Ollama walkthrough — install, pull a model, chat, then poke at the local HTTP API from Python and create a custom model with a Modelfile. Covers exactly the workflow the article describes for daily use on a Mac, including how to think about model size vs. your machine's RAM.
Intermediate
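
A minimal sketch of what "poking at the local HTTP API from Python" can look like, using only the standard library. It assumes Ollama is running on its default port (11434) and that a model has already been pulled; the model name "llama3.2" is a placeholder, not something the video or article specifies.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(model: str, prompt: str) -> dict:
    """Request body for a single, non-streaming completion."""
    return {"model": model, "prompt": prompt, "stream": False}


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Wrap the payload in a POST request; nothing is sent until it is opened."""
    body = json.dumps(build_payload(model, prompt)).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, a call such as generate("llama3.2", "Explain LoRA in one sentence.") should return the model's reply as a string. Everything stays on the local machine, since the request goes to localhost.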