Videos

Practitioner

Learn workflows for meetings, writing, research, no-code tools, and repeatable business tasks.

Turn AI from a chat box into a dependable work habit.

41 videos, 1248 min total

Builder

Go deeper into agents, RAG, MCP, structured outputs, evals, APIs, and local AI.

Evaluate and build AI systems without treating demos as production.

77 videos, 3073 min total

Strategic

Cover governance, EU AI Act readiness, build-vs-buy decisions, ROI, and private AI choices.

Make safer AI adoption decisions for a team or company.

14 videos, 454 min total

63 results

Viewing learning path: Builder

4 minutes

Introducing EmbeddingGemma: The Best-in-Class Open Model for On-Device Embeddings

Google for Developers. The video introduces multilingual text embeddings that can run locally and support semantic search and RAG without sending every document to a hosted API. For Estonian companies, that is a useful technical complement to the article's internal-knowledge-search pattern: multilingual retrieval is valuable only when it also respects data locality, permissions and source authority.

Understand why multilingual embeddings matter for private internal search and where local retrieval can reduce data-exposure risk.

Intermediate | AI for Business

18 minutes

AWS re:Invent 2025 - Implementing Human-in-the-Loop Controls for Multi-Agent AI Systems (CNS428)

AWS Events. This lightning talk names the business moments where human control is needed: high-stakes decisions, irreversible actions, regulatory requirements, trust-building phases, ambiguous edge cases and graceful degradation. It also shows concrete implementation mechanisms such as MCP elicitations, Step Functions callback waits and approval nodes.

See how approval gates can be implemented as explicit workflow checkpoints rather than informal manual review after something goes wrong.

Intermediate | Automations

7 minutes

Unlock Better RAG & AI Agents with Docling

IBM Technology. Explains the ingestion side of RAG and agents: preparing PDFs and other files so document structure, tables and layout survive into downstream retrieval. That supports the article's warning that RAG quality and safety begin before embedding, especially when parsing complex business documents.

Understand why document parsing, structure preservation and ingestion quality gates matter before building RAG over PDFs and mixed file formats.

Paragon. Walks through the production RAG permission problem and compares tool-calling, namespaces, ACL tables and relationship-based permissions. That directly supports the article's core rule: retrieval must only return sources the current user is allowed to see, and source-system permissions cannot be treated as an afterthought.

20 minutes

Permissions & Access Control for RAG - a Deep Dive Tutorial

Evaluate practical access-control patterns for company knowledge RAG before indexing sensitive internal documents.
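
The ACL-table pattern mentioned above can be illustrated with a minimal sketch. It assumes each indexed chunk carries the set of groups allowed to read it; the `Chunk` schema, the `filter_by_acl` helper and the sample data are hypothetical and are not taken from the video.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Chunk:
    """A retrievable piece of text plus the groups allowed to read it (hypothetical schema)."""
    text: str
    score: float  # similarity score from some upstream retriever
    allowed_groups: frozenset = field(default_factory=frozenset)


def filter_by_acl(candidates, user_groups, top_k=3):
    """Keep only chunks the user may see, then return the best-scoring top_k."""
    user_groups = set(user_groups)
    # A chunk is visible when it shares at least one group with the user.
    visible = [c for c in candidates if c.allowed_groups & user_groups]
    return sorted(visible, key=lambda c: c.score, reverse=True)[:top_k]


# Illustrative data only.
candidates = [
    Chunk("Q3 salary bands", 0.92, frozenset({"hr"})),
    Chunk("Office Wi-Fi setup", 0.81, frozenset({"hr", "staff"})),
    Chunk("Board minutes", 0.77, frozenset({"board"})),
]

results = filter_by_acl(candidates, user_groups={"staff"})
print([c.text for c in results])
```

Filtering after retrieval, as here, is the simplest form to read. In a real index the permission check would more likely be pushed into the query itself, so that restricted chunks never use up top-k slots.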

Arize AI. Explains why production agents fail when the system lacks the right context, evaluation data, tracing and domain expertise. It maps well to the article's failure-mode register because it makes reliability an engineering loop: separate retrieval from reasoning, define expected outcomes, evaluate tool calls, and trace failures before changing models.

48 minutes

How to Build Reliable AI Agents (Context + Evals Explained) | Tobias Leong, Axium

Design AI workflows around context, evals and observability so production failures can be named, measured and fixed.

Y Combinator. The Lightcone hosts work through why vertical AI agents — not horizontal wrappers — are the defensible shape for application-layer companies, with concrete examples and a clear-eyed take on which categories the model providers will eat. That is the anti-moat trap the article warns about, expressed as a positive playbook.

42 minutes

Vertical AI Agents Could Be 10X Bigger Than SaaS

Assess when vertical AI agents create real defensibility and when they are only thin wrappers.

Sequoia Capital. Bret Taylor walks through the shift from per-seat SaaS to outcomes-based pricing — what to anchor on (resolution, CSAT, NPS), why incumbents struggle to follow, and how vertical specialisation creates pricing power. It directly mirrors the article's pricing and margin sections.

34 minutes

How AI is Reinventing Software Business Models ft. Bret Taylor of Sierra

Evaluate AI product pricing and specialization around measurable outcomes rather than seat counts.

Anyscale. Walks through why naive LLM serving wastes 60–80% of GPU memory, how PagedAttention borrows OS-style paging to fix that, and why continuous batching produces the 24× throughput numbers the article uses in its math. After this, the article's "you'll be lucky to hit 50% utilisation" line stops feeling abstract.

32 minutes

Fast LLM Serving with vLLM and PagedAttention

Understand why serving engines, batching and KV-cache memory dominate self-hosted inference economics.

Advanced | Private / Local AI

56 minutes

Build Hour: Prompt Caching

OpenAI. OpenAI's own Build Hour on prompt caching — the 1024-token threshold, the prefix-stability requirement, audio caching at 99% discount for realtime, time-to-first-token impacts at long inputs. Useful when you are sizing the engineering effort to actually hit the cache reliably on your production prompts.

Use prompt caching only when stable prefixes, latency and cost behavior match the workload.

Prompt Engineering. Walks through Anthropic's prompt caching against Gemini's context caching with concrete latency-and-cost reductions per use case (long-document chat, few-shot, multi-turn). The breakdown of cache-write surcharge vs. cache-read discount is exactly what the article assumes when it talks about when caching pays off.

19 minutes

Is This the End of RAG? Anthropic's NEW Prompt Caching

Compare Anthropic's prompt caching with Gemini's context caching, and weigh the cache-write surcharge against the cache-read discount for long-document chat, few-shot and multi-turn workloads.
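
The surcharge-versus-discount trade-off can be made concrete with a rough arithmetic sketch. The multipliers below (`write_mult`, `read_mult`) are illustrative placeholders rather than quoted provider prices, and the model assumes every request after the first hits the cache, which real traffic will not always do.

```python
def prefix_cost(n_requests, prefix_tokens, write_mult=1.25, read_mult=0.10, cached=True):
    """Relative input cost of a shared prompt prefix, in base-price token units."""
    if n_requests <= 0:
        return 0.0
    if not cached:
        return float(n_requests * prefix_tokens)
    # The first request pays the write surcharge; later ones pay the read price.
    return prefix_tokens * (write_mult + (n_requests - 1) * read_mult)


def break_even_requests(write_mult=1.25, read_mult=0.10):
    """Smallest request count at which caching is strictly cheaper than not caching."""
    if read_mult >= 1:
        raise ValueError("caching never pays off when reads cost as much as uncached input")
    n = 1
    while prefix_cost(n, 1, write_mult, read_mult) >= prefix_cost(n, 1, cached=False):
        n += 1
    return n


for n in (1, 2, 10):
    with_cache = prefix_cost(n, 10_000)
    without_cache = prefix_cost(n, 10_000, cached=False)
    print(n, round(with_cache), round(without_cache))

print("break-even at", break_even_requests(), "requests")
```

Under these assumed multipliers a single request is more expensive with caching than without, and the saving only appears once the same prefix is reused. That is why prefix stability matters more than the headline discount.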

LiveOverflow. Walks through the actual defence-in-depth playbook — taint analysis on LLM output, restricting expected output shapes, user isolation, few-shot scaffolds, fine-tuning, temperature 0 for determinism, redundancy for critical paths. It matches the article's defence-stack section almost item for item.

17 minutes

Defending LLM - Prompt Injection

Review prompt-injection defenses such as taint analysis, output-shape restrictions, user isolation, deterministic settings and redundant checks for critical paths.
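
One of the listed defences, restricting expected output shapes, can be sketched as a strict parser that rejects anything other than a single known label. The label set and the `parse_moderation_output` helper are hypothetical and chosen for illustration; the video may implement this differently.

```python
import json

ALLOWED_LABELS = {"allow", "flag", "block"}  # hypothetical label set


def parse_moderation_output(raw: str) -> str:
    """Accept the model's reply only if it is exactly {"label": <allowed value>}."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc
    # Reject extra keys, missing keys and non-object replies.
    if not isinstance(data, dict) or set(data) != {"label"}:
        raise ValueError("unexpected output shape")
    label = data["label"]
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        raise ValueError("label outside the allowed set")
    return label


print(parse_moderation_output('{"label": "flag"}'))
```

A check like this does not stop the injection itself. It narrows what a successful injection can make the application do, which is why it is normally combined with the other layers in the list.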

LiveOverflow. Frames prompt injection as a classic injection attack against systems that mix instructions and untrusted data — with a concrete content-moderation example where an attacker frames an innocent user. The mental shift from "the model is the target" to "the application is the target" is exactly the move the article opens with.

13 minutes

Attacking LLM - Prompt Injection

Model prompt injection as untrusted-data mixing and design boundaries around tool use.

Fireship. The clearest short explanation on YouTube of the screenshot–action–screenshot loop, including the honest failure modes (Claude wandering off to look at Yellowstone, token burn, latency per step). Fireship is light on production detail by design — read the article for that — but it leaves you with the right intuition for why these systems are expensive and brittle before you commit one to your stack.

5 minutes

Claude has taken control of my computer...

Understand why screenshot-based computer use is powerful, slow, expensive and brittle compared with API-native automation.

Adam Lucek. A longer implementation pass through the cognitive-science-inspired categories — episodic, semantic, working, procedural — wired into an agent in code. Worth watching after the LangChain conceptual video if you want a more opinionated mental model and a working example to crib from.

44 minutes

Building Brain-Like Memory for AI | LLM Agent Memory Systems

Map episodic, semantic, working and procedural memory onto an agent implementation, using the working code example as a reference.

LangChain. Short, no-code walkthrough of the short-term-vs-long-term split, the three shapes long-term memory tends to take (instructions, profile, list of objects), and the hot-path-versus-background trade-off for when to write. The article's memory-architecture section assumes exactly this taxonomy.

7 minutes

Memory for agents (conceptual video)

Separate short-term and long-term memory decisions and decide when agent memory should be written.

LangChain. Lance Martin's framework — write, select, compress, isolate — with concrete examples of when to summarise an action history, when to offload state to files, and when to spin up sub-agents purely to protect the parent's context. Maps almost directly onto the article's section on managing 1M-token windows in practice.

22 minutes

Context Engineering for Agents

Apply write, select, compress and isolate patterns to manage agent context deliberately.

Advanced | Prompt Engineering

8 minutes

Context Rot: How Increasing Input Tokens Impacts LLM Performance

Chroma. Kelly Hong walking through Chroma's research on 18 models — why needle-in-haystack scores are misleading, how performance degrades with ambiguity and distractors, why even simple string-repetition tasks degrade past 500 tokens. Short, evidence-based, and exactly the case the article needs you to take seriously before getting to the engineering moves.

Understand how long context can fail under ambiguity and distractors, then design tests around that risk.

Advanced | Prompt Engineering

66 minutes

CrewAI Tutorial: Complete Crash Course for Beginners

aiwithbrandon. The same kind of build, but in CrewAI's role-goal-backstory style — agents as team members, tasks as deliverables, the framework hiding the execution loop. Watch it immediately after the LangGraph course; the contrast in how much the framework decides for you is exactly what the article is asking you to weigh.

Contrast CrewAI's role-goal-backstory abstraction with LangGraph's explicit state graph, and judge how much of the execution loop to hand to the framework.

freeCodeCamp.org. A long, code-along build through LangGraph's state graphs, nodes, edges, conditional routing, checkpoints, and tool use. By the end you have enough feel for the typed-state, "every transition is explicit" model that the article's comparison to CrewAI and to direct-API code stops being abstract.

190 minutes

LangGraph Complete Course for Beginners – Complex AI Agents with Python

Build a working feel for LangGraph's state graphs, nodes, edges, conditional routing, checkpoints and tool use.

Anthropic. Three Anthropic engineers walking through the most common pitfalls they see — agents that don't know when to stop, over-prompting in the system prompt instead of fixing the environment, the cost of multi-agent designs nobody actually needed. Useful right after Barry's talk; you will recognise the same patterns from a different angle.

18 minutes

Tips for building AI agents

Recognize common agent-building pitfalls before adding multiple agents, complex prompts or hidden state.

AI Engineer. Barry Zhang on three rules — don't build an agent when a workflow would do, keep the loop as simple as possible, and "think like your agent" (sit in its context window and notice that it is making decisions in the dark between screenshots). The simplicity argument and the "is this task even worth an agent" checklist are exactly the discipline the article asks for.

15 minutes

How We Build Effective Agents: Barry Zhang, Anthropic

Design simpler agent loops with clear stopping rules, task boundaries and human control points.

Sebastian Raschka. Sebastian Raschka's slower walkthrough of where fine-tuning sits in the broader LLM training pipeline — instruction tuning, classification fine-tuning, parameter-efficient methods, and the trade-offs the article calls out before recommending LoRA. Good calibration before you start, especially if your team is debating whether fine-tuning is even the right step.

59 minutes

Developing an LLM: Building, Training, Finetuning

Place fine-tuning inside the broader training pipeline and decide when it is better than prompting or RAG.

Advanced | Private / Local AI

157 minutes

Fine Tuning LLM Models – Generative AI Course

freeCodeCamp.org. Long, theory-then-code course covering quantisation, LoRA, QLoRA, and full PEFT on Llama 2 and Gemma — on hardware most developers actually have. It is the closest thing to a "shadow somebody who has done this" experience on YouTube and lines up with the article's "you don't need a cluster" claim with concrete VRAM budgets.

Learn quantisation, LoRA, QLoRA and PEFT fine-tuning on Llama 2 and Gemma within the VRAM budgets of hardware most developers already have.

Advanced | Private / Local AI

16 minutes

Graph RAG: Improving RAG with Knowledge Graphs

Prompt Engineering. A focused walkthrough of Microsoft's GraphRAG — entity extraction, community summaries, query-focused summarization — set up on a local machine with cost notes. Watch it for the graph-RAG section of the article specifically; the cost discussion is the part most write-ups skip.

Understand the Microsoft-style GraphRAG flow: entity extraction, communities, summaries and query-focused synthesis.