Why AI evals are the hottest new skill for product builders | Hamel Husain & Shreya Shankar

107 minutes · Intermediate · AI for Business

Lenny's Podcast. Hamel Husain and Shreya Shankar walk through the entire eval workflow on a real property-management AI assistant — looking at traces, open and axial coding of errors, deciding when to stop, building an LLM-as-judge, and validating it against human judgment. This is the rare long-form conversation that is genuinely aimed at PMs and team leads rather than ML engineers, and it covers the same "30 minutes a week after setup" rhythm the article recommends.

AI Expert note

This remains one of the best non-engineer eval explanations because it focuses on workflow discipline rather than a specific tool. Keep the method, then choose tooling that fits your privacy and observability constraints.

What you should get from this

Learn the product-builder eval loop: inspect traces, label failures, define criteria, test changes, and compare against human judgment.
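
The last step of that loop, comparing an automated judge against human judgment, can be sketched in a few lines of Python. This is a minimal illustration, not the method shown in the episode: the function name `judge_agreement`, the pass/fail encoding (`True` means the trace passed), and the sample labels are all assumptions made up for the example.

```python
from collections import Counter


def judge_agreement(human, judge):
    """Compare an automated judge's pass/fail labels with human labels.

    Both arguments are lists of booleans for the same traces, in the same
    order, where True means "pass". This encoding is an assumption.
    """
    if len(human) != len(judge):
        raise ValueError("human and judge must label the same traces")

    # Count each (human, judge) label combination.
    counts = Counter(zip(human, judge))
    tp = counts[(True, True)]    # both say pass
    tn = counts[(False, False)]  # both say fail
    fn = counts[(True, False)]   # human pass, judge fail
    fp = counts[(False, True)]   # human fail, judge pass
    total = len(human)

    return {
        "agreement": (tp + tn) / total if total else None,
        # Share of human-passed traces the judge also passed.
        "true_positive_rate": tp / (tp + fn) if (tp + fn) else None,
        # Share of human-failed traces the judge also failed.
        "true_negative_rate": tn / (tn + fp) if (tn + fp) else None,
    }


# Made-up labels for eight traces, purely for illustration.
human = [True, True, False, True, False, False, True, True]
judge = [True, True, False, False, False, True, True, True]
metrics = judge_agreement(human, judge)
print(metrics)
```

Reporting the pass and fail rates separately, rather than a single agreement number, helps when failures are rare: a judge that passes everything can still score high on raw agreement while catching none of the failures.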

Watch or know first

Have at least one AI workflow where quality can get better or worse over time.

Watch next

Continue through the same learning path with the next curated companion videos.

RAG Agents in Prod: 10 Lessons We Learned — Douwe Kiela, creator of RAG

Understand what breaks when RAG moves into regulated, high-stakes enterprise use.

RAG vs. Fine Tuning

Explain when retrieval is the right fix and when fine-tuning may actually help.

Anthropic's Claude Computer Use Is A Game Changer | YC Decoded

Decide where browser or computer-use agents might be commercially useful despite their operational risk.

Related videos

Introducing EmbeddingGemma: The Best-in-Class Open Model for On-Device Embeddings

How to Build Human-Centered AI Workflows in Localization with Shashi Bhushan

From Hype to Habit: How Tech Companies Are Scaling AI Beyond the Experimental

Private AI vs. Cloud: How Enterprise Leaders Can Make Smarter Build-or-Buy Decisions

Take it further

Hand-picked external courses that go deeper on this topic.

Coursera · DeepLearning.AI

AI for Everyone

Six years after it launched, still the cleanest starting point for anyone who needs to understand AI without learning to code. No math, no jargon, no hype — you'll finish able to have an informed conversation about AI projects.

New to AI · ~6 hours · Verified 25 days ago

Coursera · The Wharton School

AI Strategy and Governance

Kartik Hosanagar · Kevin Werbach · Prasanna Tambe · Lynn Wu

Wharton's rigorous framing for executives making build-vs-buy decisions. Cuts through vendor pitches by focusing on the economics of AI deployment, algorithmic bias in hiring and operations, and the governance practices that survive an audit. Best taken before, not after, your next major AI procurement decision.

Advanced · ~10 hours · Verified 25 days ago
