Why AI evals are the hottest new skill for product builders | Hamel Husain & Shreya Shankar

107 minutes | Intermediate | AI for Business

Lenny's Podcast. Hamel Husain and Shreya Shankar walk through the entire eval workflow on a real property-management AI assistant: looking at traces, open and axial coding of errors, deciding when to stop, building an LLM-as-judge, and validating it against human judgment. It is a rare long-form conversation aimed at PMs and team leads rather than ML engineers, and it covers the same "30 minutes a week after setup" rhythm the article recommends.
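
To make the coding step concrete, here is a minimal sketch of how hand-labeled traces might be tallied once each free-form note (open coding) has been grouped into a failure category (axial coding). The trace records, field names, and category names are hypothetical and are not taken from the episode.

```python
from collections import Counter

# Hypothetical hand-labeled traces. "note" is the free-form observation
# (open coding); "category" is the group assigned afterwards (axial coding).
# A category of None means the reviewer found no failure in that trace.
labeled_traces = [
    {"trace_id": "t1", "note": "offered a tour slot that was not available", "category": "scheduling error"},
    {"trace_id": "t2", "note": "never handed off to a human agent", "category": "missed handoff"},
    {"trace_id": "t3", "note": "booked a tour for the wrong day", "category": "scheduling error"},
    {"trace_id": "t4", "note": "answer looked fine", "category": None},
    {"trace_id": "t5", "note": "confirmed a time outside office hours", "category": "scheduling error"},
]


def tally_failure_categories(traces):
    """Count failures per category, most frequent first."""
    counts = Counter(t["category"] for t in traces if t["category"] is not None)
    return counts.most_common()


for category, count in tally_failure_categories(labeled_traces):
    print(f"{category}: {count}")
```

A ranked tally like this is one way to see which failure categories deserve attention first, and whether new traces are still surfacing new categories.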

AI Expert note

This remains one of the best non-engineer eval explanations because it focuses on workflow discipline rather than a specific tool. Keep the method, then choose tooling that fits your privacy and observability constraints.

What you should get from this

Learn the product-builder eval loop: inspect traces, label failures, define criteria, test changes, and check the automated judge against human judgment.
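
The last step of that loop, comparing against human judgment, can be sketched as a simple agreement check between human labels and an automated judge's verdicts on the same traces. This is an illustrative sketch, not the method shown in the episode: the function name, the label lists, and the choice of metrics are assumptions.

```python
def judge_agreement(human, judge):
    """Compare judge verdicts with human labels on the same traces.

    Both inputs are lists of booleans where True means "this trace is a failure".
    """
    if not human or len(human) != len(judge):
        raise ValueError("need two non-empty label lists of equal length")

    true_pos = sum(1 for h, j in zip(human, judge) if h and j)
    true_neg = sum(1 for h, j in zip(human, judge) if not h and not j)
    positives = sum(1 for h in human if h)
    negatives = len(human) - positives

    return {
        # Share of traces where judge and human gave the same verdict.
        "agreement": (true_pos + true_neg) / len(human),
        # Share of human-flagged failures the judge also flagged.
        "true_positive_rate": true_pos / positives if positives else None,
        # Share of human-approved traces the judge also approved.
        "true_negative_rate": true_neg / negatives if negatives else None,
    }


# Hypothetical labels for five traces.
human_labels = [True, True, False, False, False]
judge_labels = [True, False, False, False, True]
print(judge_agreement(human_labels, judge_labels))
```

Reporting the two rates separately matters when failures are rare, because a judge that approves everything can still score high on raw agreement.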

Watch or know first

Have at least one AI workflow where quality can get better or worse over time.
