Lenny's Podcast. Hamel Husain and Shreya Shankar walk through the entire eval workflow on a real property-management AI assistant: reading traces, open and axial coding of errors, deciding when to stop, building an LLM-as-judge, and validating it against human judgment. This is a rare long-form conversation aimed at PMs and team leads rather than ML engineers, and it covers the same "30 minutes a week after setup" rhythm the article recommends.
This remains one of the best explanations of evals for non-engineers because it focuses on workflow discipline rather than a specific tool. Keep the method, then choose tooling that fits your privacy and observability constraints.
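To make the last step of that workflow concrete, here is a minimal sketch of checking an LLM-as-judge against human labels. It is not taken from the episode. The `LabeledTrace` type, the `judge_agreement` function, the metric names, and the sample data are all hypothetical, and the sketch assumes each trace has already been given a pass/fail verdict by both a human reviewer and the judge.

```python
from dataclasses import dataclass


@dataclass
class LabeledTrace:
    """One reviewed trace (hypothetical structure for illustration)."""
    trace_id: str
    human_pass: bool  # verdict from the human reviewer
    judge_pass: bool  # verdict from the LLM-as-judge


def judge_agreement(traces):
    """Compare judge verdicts with human verdicts on the same traces."""
    tp = sum(t.human_pass and t.judge_pass for t in traces)
    tn = sum((not t.human_pass) and (not t.judge_pass) for t in traces)
    fp = sum((not t.human_pass) and t.judge_pass for t in traces)
    fn = sum(t.human_pass and (not t.judge_pass) for t in traces)
    total = len(traces)
    return {
        # Share of traces where judge and human reach the same verdict.
        "agreement": (tp + tn) / total if total else 0.0,
        # Of the traces the human passed, how many the judge also passed.
        "true_pass_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        # Of the traces the human failed, how many the judge also failed.
        "true_fail_rate": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Made-up sample data, not results from any real system.
SAMPLE = [
    LabeledTrace("t1", True, True),
    LabeledTrace("t2", True, True),
    LabeledTrace("t3", False, False),
    LabeledTrace("t4", False, True),
    LabeledTrace("t5", True, False),
]

metrics = judge_agreement(SAMPLE)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Reporting the pass and fail rates separately, rather than overall agreement alone, is a deliberate choice in this sketch: when most traces pass, a judge that approves everything can still score high on raw agreement while missing every real failure.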
Learn the product-builder eval loop: inspect traces, label failures, define criteria, test changes, and compare the results against human judgment.
Have at least one AI workflow where quality can get better or worse over time.
Continue through the same learning path with the next curated companion videos.