Anthropic. A three-minute walkthrough of running a real eval inside the Workbench: auto-generating realistic test cases, grading outputs, tweaking the prompt, and re-running the same suite side by side. The view count sits below the usual bar, but for "how do I actually do this without writing code" this is the cleanest official demo, and it slots neatly under the more strategic Husain/Shankar conversation.
Console UI and feature names can change, so treat the video as a pattern rather than a click-by-click guide: fixed test cases, explicit grading criteria, side-by-side comparison, and repeatable reruns.
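For readers who later want the same pattern outside the UI, its four elements can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not what the Workbench runs: `fake_generate` is a hypothetical stand-in for a model call, and the test cases, the keyword-based `grade` rule, and both prompts are invented for the example.

```python
from typing import Callable, Dict, List

# Fixed test cases: the same inputs are reused for every prompt version.
# (Invented for illustration; a real suite would come from your own data.)
TEST_CASES: List[Dict[str, str]] = [
    {"input": "My order arrived broken.", "must_include": "sorry"},
    {"input": "How do I reset my password?", "must_include": "reset"},
]


def grade(output: str, case: Dict[str, str]) -> bool:
    """Explicit grading criterion: the output must contain a required keyword."""
    return case["must_include"].lower() in output.lower()


def run_suite(generate: Callable[[str, str], str], prompt: str) -> List[bool]:
    """Run one prompt version against every fixed test case and grade each output."""
    return [grade(generate(prompt, case["input"]), case) for case in TEST_CASES]


def compare(generate: Callable[[str, str], str],
            prompt_a: str, prompt_b: str) -> List[Dict[str, object]]:
    """Side-by-side comparison: one row per test case, one column per prompt."""
    results_a = run_suite(generate, prompt_a)
    results_b = run_suite(generate, prompt_b)
    return [
        {"input": case["input"], "a": a, "b": b}
        for case, a, b in zip(TEST_CASES, results_a, results_b)
    ]


def fake_generate(prompt: str, user_input: str) -> str:
    """Hypothetical stand-in for a model call; deterministic so reruns match."""
    reply = f"Regarding '{user_input}': here is how to proceed."
    if "apologize" in prompt.lower():
        reply = "Sorry about that. " + reply
    return reply


if __name__ == "__main__":
    prompt_a = "Answer the customer."
    prompt_b = "Answer the customer and apologize for any trouble."
    for row in compare(fake_generate, prompt_a, prompt_b):
        print(row)
```

Because the test cases and the grading rule stay fixed, any difference between the two columns can be attributed to the prompt change. With a real model, outputs may vary between runs, so reruns are only approximately repeatable.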
See the smallest no-code version of a repeatable prompt eval.
Basic prompt editing experience and access to an evaluation surface such as a console or workbench.
Continue through the same learning path with the next curated companion videos.