Dev Set
What is a dev set?
A dev set (development set) is a smaller collection of traces or examples used during active development and iteration of an AI product. Unlike a test set, which is a larger holdout set used before release, the dev set lets you quickly experiment with changes to prompts, models, or evals and see whether they improve performance before committing to more expensive testing on the full test set.
The dev set enables faster feedback loops during experimentation while managing costs.
How does a dev set differ from a test set?
A dev set is intentionally smaller than a test set to enable rapid iteration. For example, you might use 25 traces in your dev set but 100 traces in your test set. You run evals against your dev set while you iterate on prompts and evals, checking whether your changes produce the improvements you expect.
Once you think you have a change ready for release, you run it against your test set—the larger holdout set that gives you confidence the improvement will work in production. The test set is more expensive to run because of its size, so you only use it after the dev set shows promise.
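To make the loop concrete, here is a minimal sketch of the iterate-on-dev, confirm-on-test workflow. Everything in it is an assumption for illustration: the `generate` function, the prompts, the trace format, and the substring check all stand in for whatever model call and eval logic your product actually uses.

```python
def generate(prompt: str, user_input: str) -> str:
    """Stand-in for the real model call."""
    return f"{prompt} {user_input}"

def run_eval(prompt: str, traces: list[dict]) -> float:
    """Return the fraction of traces whose output passes the eval check."""
    passed = sum(t["expected"] in generate(prompt, t["input"]) for t in traces)
    return passed / len(traces)

current_prompt = "Answer the customer briefly."
new_prompt = "Answer the customer briefly and link the relevant help doc."

# A handful of traces stand in for the real sets (~25 dev, ~100 test in the example above).
dev_traces = [
    {"input": "How do I cancel my plan?", "expected": "cancel"},
    {"input": "Can I book a demo?", "expected": "demo"},
]
test_traces = [
    {"input": "Where do I update my billing details?", "expected": "billing"},
    {"input": "Do you offer an annual discount?", "expected": "discount"},
]

# Iterate against the small dev set while experimenting...
if run_eval(new_prompt, dev_traces) > run_eval(current_prompt, dev_traces):
    # ...and only pay for the larger test set once the dev set shows an improvement.
    if run_eval(new_prompt, test_traces) > run_eval(current_prompt, test_traces):
        print("Change passed the test set; consider releasing it.")
```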
What makes dev sets effective?
Both dev sets and test sets must represent what you're seeing in production. If your dev set doesn't match the types of inputs your AI product receives in real use, you can't trust the signals it gives you about whether changes are improvements.
Teams constantly refine both their dev sets and test sets to ensure they match production usage patterns. This representative sampling is what allows small dev sets to provide reliable signals during development.
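One simple way to keep a small dev set representative is to sample traces in proportion to the categories of inputs you actually see in production. The sketch below assumes each trace carries a `category` field; the helper and field name are illustrative, not a prescription from the resources linked below.

```python
import random
from collections import defaultdict

def stratified_sample(traces: list[dict], size: int, key: str = "category") -> list[dict]:
    """Sample `size` traces while preserving the category mix seen in production."""
    by_category = defaultdict(list)
    for trace in traces:
        by_category[trace[key]].append(trace)

    sample = []
    for group in by_category.values():
        share = round(size * len(group) / len(traces))  # proportional share of the sample
        sample.extend(random.sample(group, min(share, len(group))))
    return sample

# e.g. rebuild a 25-trace dev set from recent production traces:
# dev_set = stratified_sample(recent_production_traces, size=25)
```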
Learn more:
- How I Designed & Implemented Evals for Product Talk's Interview Coach
- Q&A Session: Building AI Evals for the Interview Coach
- Debugging AI Products: From Data Leakage to Evals with Hamel Husain
Related terms: