Error Rate
What is error rate?
Error rate is a metric that measures how often errors occur, typically expressed as a percentage or fraction. In AI products, you measure error rate by comparing your evals' outputs against human judgments (counting true positives, false positives, and false negatives) or by tracking how many instances in your dataset exhibit a particular failure mode.
A low error rate indicates your system is performing well, while a high error rate signals the need for improvement.
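The sketch below shows one way to compute an error rate by comparing an eval's verdicts against human judgments. It is a minimal illustration, not a specific tool's API; the field names ("eval_flagged", "human_flagged") and the toy records are assumptions for the example.

```python
# Minimal sketch: error rate as the fraction of records where the eval's
# verdict disagrees with the human judgment. Field names are illustrative.

records = [
    {"eval_flagged": True,  "human_flagged": True},   # true positive
    {"eval_flagged": True,  "human_flagged": False},  # false positive
    {"eval_flagged": False, "human_flagged": True},   # false negative
    {"eval_flagged": False, "human_flagged": False},  # true negative
]

false_positives = sum(r["eval_flagged"] and not r["human_flagged"] for r in records)
false_negatives = sum(not r["eval_flagged"] and r["human_flagged"] for r in records)

# Error rate here = share of records where eval and human disagree.
error_rate = (false_positives + false_negatives) / len(records)
print(f"Error rate: {error_rate:.0%}")  # 50% in this toy example
```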
How do teams use error rate to measure improvements?
Error rate gives you a concrete metric to track when making changes to your AI product. You can run an A/B test: feed the same test dataset to both the production version and the version with your fix, then measure the error rate for each.
For example, you might see an error rate drop from 81% of transcripts to 3% after implementing a fix. This dramatic decrease tells you the change made a meaningful impact. Without tracking error rate, you'd be relying on gut feel rather than data to know whether your improvements actually work.
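A hypothetical sketch of that A/B comparison is below. The helper names (`run_production`, `run_candidate`, `has_failure`) are placeholders for your own pipeline and failure check, not functions from any library.

```python
# Sketch: run the same test dataset through two versions and compare error rates.
# `run_production`, `run_candidate`, and `has_failure` are assumed to be
# supplied by your own code.

def error_rate(outputs, has_failure):
    """Fraction of outputs that exhibit the failure mode."""
    return sum(has_failure(o) for o in outputs) / len(outputs)

def compare_versions(test_dataset, run_production, run_candidate, has_failure):
    prod_outputs = [run_production(example) for example in test_dataset]
    cand_outputs = [run_candidate(example) for example in test_dataset]
    return {
        "production": error_rate(prod_outputs, has_failure),
        "candidate": error_rate(cand_outputs, has_failure),
    }

# Example usage (with your own stand-in functions):
# rates = compare_versions(dataset, run_production, run_candidate, has_failure)
# print(f"Production: {rates['production']:.0%}, Candidate: {rates['candidate']:.0%}")
```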
When does a low error rate mean you can stop focusing on that eval?
Once an eval consistently shows a low error rate—like 1 out of 25 traces—and remains aligned with human judgment, you can reduce how much attention you give it. You still run it to monitor whether it stays aligned, but it no longer requires active improvement work.
The key is verifying alignment: Your eval should find the same errors you'd find manually, with minimal false positives or false negatives. When both the error rate is low and the alignment is high, you can confidently shift focus to other areas that need more improvement.
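One way to check that alignment, sketched below under the assumption that you have parallel lists of boolean verdicts (True means "this trace has the error") from the eval and from a human reviewer:

```python
# Sketch of an alignment check between an eval's verdicts and human labels.
# Inputs are assumed to be parallel lists of booleans for the same traces.

def alignment_summary(eval_verdicts, human_verdicts):
    pairs = list(zip(eval_verdicts, human_verdicts))
    tp = sum(e and h for e, h in pairs)          # both flagged the trace
    fp = sum(e and not h for e, h in pairs)      # eval flagged, human did not
    fn = sum(not e and h for e, h in pairs)      # human flagged, eval missed it
    tn = sum(not e and not h for e, h in pairs)  # both agreed it was fine
    agreement = (tp + tn) / len(pairs)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "agreement": agreement}

# Example: 1 error in 25 traces, and the eval caught it with no false alarms.
human = [i == 0 for i in range(25)]
evals = [i == 0 for i in range(25)]
print(alignment_summary(evals, human))  # agreement = 1.0, fp = fn = 0
```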
Learn more:
- The Path to Better Product Decisions
- Building My First AI Product: 6 Lessons from My 90-Day Deep Dive
- Q&A Session: Building AI Evals for the Interview Coach