Dataset

What is a dataset?

A dataset is a collection of examples used to train, test, or evaluate AI systems. In the context of AI evals, a dataset typically consists of sample inputs paired with their expected outputs, letting you systematically check whether your AI product produces the desired results across a range of scenarios.

You run your evals by sending those inputs to the LLM, capturing its outputs, and comparing them to your expected outputs.
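As a rough sketch in Python (where `call_llm` is a hypothetical stand-in for however your product invokes the model), that loop might look like this:

```python
# A minimal eval loop sketch. The dataset is a list of input/expected-output pairs.
dataset = [
    {"input": "Cancel my subscription", "expected": "route_to_billing"},
    {"input": "The app crashes on login", "expected": "route_to_support"},
]

def run_evals(dataset, call_llm):
    results = []
    for example in dataset:
        actual = call_llm(example["input"])      # send the input to the LLM
        passed = actual == example["expected"]   # compare to the expected output
        results.append({"input": example["input"], "actual": actual, "passed": passed})
    return results
```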

How do teams use datasets in AI product development?

Teams create datasets by collecting examples—such as customer interview transcripts, user inputs, or other real-world scenarios—and pairing each example with the desired LLM response. This collection becomes your benchmark for testing.
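One common (though not the only) way to store such a collection is a JSONL file with one input/expected pair per line; a minimal sketch with made-up examples:

```python
import json

# Hypothetical collected examples: real-world inputs paired with the desired response.
collected = [
    {"input": "Customer asks about a refund for a duplicate charge.", "expected": "Explain the refund policy and offer to process the refund."},
    {"input": "User: how do I reset my password?", "expected": "Link to the password-reset flow."},
]

# One JSON object per line keeps the dataset easy to append to and diff.
with open("eval_dataset.jsonl", "w") as f:
    for example in collected:
        f.write(json.dumps(example) + "\n")
```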

When you want to test a change to your AI product, you run your dataset through both the current version and the new version, then compare the results. For instance, you might see an error rate drop from 81% to 3% when testing a fix against your dataset. This makes datasets essential for systematic improvement of AI products.
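In code, that comparison can be as simple as computing an error rate per version over the same dataset; a sketch, reusing the dataset shape from above and treating `call_llm_v1` and `call_llm_v2` as hypothetical stand-ins for the current and new versions:

```python
def error_rate(dataset, call_llm):
    """Fraction of examples where the LLM output does not match the expected output."""
    failures = sum(1 for ex in dataset if call_llm(ex["input"]) != ex["expected"])
    return failures / len(dataset)

# `call_llm_v1` / `call_llm_v2` stand in for the current and candidate versions
# of your product (a different prompt, model, or configuration).
baseline = error_rate(dataset, call_llm_v1)
candidate = error_rate(dataset, call_llm_v2)
print(f"error rate: {baseline:.0%} -> {candidate:.0%}")
```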

What makes a good dataset for AI evaluation?

An effective dataset includes specific inputs with clearly defined expected outputs. The inputs should represent the range of scenarios your AI product will encounter in production. The expected outputs need to be precise enough that you can objectively grade how closely the LLM's actual output matches what you wanted.
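Grading can be as strict as an exact match or looser, such as checking that required phrases appear in the output; a small sketch of both approaches (the criteria here are illustrative):

```python
def grade_exact(actual: str, expected: str) -> bool:
    # Strict grading: the output must match the expected text exactly
    # (ignoring surrounding whitespace and case).
    return actual.strip().lower() == expected.strip().lower()

def grade_contains(actual: str, required_phrases: list[str]) -> bool:
    # Looser grading: the output must mention every required phrase.
    text = actual.lower()
    return all(phrase.lower() in text for phrase in required_phrases)
```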

Datasets serve as benchmarks for comparing AI performance across different versions or configurations, enabling teams to make data-driven decisions about which changes actually improve their product.
