Desired Output

What is desired output?

Desired output is the expected or ideal result that you define for each input in an AI evaluation dataset. When testing an AI system, you compare the actual output the system generates against the desired output to measure how well it performs and to grade the quality of its responses.

The pairing of inputs with desired outputs forms the basis for measuring AI quality.

How do teams define desired outputs?

Teams typically define desired outputs in a spreadsheet or evaluation dataset. For each test input (such as a customer question, a classification task, or a transcript), you specify what the correct answer should be. For example, for an image classifier, the desired output for a specific photo might be "that's a horse".

When you run that input through your AI product, you compare what the system actually produced against what you said it should produce. This comparison gives you a quality score that tells you whether your AI is working as intended.
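As a rough illustration of this comparison, here is a minimal Python sketch. It assumes exact-match grading after light normalization, which is only one of many possible grading rules. The names `EvalCase`, `normalize`, `score_dataset`, and `stub_system`, along with the file names and labels, are hypothetical and stand in for your own dataset and AI product.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    """One row of the evaluation dataset: an input and its desired output."""
    input_text: str
    desired_output: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count as failures."""
    return " ".join(text.lower().split())


def score_dataset(cases: List[EvalCase], generate: Callable[[str], str]) -> float:
    """Return the fraction of cases where the actual output matches the desired output."""
    if not cases:
        return 0.0
    matches = sum(
        1
        for case in cases
        if normalize(generate(case.input_text)) == normalize(case.desired_output)
    )
    return matches / len(cases)


def stub_system(input_text: str) -> str:
    """Hypothetical stand-in for the AI product under test."""
    canned = {
        "photo_001.jpg": "Horse",
        "photo_002.jpg": "cat",
        "photo_003.jpg": "dog",
    }
    return canned.get(input_text, "")


cases = [
    EvalCase("photo_001.jpg", "horse"),
    EvalCase("photo_002.jpg", "dog"),
    EvalCase("photo_003.jpg", "dog"),
]

accuracy = score_dataset(cases, stub_system)
print(f"Quality score: {accuracy:.2f}")
```

In this sketch the stub gets two of the three photos right, so the score is two thirds. In practice you would replace `stub_system` with a call to your real system and choose a comparison rule that fits the task.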

When is it difficult to define desired outputs?

Some AI use cases have multiple acceptable responses, making it hard to define a single desired output. For instance, with interview coaching or creative writing assistance, there might be many "good enough" outputs rather than one correct answer.

In these cases, simply defining a dataset of inputs and desired outputs may not give you enough confidence that you've covered all the ways the AI could respond appropriately. Teams working on these types of products often need alternative evaluation approaches beyond traditional dataset-based testing.
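One possible alternative, offered here as an assumption rather than an approach this entry prescribes, is to check each response against a list of criteria instead of a single desired output. The sketch below uses a hypothetical interview-coaching example; the criteria, thresholds, and function names are illustrative only.

```python
from typing import Callable, Dict

# Each criterion is a named yes/no check on a response (all hypothetical).
CRITERIA: Dict[str, Callable[[str], bool]] = {
    "is_non_empty": lambda r: bool(r.strip()),
    "within_length_limit": lambda r: len(r.split()) <= 120,
    "gives_actionable_advice": lambda r: any(
        word in r.lower() for word in ("try", "consider", "practice")
    ),
}


def evaluate_against_criteria(response: str) -> Dict[str, bool]:
    """Run every criterion against one response and record pass/fail per criterion."""
    return {name: check(response) for name, check in CRITERIA.items()}


def pass_rate(results: Dict[str, bool]) -> float:
    """Fraction of criteria the response satisfied."""
    return sum(results.values()) / len(results) if results else 0.0


# Two differently worded responses can both be acceptable.
response_a = "Consider opening with a concrete result, then explain how you achieved it."
response_b = "Try the STAR format: describe the situation, task, action, and result."

for response in (response_a, response_b):
    results = evaluate_against_criteria(response)
    print(f"{pass_rate(results):.2f} {results}")
```

Both responses pass every check even though their wording differs, which a single desired output could not capture. Simple keyword checks like these are crude; they only illustrate the shape of criteria-based grading.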
