LLM Call

What is an LLM call?

An LLM call is a single request sent to a large language model API or service, where you provide input, such as a user query and system prompt, and receive an output in the form of the model's response. These calls are the fundamental building blocks of AI applications.
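To make the anatomy concrete, here is a minimal sketch of what goes into one call, assuming an OpenAI-style chat-completions request shape. The function only assembles the request payload; a real application would send it to the provider's endpoint with an HTTP client and an API key. The model name and prompts are illustrative.

```python
import json

def build_request(system_prompt: str, user_query: str,
                  model: str = "gpt-4o-mini") -> dict:
    """Assemble the payload for a single LLM call (OpenAI-style shape)."""
    return {
        "model": model,
        "messages": [
            # the system prompt sets instructions and behavior
            {"role": "system", "content": system_prompt},
            # the user message carries the actual query
            {"role": "user", "content": user_query},
        ],
    }

payload = build_request("You are a concise assistant.", "What is an LLM call?")
print(json.dumps(payload, indent=2))
```

One call in, one response out: the provider returns the model's completion for this single request, and everything the model "knows" about the task must be inside this payload.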

In complex AI workflows, teams often break down tasks into multiple LLM calls that are orchestrated together, with each call handling a specific subtask before results are aggregated. This approach helps manage complexity and creates more maintainable AI systems.

Why do teams use multiple LLM calls?

Teams divide complex tasks into multiple LLM calls to manage complexity more effectively. When a single prompt becomes pages long and unwieldy, trying to cover multiple dimensions or tasks at once, splitting it into smaller, focused calls makes the system easier to develop, test, and improve.

For example, an AI coaching tool might use separate LLM calls to evaluate different aspects of an interview: one call for assessing question quality, another for evaluating active listening, and a third for analyzing follow-up techniques. Each call focuses on a specific dimension with a targeted prompt, making each component clearer and more manageable than one massive, multi-purpose prompt.
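The decomposition above can be sketched as one focused call per dimension, with the results aggregated into a single report. Here `fake_llm` stands in for a real API call, and the dimension names and prompt templates are illustrative assumptions, not taken from any specific product.

```python
def fake_llm(prompt: str) -> str:
    # placeholder: a real implementation would send `prompt` to a model API
    return f"assessment for: {prompt[:40]}"

# one targeted prompt per evaluation dimension (illustrative wording)
DIMENSION_PROMPTS = {
    "question_quality": "Rate the quality of the interviewer's questions:\n{transcript}",
    "active_listening": "Assess how well the interviewer listened and responded:\n{transcript}",
    "follow_ups": "Analyze the interviewer's follow-up technique:\n{transcript}",
}

def evaluate_interview(transcript: str) -> dict:
    # one focused LLM call per dimension, then aggregate into one report
    return {dim: fake_llm(template.format(transcript=transcript))
            for dim, template in DIMENSION_PROMPTS.items()}

report = evaluate_interview("Interviewer: Tell me about yourself...")
```

Because each dimension has its own prompt and its own call, a team can tune, test, or replace one evaluator without touching the others.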

How do teams orchestrate multiple LLM calls?

Teams use workflow tools and services to orchestrate multiple LLM calls. Options include no-code platforms like Zapier, cloud services like AWS Step Functions, or custom code that manages the sequence of calls and aggregates responses.

Coordinating multiple LLM calls requires explicit management of state and context. Teams need to handle how information flows between calls, aggregate responses into coherent outputs, and manage what data gets passed from one call to the next. This orchestration becomes particularly important when building evals that span multiple calls or when ensuring consistency across related outputs.
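In custom code, the state management described above often looks like a small pipeline: each step takes the accumulated state, makes one (here stubbed) LLM call, and returns updated state for the next step. The step names and prompts below are hypothetical, chosen only to show how one call's output becomes the next call's context.

```python
def fake_llm(prompt: str) -> str:
    # placeholder for a real model API call
    return f"[model output for: {prompt[:30]}]"

def summarize(state: dict) -> dict:
    # first call: condense the raw input
    state["summary"] = fake_llm(f"Summarize: {state['document']}")
    return state

def extract_actions(state: dict) -> dict:
    # second call: reuses the first call's output as its context
    state["actions"] = fake_llm(f"List action items from: {state['summary']}")
    return state

def run_pipeline(document: str) -> dict:
    state = {"document": document}
    for step in (summarize, extract_actions):  # explicit call ordering
        state = step(state)
    return state

result = run_pipeline("Meeting notes: ...")
```

Making the state dictionary explicit is what keeps evals tractable: each step's inputs and outputs can be logged and tested independently, whether the orchestration lives in custom code or in a workflow service.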
