Comparing Models
Because each Agent call is independent and non-blocking, you can run the same prompt against
multiple models concurrently using asyncio.gather. This is useful for benchmarking response
quality, comparing latency, or picking the best model for a given task.
Concurrent Model Comparison
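import asyncio

from pydantic_ai import Agent
import pydantic_ai_web_models  # not referenced below; presumably registers the *-web model prefixes

MODELS = [
    "google-web:gemini-3-flash",
    "google-web:gemini-3.1-pro",
    "openai-web:gpt-5-3",
    "openai-web:gpt-5-5",
]


async def ask_model(model: str, prompt: str) -> str:
    # A fresh Agent per call keeps the runs independent of each other.
    agent = Agent(model=model)
    result = await agent.run(prompt)
    # Recent pydantic_ai releases expose the result as `.output` (formerly `.data`).
    return result.output


async def main():
    prompt = "In exactly one sentence, what is the meaning of life?"
    # Build one coroutine per model, then await them all concurrently.
    tasks = [ask_model(m, prompt) for m in MODELS]
    responses = await asyncio.gather(*tasks)
    # gather() returns results in the same order as MODELS.
    for model, response in zip(MODELS, responses):
        print(f"[{model}]:")
        print(f" {response}")
        print()


asyncio.run(main())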
All four runs are dispatched concurrently to Temporal. Each one executes in its own Temporal workflow on the worker, so they proceed in parallel and the total wall-clock time is roughly the latency of the slowest model rather than the sum of all four.
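To compare latency as well as content, each call can be wrapped in a timer. The following is a minimal sketch, not part of the library: `timed` and `fake_model` are hypothetical helpers, and `fake_model` only sleeps so the example runs without network access. In real use you would pass `ask_model(model, prompt)` to `timed` instead.

```python
import asyncio
import time
from typing import Awaitable


async def timed(label: str, aw: Awaitable[str]) -> tuple[str, str, float]:
    """Await `aw` and return (label, result, elapsed_seconds)."""
    start = time.perf_counter()
    result = await aw
    return label, result, time.perf_counter() - start


async def fake_model(delay: float) -> str:
    # Hypothetical stand-in for ask_model(model, prompt); it only sleeps.
    await asyncio.sleep(delay)
    return f"answered after {delay}s"


async def compare() -> tuple[list[tuple[str, str, float]], float]:
    # Made-up model names and latencies, for illustration only.
    delays = {"fast-model": 0.1, "slow-model": 0.3}
    start = time.perf_counter()
    rows = await asyncio.gather(
        *(timed(name, fake_model(d)) for name, d in delays.items())
    )
    # Total wall-clock time covers the whole gather() call.
    return list(rows), time.perf_counter() - start


if __name__ == "__main__":
    rows, total = asyncio.run(compare())
    for name, _, elapsed in sorted(rows, key=lambda r: r[2]):
        print(f"{name}: {elapsed:.2f}s")
    print(f"total wall-clock: {total:.2f}s")
```

Because the calls overlap, the total should track the slowest entry rather than the sum of the per-model timings.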
Comparing structured output quality
The comparison pattern works equally well with output_type. Pass the same Pydantic model
to each Agent to compare how consistently different models conform to a schema:
import asyncio

from pydantic import BaseModel
from pydantic_ai import Agent
import pydantic_ai_web_models  # not referenced below; presumably registers the *-web model prefixes

MODELS = [
    "google-web:gemini-3-flash",
    "openai-web:gpt-5-5",
]


class MovieReview(BaseModel):
    title: str
    rating: float  # 1.0 – 10.0
    pros: list[str]
    cons: list[str]
    verdict: str


async def ask_model(model: str, prompt: str) -> MovieReview:
    # output_type asks the agent to return a validated MovieReview instance.
    agent = Agent(model=model, output_type=MovieReview)
    result = await agent.run(prompt)
    # Recent pydantic_ai releases expose the result as `.output` (formerly `.data`).
    return result.output


async def main():
    prompt = "Review the movie Inception (2010)."
    tasks = [ask_model(m, prompt) for m in MODELS]
    reviews = await asyncio.gather(*tasks)
    # gather() returns results in the same order as MODELS.
    for model, review in zip(MODELS, reviews):
        print(f"[{model}] — {review.title} ({review.rating}/10)")
        print(f" Pros: {', '.join(review.pros)}")
        print(f" Cons: {', '.join(review.cons)}")
        print()


asyncio.run(main())
This makes it easy to spot which model produces more complete, correctly typed responses for your specific domain.
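Two practical gaps remain. By default, `asyncio.gather` propagates the first exception, so one failing model hides every other result. And type validation alone does not catch an out-of-range rating or an empty list. The sketch below addresses both under stated assumptions: `Review` is a dependency-free stand-in with the same fields as `MovieReview`, `fake_ask` is a hypothetical replacement for `ask_model`, and the 1.0-10.0 range is taken from the comment on `rating`.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Review:
    # Stand-in mirroring MovieReview's fields so the sketch needs no third-party packages.
    title: str
    rating: float
    pros: list[str] = field(default_factory=list)
    cons: list[str] = field(default_factory=list)
    verdict: str = ""


def completeness_issues(review) -> list[str]:
    """List problems that pass type checks but make a review incomplete."""
    issues = []
    if not 1.0 <= review.rating <= 10.0:
        issues.append(f"rating {review.rating} outside 1.0-10.0")
    for name in ("pros", "cons"):
        if not getattr(review, name):
            issues.append(f"{name} is empty")
    if not review.verdict.strip():
        issues.append("verdict is blank")
    return issues


async def fake_ask(model: str) -> Review:
    # Hypothetical stand-in for ask_model(): one model fails, one is incomplete.
    if model == "model-c":
        raise ValueError("output did not match schema")
    if model == "model-b":
        return Review(title="Inception", rating=11.0, pros=["visuals"])
    return Review("Inception", 8.8, ["visuals"], ["dense plot"], "Worth watching.")


async def compare(models: list[str]) -> dict[str, list[str]]:
    # return_exceptions=True returns exceptions as values instead of raising,
    # so one failed run does not discard the other models' results.
    results = await asyncio.gather(
        *(fake_ask(m) for m in models), return_exceptions=True
    )
    report = {}
    for model, result in zip(models, results):
        if isinstance(result, BaseException):
            report[model] = [f"failed: {result}"]
        else:
            report[model] = completeness_issues(result)
    return report


if __name__ == "__main__":
    for model, issues in asyncio.run(compare(["model-a", "model-b", "model-c"])).items():
        print(model, "OK" if not issues else "; ".join(issues))
```

With a real Pydantic model, the range check could instead be declared on the field itself, for example `rating: float = Field(ge=1.0, le=10.0)`, so that out-of-range values fail validation rather than being flagged afterwards.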