WebModel¶
WebModel is the core class of pydantic-ai-web-models. It implements the pydantic-ai Model
interface and routes every inference request through a Temporal workflow to a web-based LLM.
Constructor¶
class WebModel:
def __init__(
self,
provider: str,
model_name: str,
*,
temporal_config: TemporalConfig | None = None,
) -> None: ...
Parameters¶
| Parameter | Type | Required | Description |
|---|---|---|---|
provider |
str |
Yes | Provider identifier. Must be one of "google-web" or "openai-web". |
model_name |
str |
Yes | Model name within the provider. Must be a key in AVAILABLE_MODELS[provider]. |
temporal_config |
TemporalConfig \| None |
No | Temporal connection configuration. If None, the module-level default (see get_default_config) is used. |
Raises¶
ValueError— ifprovideris not a recognised provider string.ValueError— ifmodel_nameis not listed under the givenproviderinAVAILABLE_MODELS.
Properties¶
| Property | Type | Description |
|---|---|---|
model_name |
str |
The full model identifier in "provider:model_name" format, e.g. "google-web:gemini-3-flash". |
system |
str |
The provider string, e.g. "google-web". |
Usage¶
from pydantic_ai import Agent
from pydantic_ai_web_models import WebModel, TemporalConfig
# Construct a WebModel explicitly instead of using a model string
model = WebModel(
provider="google-web",
model_name="gemini-3.1-pro",
temporal_config=TemporalConfig(
task_queue="gpu-workers",
timeout_seconds=900,
),
)
agent = Agent(model=model)
result = agent.run_sync("Summarise the history of the internet.")
print(result.data)
Temporal Client Lifecycle¶
The Temporal client is created lazily on the first request and is cached for the lifetime
of the WebModel instance. Subsequent requests from the same WebModel reuse the same client
without reconnecting.
Thread safety
Client creation is protected by an asyncio.Lock. If multiple coroutines call a WebModel
concurrently before the client has been initialised, only one will perform the connection;
the others will wait and then reuse the client once it is ready. The lock is per-instance,
so different WebModel objects create their clients independently.
Per-run model_settings extensions¶
Pydantic AI merges model_settings from the agent constructor and each run() / run_sync() call and passes the result to WebModel.request(). Besides standard keys (temperature, max_tokens, …), this package recognises:
| Key | Type | Default | Effect |
|---|---|---|---|
thread_id |
str |
(omitted) | If non-empty after stripping, included in the Temporal workflow input so the worker can continue a server-side session. |
skip_system_prompt |
bool |
False |
If exactly True, format_messages() omits system instructions from the prompt sent to Temporal. |
When the workflow completes successfully with a result like your worker’s LLMInvokeResult (response, thread_id, error empty), WebModel sets ModelResponse.metadata to {"thread_id": "<value>"} on the assistant message. Read it with result.response.metadata["thread_id"] (AgentRunResult.response). If error is non-empty, WebModel raises WorkflowExecutionError before any assistant response is produced. The same ModelResponse is also listed in result.new_messages() / all_messages() when you need the full step.
See Conversations: Server-side thread and Architecture: workflow I/O.