# Instructor Internals

Learn about the internal architecture and design decisions of the Instructor library.
This page explains the core execution flow and where to plug in or debug. It highlights the minimal sync/async code paths and how streaming, partial, and parallel modes integrate.
```mermaid
sequenceDiagram
    autonumber
    participant U as User Code
    participant I as Instructor (patched)
    participant R as Retry Layer (tenacity)
    participant C as Provider Client
    participant D as Dispatcher (process_response)
    participant H as Provider Handler (response/reask)
    participant M as Pydantic Model
    U->>I: chat.completions.create(response_model=..., **kwargs)
    Note right of I: patch() wraps create() with cache/templating and retry
    I->>R: retry_sync/async(func=create, max_retries, strict, mode, hooks)
    loop attempts
        R->>C: create(**prepared_kwargs)
        C-->>R: raw response (provider-specific)
        R->>D: process_response(_async)(response, response_model, mode, stream)
        alt Streaming/Partial
            D->>M: Iterable/Partial.from_streaming_response(_async)
            D-->>R: Iterable/Partial model (or list of items)
        else Standard
            D->>H: provider mode handler (format/parse selection)
            H-->>D: adjusted response_model/new_kwargs if needed
            D->>M: response_model.from_response(...)
            M-->>D: parsed model (with _raw_response attached)
            D-->>R: model (or adapted simple type)
        end
        R-->>I: parsed model
    end
    I-->>U: final model (plus _raw_response on instance)
    rect rgb(255,240,240)
        Note over R,H: On validation/JSON errors → reask path
        R->>H: handle_reask_kwargs(..., exception, failed_attempts)
        H-->>R: new kwargs/messages for next attempt
    end
```
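To make the loop concrete, here is a minimal, illustrative sketch of the retry/reask cycle. It is not the actual instructor implementation: `create_with_retries` and its reask message are hypothetical stand-ins for the tenacity-backed retry layer, `process_response`, and `handle_reask_kwargs`.

```python
from pydantic import BaseModel, ValidationError


def create_with_retries(create, kwargs, response_model: type[BaseModel], max_retries: int = 3):
    """Illustrative stand-in for instructor's retry/reask loop."""
    failed_attempts = []
    for _ in range(max_retries):
        raw = create(**kwargs)  # provider call; assume it returns a JSON string here
        try:
            # Stand-in for response_model.from_response(...)
            return response_model.model_validate_json(raw)
        except ValidationError as exc:
            failed_attempts.append((raw, exc))
            # Reask: feed the validation error back so the next attempt can self-correct
            kwargs["messages"] = kwargs["messages"] + [
                {"role": "user", "content": f"Validation failed: {exc}. Please correct the output."}
            ]
    raise RuntimeError(f"Exhausted {max_retries} attempts", failed_attempts)
```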
Key responsibilities:
- `patch()` wraps `create` with cache lookup/save, templating, strict mode, hooks, and retry.
- `process_response` dispatches on the active `Mode`, handles multimodal message conversion, and attaches `_raw_response` to the returned model.

Minimal sync path:

```python
import instructor
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


client = instructor.from_provider("openai/gpt-5-nano")
model = client.create(
    messages=[{"role": "user", "content": '{"name": "Ada", "age": 37}'}],
    response_model=User,  # triggers schema/tool wiring + parsing
    max_retries=3,        # tenacity-backed validation retries
    strict=True,          # strict JSON parsing if supported
)

# Access the raw provider response if needed
raw = model._raw_response
```
The async path mirrors the sync one:

```python
import asyncio

import instructor
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


async def main():
    aclient = instructor.from_provider("openai/gpt-5-nano", async_client=True)
    model = await aclient.create(
        messages=[{"role": "user", "content": '{"name": "Ada", "age": 37}'}],
        response_model=User,
        max_retries=3,
        strict=True,
    )
    print(model)


asyncio.run(main())
```
Streaming iterables: call `Instructor.create_iterable(response_model=Model)`, which sets `stream=True` implicitly; `IterableBase.from_streaming_response(_async)` assembles the streamed chunks into complete items. A runnable sketch follows the snippet below.

```python
for item in client.create_iterable(messages=..., response_model=MyModel):
    print(item)
```
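A complete, runnable sketch, reusing the `User` model and the `openai/gpt-5-nano` provider string from the examples above:

```python
import instructor
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


client = instructor.from_provider("openai/gpt-5-nano")

# Each complete User is yielded as soon as it is assembled from the stream.
for user in client.create_iterable(
    messages=[{"role": "user", "content": "Ada is 37 and Alan is 41."}],
    response_model=User,
):
    print(user)
```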
Partial streaming: call `create_partial(response_model=Model)` to receive progressively filled partial models while streaming; internally this wraps the model as `Partial[Model]` and sets `stream=True`. A fuller sketch follows.

```python
for partial in client.create_partial(messages=..., response_model=MyModel):
    # partial contains fields as they arrive
    pass
```
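A fuller sketch of partial streaming, again assuming the provider string above; `Report` is a hypothetical model for illustration:

```python
import instructor
from pydantic import BaseModel


class Report(BaseModel):
    title: str
    summary: str


client = instructor.from_provider("openai/gpt-5-nano")

# Each iteration yields a Partial[Report]; fields are None until their
# values have streamed in, then fill progressively.
for partial in client.create_partial(
    messages=[{"role": "user", "content": "Write a short status report."}],
    response_model=Report,
):
    print(partial.title, partial.summary)
```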
Parallel tool calls: use `Mode.PARALLEL_TOOLS` with a parallel type hint (an iterable over a union of models) when you need multiple tool calls in one request. The mode is fixed when the client is constructed:

```python
from typing import Iterable

import instructor
from instructor.mode import Mode
from pydantic import BaseModel


class PersonInfo(BaseModel):
    name: str


class EventInfo(BaseModel):
    title: str


client = instructor.from_provider("openai/gpt-4o", mode=Mode.PARALLEL_TOOLS)
result = client.create(
    messages=[{"role": "user", "content": "Extract person and event info."}],
    response_model=Iterable[PersonInfo | EventInfo],
)
```
You can observe and instrument the flow with hooks. Typical events:
- `completion:kwargs`: just before the provider call
- `completion:response`: after the provider call
- `parse:error`: on validation/JSON errors
- `completion:last_attempt`: when a retry sequence is about to stop
- `completion:error`: non-validation completion errors

```python
from instructor.core.hooks import HookName

client.on(HookName.COMPLETION_KWARGS, lambda **kw: print("KWARGS", kw))
client.on(HookName.PARSE_ERROR, lambda e: print("PARSE", e))
```
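For example, the `completion:kwargs` and `completion:response` events can bracket the provider call to measure latency. This sketch assumes `HookName.COMPLETION_RESPONSE` maps to `completion:response`, mirroring the `COMPLETION_KWARGS` member shown above:

```python
import time

from instructor.core.hooks import HookName

_start = {}


def on_kwargs(*args, **kwargs):
    # Fires just before the provider call
    _start["t"] = time.perf_counter()


def on_response(*args, **kwargs):
    # Fires after the provider call
    elapsed = time.perf_counter() - _start.pop("t")
    print(f"provider call took {elapsed:.2f}s")


client.on(HookName.COMPLETION_KWARGS, on_kwargs)
client.on(HookName.COMPLETION_RESPONSE, on_response)
```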
Multimodal message content is converted by `processing.multimodal.convert_messages`. On validation or JSON errors, the provider reask handlers (`handle_reask_kwargs`) append/adjust messages with error feedback so the next attempt can correct itself. When retries are exhausted, `InstructorRetryException` is raised containing `failed_attempts`, the last completion, usage totals, and the create kwargs for reproduction.
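A hedged sketch of catching the terminal exception. The import path is an assumption that mirrors `instructor.core.hooks` above, and the attribute names follow the description in this section; verify both against your instructor version:

```python
from pydantic import BaseModel

# Import path is an assumption, mirroring instructor.core.hooks above.
from instructor.core.exceptions import InstructorRetryException


class User(BaseModel):
    name: str
    age: int


try:
    user = client.create(
        messages=[{"role": "user", "content": "Extract: Ada, 37"}],
        response_model=User,
        max_retries=2,
    )
except InstructorRetryException as exc:
    # Attribute names follow the description above; exact names may differ by version.
    print("attempts:", exc.n_attempts)
    print("last completion:", exc.last_completion)
    print("usage:", exc.total_usage)
    print("create kwargs:", exc.create_kwargs)
```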