A step-by-step guide to getting started with Instructor for structured outputs from LLMs
This guide will walk you through the basics of using Instructor to extract structured data from language models. By the end, you'll understand how to:

- Define output structures with Pydantic models
- Validate responses and automatically retry when validation fails
- Extract nested data structures
- Stream partial results as they are generated
- Switch between LLM providers
First, install Instructor:
```bash
pip install instructor
```
To use a specific provider, install the appropriate extras:
```bash
# For OpenAI (included by default)
pip install instructor

# For Anthropic
pip install "instructor[anthropic]"

# For other providers
pip install "instructor[google-genai]"   # For Google/Gemini
pip install "instructor[vertexai]"       # For Vertex AI
pip install "instructor[cohere]"         # For Cohere
pip install "instructor[litellm]"        # For LiteLLM (multiple providers)
pip install "instructor[mistralai]"      # For Mistral
pip install "instructor[xai]"            # For xAI
```
Set your API keys as environment variables:
```bash
# For OpenAI
export OPENAI_API_KEY=your_openai_api_key

# For Anthropic
export ANTHROPIC_API_KEY=your_anthropic_api_key

# For other providers, set relevant API keys
```
Let's start with a simple example using OpenAI:
```python
import instructor
from pydantic import BaseModel

# Define your output structure
class UserInfo(BaseModel):
    name: str
    age: int

# Create an instructor client with from_provider
client = instructor.from_provider("openai/gpt-5-nano")

# Extract structured data
user_info = client.create(
    response_model=UserInfo,
    messages=[
        {"role": "user", "content": "John Doe is 30 years old."}
    ],
)

print(f"Name: {user_info.name}, Age: {user_info.age}")
# Output: Name: John Doe, Age: 30
```
This example demonstrates the core workflow:
- Create a client for your chosen model with `from_provider`
- Define your output structure as a Pydantic model and pass it via the `response_model` parameter
- Get back a validated instance of that model instead of raw text

Instructor leverages Pydantic's validation to ensure your data meets requirements:
```python
from pydantic import BaseModel, Field, field_validator

class User(BaseModel):
    name: str
    age: int = Field(gt=0, lt=120)  # Age must be between 0 and 120

    @field_validator('name')
    def name_must_have_space(cls, v):
        if ' ' not in v:
            raise ValueError('Name must include first and last name')
        return v

# This will make the LLM retry if validation fails
user = client.create(
    response_model=User,
    messages=[
        {"role": "user", "content": "Extract: Tom is 25 years old."}
    ],
)
```
Instructor works seamlessly with nested Pydantic models:
```python
from pydantic import BaseModel
from typing import List

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str

class Person(BaseModel):
    name: str
    age: int
    addresses: List[Address]

person = client.create(
    response_model=Person,
    messages=[
        {"role": "user", "content": """
            Extract: John Smith is 35 years old.
            He has homes at 123 Main St, Springfield, IL 62704
            and 456 Oak Ave, Chicago, IL 60601.
        """}
    ],
)
```
For larger responses or better user experience, use streaming:
```python
from instructor import Partial

# Stream the response as it's being generated
stream = client.create_partial(
    response_model=Person,
    messages=[
        {"role": "user", "content": "Extract a detailed person profile for John Smith, 35, who lives in Chicago and Springfield."}
    ],
)

for partial in stream:
    # This will incrementally show the response being built
    print(partial)
```
Instructor supports multiple LLM providers. Here's how to use Anthropic:
```python
import instructor
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int

# Create an instructor client with from_provider
client = instructor.from_provider("anthropic/claude-3-opus-20240229")

user_info = client.create(
    response_model=UserInfo,
    messages=[
        {"role": "user", "content": "John Doe is 30 years old."}
    ],
)

print(f"Name: {user_info.name}, Age: {user_info.age}")
```
**What's the difference between start-here.md and getting-started.md?**
Start Here provides a conceptual overview of Instructor, while this guide is a hands-on, step-by-step tutorial.

**Which provider should I start with?**
OpenAI is the most popular choice for beginners due to reliability and wide support. Once comfortable, you can explore Anthropic Claude, Google Gemini, or open-source models.
**Do I need to know Pydantic to use Instructor?**
Basic knowledge helps, but you can start with simple models. Instructor works with any Pydantic BaseModel. Learn more advanced features as you need them.
**Can I use Instructor asynchronously?**
Yes! Use `async_client=True` when creating your client: `client = instructor.from_provider("openai/gpt-4o", async_client=True)`, then use `await client.create()`.
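Here is a minimal sketch of the async flow, reusing the model string from above and the same `UserInfo` model as the earlier examples (assumes your API key is already set):

```python
import asyncio

import instructor
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int

# async_client=True returns an async client, so create() must be awaited
client = instructor.from_provider("openai/gpt-4o", async_client=True)

async def main() -> None:
    user_info = await client.create(
        response_model=UserInfo,
        messages=[{"role": "user", "content": "John Doe is 30 years old."}],
    )
    print(f"Name: {user_info.name}, Age: {user_info.age}")

asyncio.run(main())
```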
**What happens if the output fails validation?**
Instructor automatically retries with validation feedback. You can configure retry behavior with the `max_retries` parameter. See retry mechanisms for details.
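For example, here is a short sketch capping retries on the `User` model from the validation section above (the limit of 3 is just an illustrative value):

```python
# Each failed validation sends the error messages back to the LLM and retries,
# up to max_retries attempts before raising an exception.
user = client.create(
    response_model=User,
    messages=[
        {"role": "user", "content": "Extract: Tom is 25 years old."}
    ],
    max_retries=3,
)
```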
Now that you've mastered the basics, here are some next steps:
Using older patterns? If you're using `instructor.patch()` or provider-specific functions like `from_openai()`, check out the Migration Guide to modernize your code.
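As a rough before-and-after sketch (the older calls are shown commented out; see the Migration Guide for the exact mapping in your setup):

```python
import instructor
from openai import OpenAI

# Older patterns: wrap a provider SDK client directly
# client = instructor.patch(OpenAI())
# client = instructor.from_openai(OpenAI())

# Current pattern: one call, with provider and model in a single string
client = instructor.from_provider("openai/gpt-4o")
```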
New to Instructor? Start with Start Here for a conceptual overview.
For more detailed information on any topic, visit the Concepts section.
If you have questions or need help, join our Discord community or check the GitHub repository.