v0.3.1 | Structured Extraction

Structured Extraction

Typed output from any media.

Extract structured data from documents, images, audio, and video using LLMs and Pydantic schemas.

extract.py

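from pydantic import BaseModel
from openextract import extract

# Placeholder schema so the snippet runs; the fields are illustrative.
class Invoice(BaseModel):
    vendor: str
    total: float
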
result = extract(
    schema=Invoice,
    model="openai:gpt-5.4",
    input_file="https://example.com/doc.pdf",
    instructions="Extract invoice data",
)
              

Pydantic | Any LLM

Zero Config

One function call. Bring a schema, a URL, and a model string.

Multi-Media

Documents, images, audio, and video with smart routing.

Any LLM

OpenAI, Anthropic, Google. One model string, zero config.

Type Safe

Pydantic schemas ensure validated, typed output every time.
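
The model strings on this page follow a provider:model shape, such as "openai:gpt-5.4". As a rough sketch of how such a string could be split, the helper below takes everything before the first colon as the provider and the rest as the model name. Both parse_model_string and ModelRef are hypothetical names invented for illustration; this page does not show how openextract parses the string internally.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    """Hypothetical container for the two halves of a model string."""
    provider: str
    model: str


def parse_model_string(value: str) -> ModelRef:
    """Split 'provider:model' at the first colon (illustrative only)."""
    provider, sep, model = value.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {value!r}")
    # Normalize the provider so "OpenAI:..." and "openai:..." match.
    return ModelRef(provider.lower(), model)


ref = parse_model_string("openai:gpt-5.4")
print(ref.provider, ref.model)
```

Splitting at the first colon only means a model name that itself contains a colon would stay intact.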

How it works

Schema in, typed data out

Define a BaseModel, call extract(), get validated output.

Define a schema

Describe the shape you want with a Pydantic model.

Point at any media

Documents, images, audio, or video via URL.

Get typed output

Validated against your schema. No parsing, no regex.

extract.py

from pydantic import BaseModel
from openextract import extract

class Report(BaseModel):
    title: str
    findings: list[str]
    severity: int

result = extract(
    schema=Report,
    model="openai:gpt-5.2",
    input_file="https://example.com/report.pdf",
    instructions="Extract findings",
)
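
To make "validated against your schema" concrete, here is a minimal sketch that uses Pydantic alone, with no openextract call. It assumes Pydantic v2 (model_validate) and feeds the Report schema two hand-written dictionaries standing in for raw model output; the payload values are invented for illustration.

```python
from pydantic import BaseModel, ValidationError


class Report(BaseModel):
    title: str
    findings: list[str]
    severity: int


# Hypothetical raw payloads, standing in for what an LLM might return.
good = {"title": "Q3 audit", "findings": ["Unpatched host"], "severity": 3}
bad = {"title": "Q3 audit", "findings": "Unpatched host", "severity": "high"}

# A payload that matches the schema becomes a typed Report instance.
report = Report.model_validate(good)
print(report.title, report.severity)

# A payload with the wrong shape raises instead of passing through silently.
try:
    Report.model_validate(bad)
    rejected = False
except ValidationError as exc:
    rejected = True
    print(f"rejected with {exc.error_count()} validation error(s)")
```

The point of the pattern is that downstream code can rely on report.severity being an int and report.findings being a list of strings, without defensive parsing.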
              

Works with any media

Documents | Images | Audio | Video

PDF, DOCX, PNG, JPG, MP3, MP4, and 20+ formats
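
One plausible way to picture routing across these formats is a lookup from file extension to media kind. The sketch below covers only the six formats named above; MEDIA_KINDS and media_kind are hypothetical names, and the table is an assumption for illustration, not openextract's actual routing logic.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Illustrative extension table; not openextract's real routing rules.
MEDIA_KINDS = {
    ".pdf": "document",
    ".docx": "document",
    ".png": "image",
    ".jpg": "image",
    ".mp3": "audio",
    ".mp4": "video",
}


def media_kind(url: str) -> str:
    """Classify a URL by the file extension of its path component."""
    # urlparse drops the query string, so "?dl=1" does not affect the suffix.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    try:
        return MEDIA_KINDS[suffix]
    except KeyError:
        raise ValueError(f"unsupported extension: {suffix!r}") from None


for url in ("https://example.com/doc.pdf", "https://example.com/talk.MP4?dl=1"):
    print(url, "->", media_kind(url))
```

A real router would likely also inspect the Content-Type header, since URLs do not always carry a file extension.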