Structured Extraction
Zero Config
One function call. Bring a schema, a URL, and a model string.
Multi-Media
Documents, images, audio, and video with smart routing.
Any LLM
OpenAI, Anthropic, Google. One model string, zero config.
Type Safe
Pydantic schemas ensure validated, typed output every time.
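The extract.py example on this page selects a model with the string "openai:gpt-5.2", which suggests a "provider:model" convention. The minimal sketch below shows how such a string can be split and checked. It assumes that convention. `parse_model_string` and `KNOWN_PROVIDERS` are hypothetical names for illustration, not openextract's code.

```python
# Hypothetical helper: split a "provider:model" string such as "openai:gpt-5.2".
# Assumes the provider:model convention seen in the extract.py example;
# this is NOT openextract's actual implementation.
KNOWN_PROVIDERS = {"openai", "anthropic", "google"}  # the providers named above


def parse_model_string(model: str) -> tuple[str, str]:
    """Return (provider, model_name) from a 'provider:model' string."""
    # partition splits on the first colon only, so model names may contain ':'
    provider, sep, name = model.partition(":")
    if not sep or not name:
        raise ValueError(f"expected 'provider:model', got {model!r}")
    provider = provider.lower()
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider {provider!r}")
    return provider, name


print(parse_model_string("openai:gpt-5.2"))  # ('openai', 'gpt-5.2')
```

Splitting on the first colon keeps the rest of the string intact, so a model identifier that itself contains a colon would still pass through unchanged.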
How it works
Schema in, typed data out
Define a BaseModel, call extract(), get validated output.
1
Define a schema
Describe the shape you want with a Pydantic model.
2
Point at any media
Documents, images, audio, or video via URL.
3
Get typed output
Validated against your schema. No parsing, no regex.
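To make step 3 concrete, the sketch below shows what Pydantic v2 itself does when it is handed raw JSON text, such as a model reply. It does not claim that openextract calls `model_validate_json` internally. The JSON strings are made-up sample data.

```python
from pydantic import BaseModel, ValidationError


# Same shape as the Report schema in extract.py
class Report(BaseModel):
    title: str
    findings: list[str]
    severity: int


# Stand-in for raw JSON text returned by an LLM (made-up sample data)
raw = '{"title": "Q3 audit", "findings": ["Expired TLS cert"], "severity": "3"}'

# Parse and validate in one step; in lax mode the string "3" is coerced to int 3
report = Report.model_validate_json(raw)
print(report.severity + 1)  # 4

# Wrongly typed fields raise ValidationError instead of passing through silently
bad = '{"title": "Q3 audit", "findings": "not a list", "severity": "high"}'
try:
    Report.model_validate_json(bad)
except ValidationError as exc:
    print(len(exc.errors()))  # 2: findings is not a list, severity is not an int
```

Because validation either returns a `Report` or raises, downstream code can rely on `report.severity` being an `int` without any manual parsing.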
extract.py
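from pydantic import BaseModel
from openextract import extract


# The shape of the data you want back
class Report(BaseModel):
    title: str
    findings: list[str]
    severity: int


# Output is validated against the Report schema
result = extract(
    schema=Report,  # Pydantic model describing the output
    model="openai:gpt-5.2",  # provider:model string
    input_file="https://example.com/report.pdf",  # media to extract from, by URL
    instructions="Extract findings",  # guidance for the model
)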
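The page does not say how `extract()` works internally. One common way to implement "schema in, typed data out" is to send the schema to the model as JSON Schema and validate the reply. The sketch below shows that pattern under that assumption. `sketch_extract` and `fake_llm` are hypothetical, and the stub returns canned JSON instead of calling a real provider.

```python
import json
from typing import Callable, TypeVar

from pydantic import BaseModel

T = TypeVar("T", bound=BaseModel)


class Report(BaseModel):
    title: str
    findings: list[str]
    severity: int


def sketch_extract(schema: type[T], text: str, instructions: str,
                   llm: Callable[[str], str]) -> T:
    """Hypothetical: prompt an LLM with a JSON Schema, then validate its reply."""
    prompt = (
        f"{instructions}\n"
        f"Reply with JSON matching this schema:\n"
        f"{json.dumps(schema.model_json_schema())}\n\n"
        f"Input:\n{text}"
    )
    reply = llm(prompt)
    # Raises pydantic.ValidationError if the reply does not match the schema
    return schema.model_validate_json(reply)


def fake_llm(prompt: str) -> str:
    # Canned reply standing in for a real provider call
    return '{"title": "Demo", "findings": ["a", "b"], "severity": 2}'


result = sketch_extract(Report, "report text...", "Extract findings", fake_llm)
print(type(result).__name__, result.severity)  # Report 2
```

Binding the `TypeVar` to `BaseModel` lets a type checker infer that passing `Report` yields a `Report`, which is the kind of typed result the page describes.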
Works with any media
Documents
Images
Audio
Video
PDF, DOCX, PNG, JPG, MP3, MP4, and 20+ formats
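The page mentions "smart routing" across media types without describing it. A simple way to route an input URL to one of the four categories above is by file extension. The sketch below assumes that approach. The `MEDIA_KINDS` table covers only the six formats named on this page, and `media_kind` is a hypothetical helper, not openextract's routing logic.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical table: only the formats named above, not the library's full list
MEDIA_KINDS = {
    ".pdf": "document",
    ".docx": "document",
    ".png": "image",
    ".jpg": "image",
    ".mp3": "audio",
    ".mp4": "video",
}


def media_kind(url: str) -> str:
    """Guess the media category of a URL from its file extension."""
    path = urlparse(url).path  # drops scheme, host, and query string
    suffix = PurePosixPath(path).suffix.lower()
    try:
        return MEDIA_KINDS[suffix]
    except KeyError:
        raise ValueError(f"unsupported extension {suffix!r} in {url!r}") from None


print(media_kind("https://example.com/report.pdf"))  # document
```

Extension matching is cheap but trusts the URL. A more robust router could also inspect the HTTP `Content-Type` header or the file's leading bytes.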