OpenXtract
v0.1.4 Now Available

Turn documents into structured data.

An open-source framework to extract clean, typed data from documents, images, audio, and video with minimal setup.

$ uv add open-xtract
main.py
from pydantic import BaseModel
from open_xtract import extract

class Invoice(BaseModel):
    invoice_number: str
    date: str
    total: float

result = extract(
    schema=Invoice,
    model="anthropic:claude-sonnet-4-5",
    url="https://example.com/invoice.pdf",
    instructions="Extract invoice details",
)
# Returns typed Invoice instance

Multi-Media Support

Extract from documents, images, audio, and video. The input type is detected automatically and routed to the matching handler.
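
As a rough illustration of what type-based routing can look like, the sketch below guesses a coarse media kind from a URL's file extension using Python's standard mimetypes module. It is not OpenXtract's actual detection logic: the function name detect_media_kind, the four category labels, and the fallback to "document" are all assumptions made for this example.

```python
import mimetypes
from urllib.parse import urlparse


def detect_media_kind(url: str) -> str:
    """Guess a coarse media kind from a URL's file extension (hypothetical helper)."""
    # Look only at the path so query strings do not hide the extension
    path = urlparse(url).path
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        # Assumption: unknown types fall back to the document handler
        return "document"
    top_level = mime_type.split("/", 1)[0]
    if top_level in {"image", "audio", "video"}:
        return top_level
    # Everything else (application/pdf, text/plain, ...) is treated as a document
    return "document"
```

A real implementation would likely also inspect response headers or file contents, since extensions can be missing or misleading.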

Any LLM Provider

Built on pydantic-ai. Use OpenAI, Anthropic, Google, or any compatible provider with a single model string.
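
The model strings in the examples on this page follow a provider:model-name shape, such as "anthropic:claude-sonnet-4-5". The sketch below shows how such a string breaks into its two parts. The helper split_model_string is hypothetical and exists only to illustrate the format; in practice the string is passed to extract() as-is.

```python
def split_model_string(model: str) -> tuple[str, str]:
    """Split 'provider:model-name' into (provider, model_name). Hypothetical helper."""
    # partition splits on the first colon only, so the rest stays in the model name
    provider, sep, name = model.partition(":")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider:model-name', got {model!r}")
    return provider, name


provider, name = split_model_string("anthropic:claude-sonnet-4-5")
print(provider, name)
```

Switching providers then comes down to changing the prefix and model name in that one string.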

Type Safe

Leverage Pydantic schemas to ensure extracted data is validated, typed, and clean every time.
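
To make the validation step concrete, here is a minimal sketch of what a Pydantic schema does with raw field values, independent of OpenXtract. It assumes Pydantic v2; the sample payloads are invented for illustration and do not come from a real extraction.

```python
from pydantic import BaseModel, ValidationError


class Invoice(BaseModel):
    invoice_number: str
    date: str
    total: float


# A numeric string is coerced to float in Pydantic's default (lax) mode
good = Invoice.model_validate(
    {"invoice_number": "INV-001", "date": "2024-01-15", "total": "129.50"}
)
print(type(good.total).__name__, good.total)

# A value that cannot be parsed as a float is rejected with a ValidationError
try:
    Invoice.model_validate(
        {"invoice_number": "INV-002", "date": "2024-01-16", "total": "not a number"}
    )
    error_count = 0
except ValidationError as exc:
    error_count = exc.error_count()
    print(f"Rejected with {error_count} validation error(s)")
```

Malformed values are rejected at validation time instead of flowing silently into downstream code.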

Built for developers.

Stop writing regex. Just define your schema and let the LLM do the heavy lifting.

  • Works with documents, images, audio, and video
  • Structured error handling with typed exceptions
  • Optional logfire instrumentation for tracing
receipt_parser.py
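from pydantic import BaseModel
from open_xtract import extract, UrlFetchError, ModelError

# LineItem is not defined in the original snippet;
# these fields are illustrative assumptions.
class LineItem(BaseModel):
    description: str
    price: float

class Receipt(BaseModel):
    vendor: str
    items: list[LineItem]
    total: float

try:
    result = extract(
        schema=Receipt,
        model="anthropic:claude-sonnet-4-5",
        url="https://example.com/receipt.jpg",
        instructions="Extract receipt details",
    )
    print(f"Vendor: {result.vendor}, Total: ${result.total:.2f}")
except UrlFetchError as e:
    # The file at the URL could not be retrieved
    print(f"Failed to fetch: {e}")
except ModelError as e:
    # Presumably raised when the model call fails
    print(f"Model error: {e}")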