The fastest hosted
extraction API

PDF, PPTX, and DOCX in one API call. Scanned pages OCR'd inline. 5–7.5× faster than every hosted provider we tested.

POST /v1/extract· 1706.03762.pdf · 15 pages200 · 0ms
chunks0 / 0
{
  "chunks": [
elapsed 0ms
benchmarks

Faster than every hosted provider. On every document, across every format we tested.

total across 12 docs
159sextract
slowest
1192sllamaparse · 7.5× us
tested onborn-digital papersfinancial reportsscanned formsimage-heavy decksmulti-column layoutslarge technical specsadversarial layouts
extract
159s
reducto
806s
5.1×
aws bda
844s
5.3×
azure di
895s
5.6×
llamaparse
1192s
7.5×

Median of three runs across twelve documents ranging from three-page memos to a 492-page technical spec.

usage

Two ways in.

Pass a URL when the document is already hosted. Upload the bytes when it's sitting in memory. Same response shape either way.

$ curl https://api.extract.page/v1/extract \
    -H "X-API-KEY: $EXTRACT_KEY" \
    -H "Content-Type: application/json" \
    -d '{"url": "https://cdn.extract.page/demo/overview-of-computer-science.pdf"}'

{
  "chunks": [
    { "page_content": "Attention Is All You Need", "page_no": 1, "bbox": [90.0, 94.0, 505.2, 118.4] },
    { "page_content": "Ashish Vaswani",            "page_no": 1, "bbox": [108.0, 132.0, 198.3, 143.1] },
    { "page_content": "Noam Shazeer",              "page_no": 1, "bbox": [210.0, 132.0, 292.1, 143.1] }
  ]
}
pricing

Start for free.
Usage-based from there.

free
$0
1,000 pages · no card · lifetime
  • Full API access, no rate gates
  • Self-serve dashboard with usage
  • Email support
  • $3 / 1,000 pages after free credit
custom
Enterprise
for teams with higher workloads · volume discounts · SLAs
  • Dedicated region + private networking
  • Slack channel with engineering
  • Production SLAs and priority queues
faq

Questions
before you ship.

If something isn't here, email hello@extract.page.

start

Ship extraction in an afternoon.

1,000 free pages. No card. Pay $3 per 1,000 after.