PDF, PPTX, and DOCX in one API call. Scanned pages OCR'd inline. 5–7.5× faster than every hosted provider we tested.
Faster than every hosted provider. On every document, across every format we tested.
Median of three runs across twelve documents ranging from three-page memos to a 492-page technical spec.
Pass a URL when the document is already hosted. Upload the bytes when it's sitting in memory. Same response shape either way.
$ curl https://api.extract.page/v1/extract \
    -H "X-API-KEY: $EXTRACT_KEY" \
    -H "Content-Type: application/json" \
    -d '{"url": "https://cdn.extract.page/demo/overview-of-computer-science.pdf"}'
{
  "chunks": [
    { "page_content": "Attention Is All You Need", "page_no": 1, "bbox": [90.0, 94.0, 505.2, 118.4] },
    { "page_content": "Ashish Vaswani", "page_no": 1, "bbox": [108.0, 132.0, 198.3, 143.1] },
    { "page_content": "Noam Shazeer", "page_no": 1, "bbox": [210.0, 132.0, 292.1, 143.1] }
  ]
}
PDF, PPTX, and DOCX. Scanned PDFs are handled automatically. OCR runs inline with no separate surcharge.
A list of chunks. Each chunk carries page_content, page_no, and a bbox (x0, y0, x1, y1 in PDF points). OCR'd spans also get a confidence score.
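To make the response shape concrete, here is a minimal Python sketch that walks the chunk list and turns each bbox into a width and height in PDF points. The sample payload is copied from the demo response; the helper names are illustrative, and the `confidence` key is an assumption, since the text mentions a confidence score without naming the field.

```python
import json

# Sample payload copied from the demo response; real responses may carry more fields.
SAMPLE = """
{
  "chunks": [
    {"page_content": "Attention Is All You Need", "page_no": 1, "bbox": [90.0, 94.0, 505.2, 118.4]},
    {"page_content": "Ashish Vaswani", "page_no": 1, "bbox": [108.0, 132.0, 198.3, 143.1]},
    {"page_content": "Noam Shazeer", "page_no": 1, "bbox": [210.0, 132.0, 292.1, 143.1]}
  ]
}
"""


def bbox_size(bbox):
    """Return (width, height) in PDF points for a bbox given as [x0, y0, x1, y1]."""
    x0, y0, x1, y1 = bbox
    return (x1 - x0, y1 - y0)


def chunks_on_page(response, page_no):
    """Return the chunks whose page_no matches, in response order."""
    return [c for c in response["chunks"] if c["page_no"] == page_no]


def low_confidence(chunks, threshold):
    """Return OCR'd chunks scoring below threshold.

    ASSUMPTION: the score lives under a key named "confidence".
    Chunks without that key (non-OCR text) are skipped.
    """
    return [c for c in chunks if "confidence" in c and c["confidence"] < threshold]


response = json.loads(SAMPLE)
for chunk in chunks_on_page(response, 1):
    width, height = bbox_size(chunk["bbox"])
    print(f'{chunk["page_content"]!r}: {width:.1f} x {height:.1f} pt')
```

None of the three sample chunks carries a score, so `low_confidence` returns an empty list for them.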
Source documents are processed in memory and dropped as soon as the response returns. We don't retain the original file and don't train on your data. Extracted images are uploaded to our object store so you can fetch them via the image_url returned in the response. The custom tier supports customer-managed encryption, configurable retention, and dedicated regions.
Textract is $1.50 per 1,000 pages on paper, but charges separately for forms, tables, and queries, and requires AWS enterprise agreements for volume. Reducto is $30 per 1,000 at their listed tier. Our $3 is all-in: synchronous API, every format, no seat fees, no minimums. For most teams doing under 1M pages/month, we're cheaper in total spend.
500 pages and 150 MB per synchronous request. Anything larger needs the async endpoint; email hello@extract.page for access.
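A client can check these limits before sending anything. The sketch below is one way to do that in Python; it assumes both limits are inclusive and that "MB" means decimal megabytes (the more conservative reading), neither of which is stated above. The function name is illustrative.

```python
MAX_SYNC_PAGES = 500
# ASSUMPTION: 150 MB read as decimal megabytes; the page does not say MB vs MiB.
MAX_SYNC_BYTES = 150 * 10**6


def fits_sync_limits(page_count, size_bytes):
    """Return True if a document is within the synchronous request limits.

    ASSUMPTION: both limits are inclusive (exactly 500 pages is allowed).
    """
    return page_count <= MAX_SYNC_PAGES and size_bytes <= MAX_SYNC_BYTES


# A 492-page, 40 MB spec fits; a 600-page document would need the async endpoint.
print(fits_sync_limits(492, 40 * 10**6))
print(fits_sync_limits(600, 40 * 10**6))
```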
Grab a key from the dashboard after signup and send it as X-API-KEY on every request. No OAuth, no per-environment gymnastics.
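For teams calling the API from Python rather than curl, the same request can be built with the standard library alone. This is a sketch that mirrors the curl example: the endpoint, header name, and `url` body field come from that example, while the function names and the 60-second timeout are illustrative choices.

```python
import json
import urllib.request

EXTRACT_ENDPOINT = "https://api.extract.page/v1/extract"


def build_extract_request(api_key, document_url):
    """Build (but do not send) a POST request for a hosted document."""
    body = json.dumps({"url": document_url}).encode("utf-8")
    return urllib.request.Request(
        EXTRACT_ENDPOINT,
        data=body,  # a non-empty body makes urllib issue a POST
        headers={
            "X-API-KEY": api_key,
            "Content-Type": "application/json",
        },
    )


def extract(api_key, document_url, timeout=60):
    """Send the request and return the decoded JSON response."""
    request = build_extract_request(api_key, document_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


# Usage (needs a real key, e.g. read from the EXTRACT_KEY environment variable):
#   result = extract(os.environ["EXTRACT_KEY"], "https://example.com/report.pdf")
#   print(len(result["chunks"]))
```

Keeping request construction separate from sending makes the header and body easy to inspect in tests without touching the network.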
1,000 pages on signup. No card, no expiry. Paid usage is $3 per 1,000 pages after that, with no seat fees and no monthly minimum.
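As a rough budgeting aid, the pricing above reduces to a one-line formula. The sketch assumes billing is linear per page (so 500 paid pages cost $1.50) and that the 1,000 free pages are a one-time credit against cumulative usage; the page does not state how partial thousands are billed.

```python
from decimal import Decimal

FREE_PAGES = 1000
PRICE_PER_1000 = Decimal("3")  # USD per 1,000 paid pages


def estimate_cost(total_pages):
    """Estimate cumulative spend in USD for total_pages processed since signup.

    ASSUMPTION: paid pages are billed pro rata, not rounded up to whole thousands.
    """
    billable = max(0, total_pages - FREE_PAGES)
    return PRICE_PER_1000 * billable / Decimal(1000)


for pages in (800, 1000, 26000, 101000):
    print(pages, estimate_cost(pages))
```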
Top up from the dashboard in $10, $30, $100, or $500 increments, or with any custom amount. Keys keep working the moment a top-up lands.
Yes, on the custom tier. Negotiated rate per page, dedicated region, private networking, production SLAs, and a Slack channel with the engineering team.
1,000 free pages. No card. Pay $3 per 1,000 after.