Built for developers, AI pipelines, and automation

Parse PDFs, DOCX, HTML and more into clean data for AI pipelines

Convert between formats, or extract clean text, chunked output for RAG, and structured JSON with sections and tables, all through one REST API.

PDF · DOCX · HTML · Markdown · Images · TXT

Encrypted · auto-deleted after 2h

Free tier included · no credit card required

Request (curl)
# TOKEN holds the bearer token returned by POST /authorize (not the raw API key)
curl -X POST https://api.parsekit.dev/extract \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "file_url": "https://example.com/contract.pdf",
    "format": "structured"
  }'
Response (status: complete)
{
  "status": "complete",
  "title": "Service Agreement",
  "sections": [
    { "heading": "Parties", "text": "Acme Corp (\"Client\")..." },
    { "heading": "Payment Terms", "text": "Net 30..." }
  ],
  "tables": [
    { "headers": ["Service", "Rate"], "rows": [["Engineering", "$200/hr"]] }
  ]
}

How it works

Authorize. Submit. Poll.

01

Get a bearer token

POST your API key to /authorize. You get back a bearer token that's valid for 1 hour.

02

Submit a job

Call /convert or /extract with a public file URL or a file_id from /upload. You get back a job_id immediately.

03

Poll for the result

Poll GET /job/:id until status is complete, then download the output from the returned output_url.
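The three steps above can be scripted end to end in bash. This is a minimal sketch, not an official client: the `/authorize` request field `api_key`, its response field `token`, the `failed` status value, and the helper names (`json_field`, `get_token`, `submit_extract`, `wait_for_job`) are all hypothetical. Only `job_id`, `status`, and `output_url` come from the examples on this page. JSON bodies are built by string interpolation, so values containing quotes would need escaping (or `jq -n`) in real use.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the authorize -> submit -> poll flow.
API=https://api.parsekit.dev

# Crude extractor for a top-level string field in one-line JSON.
# jq is the better tool; sed keeps this sketch dependency-free.
json_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# Step 1: exchange the API key for a bearer token (valid for 1 hour).
# ASSUMPTION: request field "api_key" and response field "token" are guesses.
get_token() {
  local resp
  resp=$(curl -sS -f -X POST "$API/authorize" \
    -H "Content-Type: application/json" \
    -d "{\"api_key\": \"$1\"}") || return 1
  printf '%s\n' "$resp" | json_field token
}

# Step 2: submit an extraction job; prints the job_id returned immediately.
submit_extract() {
  local token=$1 file_url=$2 format=$3 resp
  resp=$(curl -sS -f -X POST "$API/extract" \
    -H "Authorization: Bearer $token" \
    -H "Content-Type: application/json" \
    -d "{\"file_url\": \"$file_url\", \"format\": \"$format\"}") || return 1
  printf '%s\n' "$resp" | json_field job_id
}

# Step 3: poll GET /job/:id until status is "complete", then print output_url.
# The 2-second default (30 req/min) stays under this endpoint's 60 req/min limit.
# ASSUMPTION: a failed job reports status "failed".
wait_for_job() {
  local token=$1 job_id=$2 interval=${3:-2} resp status
  while :; do
    resp=$(curl -sS -f "$API/job/$job_id" \
      -H "Authorization: Bearer $token") || return 1
    status=$(printf '%s\n' "$resp" | json_field status)
    case $status in
      complete) printf '%s\n' "$resp" | json_field output_url; return 0 ;;
      failed) echo "job $job_id failed" >&2; return 1 ;;
    esac
    sleep "$interval"
  done
}

# Usage:
#   TOKEN=$(get_token "$PARSEKIT_API_KEY")
#   JOB=$(submit_extract "$TOKEN" https://example.com/contract.pdf structured)
#   curl -sS -o result.json "$(wait_for_job "$TOKEN" "$JOB")"
```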

/convert

Convert between 15 format pairs

Pass the source format, the target format, and a file. The API handles the rest server-side: PDF rendering, DOCX parsing, HTML generation, and OCR.

Terminal
# Convert a DOCX file to PDF
curl -X POST https://api.parsekit.dev/convert \
  -H "Authorization: Bearer eyJhbG..." \
  -H "Content-Type: application/json" \
  -d '{
    "file_url": "https://example.com/report.docx",
    "from": "docx",
    "to": "pdf"
  }'

# → {"job_id": "clx9g7...", "status": "queued"}
# Poll GET /job/clx9g7... → status: "complete"
# Download the PDF from output_url

/upload

Upload files directly

Don't have a public URL? Upload via multipart form data. You get a file_id you can reuse across multiple conversions and extractions. Files are encrypted and automatically deleted after 2 hours.

Terminal
# Upload a file, then use the file_id for conversions
curl -X POST https://api.parsekit.dev/upload \
  -H "Authorization: Bearer eyJhbG..." \
  -F "file=@invoice.pdf"

# Response:
# {
#   "file_id": "clx9h3...",
#   "file_url": "https://api.parsekit.dev/files/...",
#   "filename": "invoice.pdf",
#   "size_bytes": 284102,
#   "mime_type": "application/pdf",
#   "expires_at": "2026-03-05T00:14:00.000Z"
# }

# Now use the file_id instead of a URL:
curl -X POST https://api.parsekit.dev/extract \
  -H "Authorization: Bearer eyJhbG..." \
  -H "Content-Type: application/json" \
  -d '{"file_id": "clx9h3...", "format": "text"}'

Use cases

Built for developers, AI pipelines, and automation

File format conversion

Convert DOCX to PDF, HTML to Markdown, PDF to images, and 12 more format pairs. One endpoint, no CLI tools or LibreOffice installs.

RAG ingestion

Extract chunked text from PDFs and DOCX files. Feed directly into OpenAI embeddings, Pinecone, or any vector store.

LLM tool calling

Give your agent the ability to read any document. Pass a URL, get structured JSON back in a single tool call.

Invoice & receipt parsing

Extract tables, line items, and totals from scanned or digital PDFs. OCR runs automatically for scanned documents.

Document migration

Batch convert legacy DOCX and HTML files into Markdown, PDF, or plain text for a new CMS or knowledge base.

PDF to image rendering

Render PDF pages as PNG images. Get a single image or a ZIP of all pages for previews and thumbnails.

Supported formats

All processing runs server-side with open-source tools. Your files never leave the pipeline.

Accepted uploads

.pdf, .docx, .doc, .html, .htm, .md, .txt, .png, .jpg, .jpeg, .gif, .webp, .tiff

Max 25 MB per file. Encrypted at rest. Auto-deleted after 2 hours.
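A pre-flight check can catch unsupported or oversized files locally before calling /upload. This is a hypothetical helper (`check_upload` and `MAX_UPLOAD_BYTES` are illustrative names). It assumes "25 MB" means 25,000,000 bytes, which is the stricter reading since the limit may be 25 MiB, and that extensions match case-insensitively. The server's own validation remains authoritative.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check mirroring the accepted extensions and size cap.
# ASSUMPTION: 25 MB = 25,000,000 bytes; extensions compared case-insensitively.
MAX_UPLOAD_BYTES=25000000

check_upload() {
  local path=$1 ext size
  [ -f "$path" ] || { echo "not a file: $path" >&2; return 1; }
  # Lowercase everything after the last dot.
  ext=$(printf '%s' "${path##*.}" | tr '[:upper:]' '[:lower:]')
  case $ext in
    pdf|docx|doc|html|htm|md|txt|png|jpg|jpeg|gif|webp|tiff) ;;
    *) echo "unsupported extension: .$ext" >&2; return 1 ;;
  esac
  # Arithmetic expansion strips the padding some wc implementations add.
  size=$(( $(wc -c < "$path") ))
  if [ "$size" -gt "$MAX_UPLOAD_BYTES" ]; then
    echo "too large: $size bytes (limit $MAX_UPLOAD_BYTES)" >&2
    return 1
  fi
}

# Usage:
#   check_upload invoice.pdf && curl -X POST https://api.parsekit.dev/upload \
#     -H "Authorization: Bearer $TOKEN" -F "file=@invoice.pdf"
```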

Conversions

pdf → text, images, html
docx → pdf, text, html
html → pdf, markdown, text
markdown → html, pdf
image → text (OCR), pdf
txt → pdf, markdown
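The conversion table above can be mirrored in a small client-side guard, so an unsupported pair fails fast instead of making a round trip to the API. This is a hypothetical sketch. The helper name is illustrative, and it assumes the API's `from`/`to` values use the same lowercase names as the table. The /convert example on this page confirms only `docx` and `pdf`.

```bash
#!/usr/bin/env bash
# Hypothetical guard listing the 15 from->to pairs in the Conversions table.
# ASSUMPTION: the API's "from"/"to" values are these lowercase names.
is_supported_conversion() {
  case "$1:$2" in
    pdf:text|pdf:images|pdf:html) return 0 ;;
    docx:pdf|docx:text|docx:html) return 0 ;;
    html:pdf|html:markdown|html:text) return 0 ;;
    markdown:html|markdown:pdf) return 0 ;;
    image:text|image:pdf) return 0 ;;  # image -> text runs OCR
    txt:pdf|txt:markdown) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   is_supported_conversion docx pdf || echo "docx -> pdf not supported" >&2
```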

Extractions

Any uploaded file type can be extracted. Choose an output format:

text: clean plain-text output
chunks: text split into chunks for embeddings
structured: JSON with headings, sections, and tables

OCR runs automatically for scanned PDFs and images.

Security

Your documents are never retained beyond 2 hours

Uploaded files and conversion outputs are encrypted at rest and automatically purged after 2 hours. We don't train on your data.

AES-256

Server-side encryption for all stored files

2h auto-delete

Uploads and outputs purged automatically

Signed URLs

Time-limited access, no permanent links

SSRF blocked

Private/internal network access prevented

Pricing

Free to start. Pay as you scale.

Uploads, conversions, and extractions share one monthly limit. No per-page fees.

Free
$0/mo

For testing and side projects

  • 50 operations / month
  • Includes uploads, conversions & extractions
  • 2 API keys
  • 3 concurrent jobs
  • 25 MB per file
Starter (Popular)
$29/mo

For startups shipping to production

  • 5,000 operations / month
  • Includes uploads, conversions & extractions
  • 10 API keys
  • 20 concurrent jobs
  • 25 MB per file
Pro
$79/mo

For teams processing at scale

  • 25,000 operations / month
  • Includes uploads, conversions & extractions
  • 25 API keys
  • 50 concurrent jobs
  • 25 MB per file

No credit card required for Free. Bearer tokens expire after 1 hour.
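Because uploads, conversions, and extractions draw on one shared count, the number of documents a plan covers depends on how many operations each document needs. Here is a rough worked example. It assumes each upload, conversion, and extraction counts as exactly one operation, and that authorizing and polling `GET /job/:id` are not counted. `docs_per_month` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Rough monthly capacity under the shared operation limit.
# ASSUMPTION: upload, conversion, and extraction each cost one operation;
# POST /authorize and GET /job/:id polling cost none.
docs_per_month() {
  local ops_limit=$1 ops_per_doc=$2
  echo $(( ops_limit / ops_per_doc ))   # integer division: whole documents only
}

docs_per_month 50 3      # Free, upload + convert + extract per file -> 16
docs_per_month 5000 2    # Starter, upload + extract per file -> 2500
docs_per_month 25000 1   # Pro, extract straight from a file_url -> 25000
```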

Rate limits

Built-in protection, no config needed

Every endpoint is rate-limited out of the box. Plan-based caps keep usage predictable.

Request rate limits

POST /authorize: 10 req/min
POST /convert, /extract, /upload: 60 req/min
GET /job/:id: 60 req/min

Limits are counted per bearer token or per IP address, in windows that reset every 60 seconds.
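When a script runs into these limits, a retry loop with exponential backoff stops it from hammering the API. This is a minimal sketch. It assumes rate-limited requests are rejected with HTTP 429, the standard Too Many Requests status, although this page lists the limits without naming the response. The `api_call` helper and the `BACKOFF_START` variable are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: retry while the API answers HTTP 429, doubling the wait.
# ASSUMPTION: rate-limited calls return status 429.
# Usage: api_call OUTFILE [curl args...]  (prints the final HTTP status code)
api_call() {
  local out=$1 max_tries=7 delay=${BACKOFF_START:-1} attempt=1 code
  shift
  # 7 attempts wait 1+2+4+8+16+32 = 63s in total, longer than one 60s window.
  while :; do
    code=$(curl -sS -o "$out" -w '%{http_code}' "$@") || return 1
    if [ "$code" != 429 ]; then printf '%s\n' "$code"; return 0; fi
    if [ "$attempt" -ge "$max_tries" ]; then printf '%s\n' "$code"; return 1; fi
    echo "HTTP 429 on attempt $attempt; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Usage:
#   code=$(api_call job.json "https://api.parsekit.dev/job/$JOB_ID" \
#     -H "Authorization: Bearer $TOKEN")
```

Recent curl versions can handle much of this on their own: `--retry N` treats HTTP 429 as a transient error and backs off exponentially. The explicit loop lets the script see and log each status code.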

Plan-based limits

Free: 50 ops/mo, 3 concurrent jobs
Starter: 5,000 ops/mo, 20 concurrent jobs
Pro: 25,000 ops/mo, 50 concurrent jobs

Monthly usage resets on the 1st.

Start converting and extracting in minutes

Sign in with Google or GitHub, grab an API key from the dashboard, and make your first conversion or extraction. No SDK needed: it's just REST.