Built for developers, AI pipelines, and automation
Parse PDFs, DOCX, HTML and more into clean data for AI pipelines
Convert between formats or extract clean text, chunked output for RAG, and structured JSON with sections and tables — one REST API.
Free tier included · no credit card required
curl -X POST https://api.parsekit.dev/extract \
-H "Authorization: Bearer eyJhbG..." \
-H "Content-Type: application/json" \
-d '{
"file_url": "https://example.com/contract.pdf",
"format": "structured"
}'

{
"status": "complete",
"title": "Service Agreement",
"sections": [
{ "heading": "Parties", "text": "Acme Corp (\"Client\")..." },
{ "heading": "Payment Terms", "text": "Net 30..." }
],
"tables": [
{ "headers": ["Service", "Rate"], "rows": [["Engineering", "$200/hr"]] }
]
}

How it works
Authorize. Submit. Poll.
Get a bearer token
POST your API key to /authorize. You get back a bearer token that's valid for 1 hour.
Submit a job
Call /convert or /extract with a file URL or uploaded file_id. You get back a job_id immediately.
Poll for the result
GET /job/:id until status is complete. Download the output from the returned URL.
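The three steps above map onto a small client. The sketch below uses Python with only the standard library and is an illustration, not an official client. The `/authorize` request field `api_key`, the response field `token`, and the `failed` status value are assumptions that this page does not specify, so check them against the API reference before relying on them. The helper names `call_api` and `run_job` are hypothetical.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.parsekit.dev"


def call_api(method, path, token=None, body=None):
    """Send one JSON request and return the decoded JSON response."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def run_job(api_key, endpoint, payload, send=call_api, sleep=time.sleep,
            interval=2.0, max_polls=60):
    """Authorize, submit a job to /convert or /extract, and poll until done."""
    # Step 1: exchange the API key for a bearer token.
    # ASSUMPTION: the field names "api_key" and "token" are guesses.
    token = send("POST", "/authorize", body={"api_key": api_key})["token"]

    # Step 2: submit the job; the response carries a job_id.
    job = send("POST", endpoint, token=token, body=payload)

    # Step 3: poll GET /job/:id until status is "complete".
    for _ in range(max_polls):
        result = send("GET", f"/job/{job['job_id']}", token=token)
        if result["status"] == "complete":
            return result
        if result["status"] == "failed":  # ASSUMPTION: failure status name
            raise RuntimeError(f"job {job['job_id']} failed")
        sleep(interval)
    raise TimeoutError(f"job {job['job_id']} not complete after {max_polls} polls")
```

Calling `run_job(api_key, "/convert", {"file_url": "https://example.com/report.docx", "from": "docx", "to": "pdf"})` would return the final job object, from which you can read the output URL. The `send` and `sleep` parameters exist so the flow can be exercised without network access.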
/convert
Convert between 15 format pairs
Pass the source format, target format, and a file. The API handles the rest — PDF rendering, DOCX parsing, HTML generation, OCR, all server-side.
# Convert a DOCX file to PDF
curl -X POST https://api.parsekit.dev/convert \
-H "Authorization: Bearer eyJhbG..." \
-H "Content-Type: application/json" \
-d '{
"file_url": "https://example.com/report.docx",
"from": "docx",
"to": "pdf"
}'
# → {"job_id": "clx9g7...", "status": "queued"}
# Poll GET /job/clx9g7... → status: "complete"
# Download the PDF from output_url

/upload
Upload files directly
Don't have a public URL? Upload via multipart form data. You get a file_id you can reuse across multiple conversions and extractions. Files are encrypted and automatically deleted after 2 hours.
# Upload a file, then use the file_id for conversions
curl -X POST https://api.parsekit.dev/upload \
-H "Authorization: Bearer eyJhbG..." \
-F "file=@invoice.pdf"
# Response:
# {
# "file_id": "clx9h3...",
# "file_url": "https://api.parsekit.dev/files/...",
# "filename": "invoice.pdf",
# "size_bytes": 284102,
# "mime_type": "application/pdf",
# "expires_at": "2026-03-05T00:14:00.000Z"
# }
# Now use the file_id instead of a URL:
curl -X POST https://api.parsekit.dev/extract \
-H "Authorization: Bearer eyJhbG..." \
-H "Content-Type: application/json" \
-d '{"file_id": "clx9h3...", "format": "text"}'

Use cases
Built for developers, AI pipelines, and automation
File format conversion
Convert DOCX to PDF, HTML to Markdown, PDF to images, and 12 more format pairs. One endpoint, no CLI tools or LibreOffice installs.
RAG ingestion
Extract chunked text from PDFs and DOCX files. Feed directly into OpenAI embeddings, Pinecone, or any vector store.
LLM tool calling
Give your agent the ability to read any document. Pass a URL, get structured JSON back in a single tool call.
Invoice & receipt parsing
Extract tables, line items, and totals from scanned or digital PDFs. OCR runs automatically for scanned documents.
Document migration
Batch convert legacy DOCX and HTML files into Markdown, PDF, or plain text for a new CMS or knowledge base.
PDF to image rendering
Render PDF pages as PNG images. Get a single image or a ZIP of all pages for previews and thumbnails.
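For the LLM tool calling case, one way to wire this up is to expose a single tool that forwards a URL to /extract. The sketch below is a rough illustration in Python: the tool name `read_document`, the JSON-Schema-style definition, and the `handle_tool_call` helper are all hypothetical, and the exact tool envelope differs between LLM providers. The `extract` argument stands in for whatever function submits the request and waits for the job to finish.

```python
import json

# Hypothetical tool definition in the JSON-Schema style that several
# LLM APIs accept; adapt the envelope to your provider.
READ_DOCUMENT_TOOL = {
    "name": "read_document",
    "description": "Fetch a document by URL and return its sections and tables as JSON.",
    "parameters": {
        "type": "object",
        "properties": {
            "file_url": {"type": "string", "description": "Public URL of the document"},
        },
        "required": ["file_url"],
    },
}


def handle_tool_call(name, arguments_json, extract):
    """Turn one tool call from the model into an /extract request body.

    `extract` is any callable that takes the request body, runs the job,
    and returns the parsed result as a dict.
    """
    if name != READ_DOCUMENT_TOOL["name"]:
        raise ValueError(f"unknown tool: {name}")
    arguments = json.loads(arguments_json)
    # "file_url" and "format": "structured" match the request shown on this page.
    body = {"file_url": arguments["file_url"], "format": "structured"}
    # Return a JSON string, which is what most tool-calling APIs expect back.
    return json.dumps(extract(body))
```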
Supported formats
All processing runs server-side with open-source tools. Your files never leave the pipeline.
Accepted uploads
Max 25 MB per file. Encrypted at rest. Auto-deleted after 2 hours.
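Both limits can be checked on the client before a request is sent. The Python sketch below is a minimal illustration under two assumptions: that "25 MB" means 25,000,000 bytes (the stricter reading, so a file that passes here also passes under a 25 MiB limit), and that `expires_at` always uses the timestamp format shown in the upload response. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: "25 MB" is read as 25,000,000 bytes, the stricter interpretation.
MAX_UPLOAD_BYTES = 25 * 1000 * 1000


def check_upload_size(size_bytes):
    """Raise ValueError if the file is larger than the per-file limit."""
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError(f"file is {size_bytes} bytes; limit is {MAX_UPLOAD_BYTES}")


def is_expired(expires_at, now=None, margin=timedelta(minutes=1)):
    """Return True if a file_id is past, or within `margin` of, its expiry.

    `expires_at` is the string from the upload response,
    for example "2026-03-05T00:14:00.000Z".
    """
    expiry = datetime.strptime(expires_at, "%Y-%m-%dT%H:%M:%S.%fZ")
    expiry = expiry.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now >= expiry - margin
```

If `is_expired` returns True for a stored `file_id`, upload the file again rather than reusing the old identifier.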
Conversions
Extractions
Any uploaded file type can be extracted. Choose an output format: clean text, chunked output for RAG, or structured JSON with sections and tables.
OCR runs automatically for scanned PDFs and images.
Security
Your documents are never retained
Uploaded files and conversion outputs are encrypted at rest and automatically purged after 2 hours. We don't train on your data.
Server-side encryption for all stored files
Uploads and outputs purged automatically
Time-limited access, no permanent links
Private/internal network access prevented
Pricing
Free to start. Pay as you scale.
Uploads, conversions, and extractions share one monthly limit. No per-page fees.
For testing and side projects
- 50 operations / month
- Includes uploads, conversions & extractions
- 2 API keys
- 3 concurrent jobs
- 25 MB per file
For startups shipping to production
- 5,000 operations / month
- Includes uploads, conversions & extractions
- 10 API keys
- 20 concurrent jobs
- 25 MB per file
For teams processing at scale
- 25,000 operations / month
- Includes uploads, conversions & extractions
- 25 API keys
- 50 concurrent jobs
- 25 MB per file
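When you process a batch, it helps to keep the number of in-flight jobs at or below your plan's concurrent job limit (3, 20, or 50 above). A minimal Python sketch using a bounded thread pool follows; `run_all` and `run_one` are hypothetical names, and how the API responds when the limit is exceeded is not stated on this page.

```python
from concurrent.futures import ThreadPoolExecutor


def run_all(payloads, run_one, max_concurrent=3):
    """Run run_one(payload) for every payload, at most max_concurrent at a time.

    Results come back in the same order as the input payloads.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(run_one, payloads))
```

Here `run_one` would be a function that submits a single job and waits for its result, and `max_concurrent` would be set to the concurrent job limit of your plan.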
No credit card required for Free. Bearer tokens expire after 1 hour.
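Because tokens last one hour, a long-running worker can cache the token and fetch a new one shortly before it expires. The Python sketch below shows one way to do that. `TokenCache` and `fetch_token` are hypothetical names, and the 60-second refresh margin is an arbitrary choice, not something the API prescribes.

```python
import time

TOKEN_LIFETIME_SECONDS = 3600  # tokens are valid for 1 hour


class TokenCache:
    """Hold a bearer token and refresh it shortly before it expires."""

    def __init__(self, fetch_token, clock=time.monotonic, refresh_margin=60):
        self._fetch_token = fetch_token  # callable that returns a fresh token
        self._clock = clock
        self._refresh_margin = refresh_margin
        self._token = None
        self._fetched_at = 0.0

    def get(self):
        """Return a token that has at least refresh_margin seconds left."""
        age = self._clock() - self._fetched_at
        if self._token is None or age >= TOKEN_LIFETIME_SECONDS - self._refresh_margin:
            self._token = self._fetch_token()
            self._fetched_at = self._clock()
        return self._token
```

Pass in a `fetch_token` function that calls /authorize with your API key, then call `cache.get()` before each request.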
Rate limits
Built-in protection, no config needed
Every endpoint is rate-limited out of the box. Plan-based caps keep usage predictable.
Request rate limits
Per bearer token or IP. Resets every 60 seconds.
Plan-based limits
Monthly usage resets on the 1st.
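A client can absorb request rate limits by waiting and retrying. The Python sketch below assumes the API signals a rate limit with HTTP 429 and may send a `Retry-After` header in seconds. Both are common conventions but are not confirmed on this page. The fallback delay is capped at the 60-second reset window, and `with_rate_limit_retry` is a hypothetical helper name.

```python
import time
import urllib.error


def with_rate_limit_retry(request, sleep=time.sleep, max_attempts=5, window=60.0):
    """Call request(); on HTTP 429, wait and try again.

    ASSUMPTION: rate-limited calls raise urllib.error.HTTPError with code 429.
    Any other error, or a 429 on the final attempt, is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)  # server-suggested wait, if present
            else:
                delay = min(window, 2.0 ** attempt)  # 1s, 2s, 4s, ... up to the window
            sleep(delay)
```

Wrap any single API call in a zero-argument function and pass it as `request`. Monthly plan limits are a different case: retrying will not help until usage resets.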
Start converting and extracting in minutes
Sign in with Google or GitHub, grab an API key from the dashboard, and make your first conversion or extraction. No SDK needed — it's just REST.