Supported Conversions

Complete table of all supported conversion pairs and file types.

Supported File Types

Max file size: 25 MB for all file types. Files are encrypted at rest and auto-deleted after 2 hours.
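As a rough illustration, the size limit can be checked client-side before uploading. This is a minimal sketch, not part of PARSEKIT: the names `MAX_FILE_SIZE_BYTES`, `is_within_size_limit`, and `file_is_uploadable` are hypothetical, and since the text does not say whether "MB" means 10^6 or 2^20 bytes, the sketch assumes the stricter decimal reading.

```python
import os

# Assumption: 1 MB = 1,000,000 bytes. The documentation does not specify the
# unit, so the stricter (decimal) interpretation is used here on purpose.
MAX_FILE_SIZE_BYTES = 25 * 1000 * 1000


def is_within_size_limit(size_bytes: int) -> bool:
    """Return True if a file of this size fits under the 25 MB limit."""
    return 0 <= size_bytes <= MAX_FILE_SIZE_BYTES


def file_is_uploadable(path: str) -> bool:
    """Check a file on disk against the limit (hypothetical helper)."""
    return is_within_size_limit(os.path.getsize(path))
```

A file that passes this check also passes under the binary interpretation, so the sketch errs on the side of rejecting borderline files locally.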

Image Format	Extension	Best For
PNG	`.png`	Screenshots, diagrams, text-heavy images
JPEG	`.jpg`, `.jpeg`	Photographs, scanned documents
GIF	`.gif`	Simple graphics (first frame used)
WebP	`.webp`	Modern web images
TIFF	`.tiff`	High-quality scans, multi-page documents
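The table above can be mirrored as a small lookup for routing files by extension. This is only a sketch: `IMAGE_EXTENSIONS` and `image_format` are hypothetical names, and case-insensitive matching is an assumption that the text does not confirm.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the table above; nothing else is assumed supported.
IMAGE_EXTENSIONS = {
    ".png": "PNG",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".gif": "GIF",
    ".webp": "WebP",
    ".tiff": "TIFF",
}


def image_format(filename: str) -> Optional[str]:
    """Return the image format for a filename, or None if not listed."""
    # Assumption: extensions are matched case-insensitively.
    return IMAGE_EXTENSIONS.get(Path(filename).suffix.lower())
```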

From \ To	text	html	images	markdown	pdf
pdf	✓	✓	✓	—	—
docx	✓	✓	—	—	✓
html	✓	—	—	✓	✓
markdown	—	✓	—	—	✓
image	✓ (OCR)	—	—	—	✓
txt	—	—	—	✓	✓

15 conversion pairs total.
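For client-side validation, the matrix above can be encoded as a mapping from source type to its supported targets. This is a minimal sketch; `CONVERSIONS`, `is_supported`, and `count_pairs` are illustrative names, not part of any PARSEKIT API.

```python
# Each source type maps to the target formats marked with a check in the matrix.
CONVERSIONS = {
    "pdf": {"text", "html", "images"},
    "docx": {"text", "html", "pdf"},
    "html": {"text", "markdown", "pdf"},
    "markdown": {"html", "pdf"},
    "image": {"text", "pdf"},  # image -> text goes through OCR
    "txt": {"markdown", "pdf"},
}


def is_supported(source: str, target: str) -> bool:
    """Return True if the source -> target pair appears in the matrix."""
    return target in CONVERSIONS.get(source, set())


def count_pairs() -> int:
    """Count every supported source -> target pair."""
    return sum(len(targets) for targets in CONVERSIONS.values())
```

Summing the rows this way gives 3 + 3 + 3 + 2 + 2 + 2 = 15, which matches the stated total.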

All file types support all three extraction formats:

File Type	`text`	`chunks`	`structured`	Notes
PDF	✓	✓	✓	OCR fallback for scanned documents
DOCX	✓	✓	✓	Full text extraction
HTML	✓	✓	✓	Tag stripping
Markdown	✓	✓	✓	Markdown to plain text
TXT	✓	✓	✓	Pass-through
Images	✓	✓	✓	OCR via Tesseract — works best with high-contrast printed text

• PDF → Images: Renders each page as a PNG at 150 DPI. Single-page PDFs return a `.png` file. Multi-page PDFs return a `.zip` archive containing `page-001.png`, `page-002.png`, etc.
• Image → Text: Uses Tesseract OCR. Works best with high-contrast, printed text. Supports PNG, JPEG, GIF, WebP, and TIFF.
• PDF OCR Fallback: For scanned PDFs without a text layer, PARSEKIT automatically renders pages to images and runs OCR to extract text.
• File size limit: 25 MB for all file types.
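Because a PDF → Images conversion returns either a single PNG or a ZIP archive, a client has to handle both shapes. The sketch below shows one way to normalize the result into a mapping of page file names to PNG bytes. It assumes the raw response body is already available as `bytes`. The function name `unpack_pdf_to_images_result` is hypothetical, and labeling a single-page result `page-001.png` is this sketch's own choice, not documented behavior.

```python
import io
import zipfile
from typing import Dict

# Standard 8-byte signature that starts every PNG file.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def unpack_pdf_to_images_result(payload: bytes) -> Dict[str, bytes]:
    """Normalize a PDF -> Images result into {file name: PNG bytes}."""
    if payload.startswith(PNG_MAGIC):
        # Single-page PDF: the payload is the PNG itself.
        # Assumption: we label it page-001.png for consistency.
        return {"page-001.png": payload}

    buffer = io.BytesIO(payload)
    if zipfile.is_zipfile(buffer):
        # Multi-page PDF: read page-001.png, page-002.png, ... in name order.
        with zipfile.ZipFile(buffer) as archive:
            return {
                name: archive.read(name)
                for name in sorted(archive.namelist())
                if name.lower().endswith(".png")
            }

    raise ValueError("payload is neither a PNG nor a ZIP archive")
```

Sorting by name works here because the zero-padded page numbers sort lexicographically in page order.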