Supported Conversions
Complete table of all supported conversion pairs and file types.
Supported File Types
Upload & Input Formats
| Category | Extensions | Use With |
|---|---|---|
| PDF | .pdf | Convert, Extract |
| Word | .docx, .doc | Convert, Extract |
| HTML | .html, .htm | Convert, Extract |
| Markdown | .md | Convert, Extract |
| Plain Text | .txt | Convert, Extract |
| Images | .png, .jpg, .jpeg, .gif, .webp, .tiff | Convert (OCR), Extract (OCR) |
Max file size: 25 MB for all file types. Files are encrypted at rest and auto-deleted after 2 hours.
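You can check the extension list and size limit above on the client before uploading, so that clearly unsupported files never reach the service. This is a minimal sketch. The function name `check_upload` is hypothetical, and so is the exact byte value for the limit. We assume the conservative decimal reading of 25 MB (25,000,000 bytes) because the docs do not say whether MB means 10^6 or 2^20 bytes. The server still decides what it accepts.

```python
import os

# Upload extensions copied from the "Upload & Input Formats" table.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".html", ".htm", ".md", ".txt",
    ".png", ".jpg", ".jpeg", ".gif", ".webp", ".tiff",
}

# Assumption: 25 MB read conservatively as 25,000,000 bytes (the service
# may actually allow 25 * 1024 * 1024 bytes).
MAX_UPLOAD_BYTES = 25_000_000


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Hypothetical pre-upload check; an empty list means no problems found."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()  # ".TIFF" -> ".tiff"
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_UPLOAD_BYTES}")
    return problems
```

For example, `check_upload("scan.TIFF", 10)` returns an empty list. `check_upload("notes.rtf", 10)` reports the unsupported extension.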
Image Format Details
| Format | Extension | Best For |
|---|---|---|
| PNG | .png | Screenshots, diagrams, text-heavy images |
| JPEG | .jpg, .jpeg | Photographs, scanned documents |
| GIF | .gif | Simple graphics (first frame used) |
| WebP | .webp | Modern web images |
| TIFF | .tiff | High-quality scans, multi-page documents |
Conversion Matrix (`/convert`)
| From \ To | text | html | images | markdown | pdf |
|---|---|---|---|---|---|
| pdf | ✓ | ✓ | ✓ | — | — |
| docx | ✓ | ✓ | — | — | ✓ |
| html | ✓ | — | — | ✓ | ✓ |
| markdown | — | ✓ | — | — | ✓ |
| image | ✓ (OCR) | — | — | — | ✓ |
| txt | — | — | — | ✓ | ✓ |
15 conversion pairs total.
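You can encode the matrix as a lookup table for client-side validation. This is a minimal sketch that assumes the row and column labels (`pdf`, `docx`, `image`, and so on) are the identifiers you work with. This section does not say which exact values the `/convert` endpoint accepts, or how extensions such as `.doc`, `.htm`, or `.jpg` map onto these rows. The helper names `can_convert` and `targets_for` are hypothetical.

```python
# Conversion pairs transcribed from the /convert matrix: source -> targets.
CONVERSIONS = {
    "pdf": {"text", "html", "images"},
    "docx": {"text", "html", "pdf"},
    "html": {"text", "markdown", "pdf"},
    "markdown": {"html", "pdf"},
    "image": {"text", "pdf"},  # image -> text goes through OCR
    "txt": {"markdown", "pdf"},
}


def can_convert(source: str, target: str) -> bool:
    """True if the matrix lists source -> target as a supported pair."""
    return target in CONVERSIONS.get(source, set())


def targets_for(source: str) -> list[str]:
    """Sorted list of targets for a source; empty if the source is unknown."""
    return sorted(CONVERSIONS.get(source, set()))


# Sanity check against the documented total of 15 pairs.
assert sum(len(targets) for targets in CONVERSIONS.values()) == 15
```

For example, `can_convert("pdf", "markdown")` returns `False`, and `targets_for("markdown")` returns `["html", "pdf"]`.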
Extraction Matrix (`/extract`)
All file types support all three extraction formats:
| File Type | text | chunks | structured | Notes |
|---|---|---|---|---|
| PDF | ✓ | ✓ | ✓ | OCR fallback for scanned documents |
| DOCX | ✓ | ✓ | ✓ | Full text extraction |
| HTML | ✓ | ✓ | ✓ | Tag stripping |
| Markdown | ✓ | ✓ | ✓ | Markdown to plain text |
| TXT | ✓ | ✓ | ✓ | Pass-through |
| Images | ✓ | ✓ | ✓ | OCR via Tesseract — works best with high-contrast printed text |
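Every supported type accepts all three formats, so on `/extract` the only per-file difference is how OCR is involved. The sketch below encodes that from the table notes. Images always go through OCR, PDFs use OCR only as a fallback for scanned pages, and the other types need no OCR. The function names and the `fmt` argument are hypothetical. The format strings are assumed to match the column headings `text`, `chunks`, and `structured`.

```python
import os

# Assumed format identifiers, taken from the extraction matrix headings.
EXTRACT_FORMATS = {"text", "chunks", "structured"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".tiff"}
NON_OCR_EXTENSIONS = {".docx", ".doc", ".html", ".htm", ".md", ".txt"}


def ocr_mode(filename: str) -> str:
    """Describe how OCR applies to this file on /extract, per the table notes."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in IMAGE_EXTENSIONS:
        return "always"    # images are read via Tesseract OCR
    if ext == ".pdf":
        return "fallback"  # OCR only for scanned PDFs without a text layer
    if ext in NON_OCR_EXTENSIONS:
        return "none"      # text is taken directly from the document
    raise ValueError(f"unsupported file type: {ext or '(none)'}")


def check_extract(filename: str, fmt: str) -> str:
    """Hypothetical validator: reject unknown formats, then report the OCR mode."""
    if fmt not in EXTRACT_FORMATS:
        raise ValueError(f"unknown extraction format: {fmt}")
    return ocr_mode(filename)
```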
Notes
- PDF → Images: Renders each page as a PNG at 150 DPI. Single-page PDFs return a `.png` file. Multi-page PDFs return a `.zip` archive containing `page-001.png`, `page-002.png`, and so on.
- Image → Text: Uses Tesseract OCR. Works best with high-contrast, printed text. Supports PNG, JPEG, GIF, WebP, and TIFF.
- PDF OCR Fallback: For scanned PDFs without a text layer, PARSEKIT automatically renders pages to images and runs OCR to extract text.
- File size limit: 25 MB for all file types.
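A client that receives a PDF → Images result has to handle both response shapes described in the first note. This sketch uses only the Python standard library. It tells the two shapes apart by file signature, either the fixed 8-byte PNG header or a ZIP archive, and returns the pages in order. Zero-padded names like `page-001.png` sort correctly as plain strings. The docs do not state the filename returned for a single-page result, so the name used for it here is a hypothetical placeholder.

The sketch also covers the arithmetic for the rendered size at 150 DPI. PDF page dimensions are measured in points (1/72 inch), so a US Letter page of 612 × 792 pt is 8.5 × 11 in, or 1275 × 1650 px. We assume the renderer rounds fractional pixels to the nearest whole pixel.

```python
import io
import zipfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def split_pdf_image_result(body: bytes) -> dict[str, bytes]:
    """Return {filename: png_bytes} for a PDF -> Images conversion result.

    Single-page PDFs come back as one PNG; multi-page PDFs come back as a ZIP
    containing page-001.png, page-002.png, and so on.
    """
    if body.startswith(PNG_SIGNATURE):
        # Hypothetical name: the docs do not say what the single file is called.
        return {"page-001.png": body}
    if zipfile.is_zipfile(io.BytesIO(body)):
        with zipfile.ZipFile(io.BytesIO(body)) as zf:
            # Zero-padded page names sort correctly as strings.
            names = sorted(n for n in zf.namelist() if n.endswith(".png"))
            return {name: zf.read(name) for name in names}
    raise ValueError("response is neither a PNG nor a ZIP archive")


def expected_pixel_size(width_pt: float, height_pt: float, dpi: int = 150) -> tuple[int, int]:
    """Approximate PNG size of a page rendered at `dpi` (1 pt = 1/72 inch).

    Rounding to the nearest pixel is an assumption about the renderer.
    """
    return round(width_pt / 72 * dpi), round(height_pt / 72 * dpi)
```

For example, `expected_pixel_size(612, 792)` gives `(1275, 1650)` for a US Letter page.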