Supported Conversions

Complete table of all supported conversion pairs and file types.

Supported File Types

Upload & Input Formats

| Category | Extensions | Use With |
|---|---|---|
| PDF | .pdf | Convert, Extract |
| Word | .docx, .doc | Convert, Extract |
| HTML | .html, .htm | Convert, Extract |
| Markdown | .md | Convert, Extract |
| Plain Text | .txt | Convert, Extract |
| Images | .png, .jpg, .jpeg, .gif, .webp, .tiff | Convert (OCR), Extract (OCR) |

Max file size: 25 MB for all file types. Files are encrypted at rest and auto-deleted after 2 hours.
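A client can check the extension and size limits above before uploading. The following is a minimal sketch, not part of PARSEKIT: the function name `check_upload` is hypothetical, and it assumes "25 MB" means 25 × 1024 × 1024 bytes (the service may count 25,000,000 bytes instead).

```python
from pathlib import Path

# Extensions and categories copied from the "Upload & Input Formats" table.
SUPPORTED_EXTENSIONS = {
    ".pdf": "PDF",
    ".docx": "Word",
    ".doc": "Word",
    ".html": "HTML",
    ".htm": "HTML",
    ".md": "Markdown",
    ".txt": "Plain Text",
    ".png": "Images",
    ".jpg": "Images",
    ".jpeg": "Images",
    ".gif": "Images",
    ".webp": "Images",
    ".tiff": "Images",
}

# Assumption: 1 MB = 1024 * 1024 bytes. The documentation only says "25 MB".
MAX_FILE_SIZE_BYTES = 25 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> str:
    """Return the file category, or raise ValueError if the file would be rejected."""
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        raise ValueError(
            f"file is {size_bytes} bytes; limit is {MAX_FILE_SIZE_BYTES}"
        )
    return SUPPORTED_EXTENSIONS[ext]
```

The extension match is case-insensitive here, so `report.PDF` is treated like `report.pdf`. Whether the service itself is case-insensitive is not stated.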

Image Format Details

| Format | Extension | Best For |
|---|---|---|
| PNG | .png | Screenshots, diagrams, text-heavy images |
| JPEG | .jpg, .jpeg | Photographs, scanned documents |
| GIF | .gif | Simple graphics (first frame used) |
| WebP | .webp | Modern web images |
| TIFF | .tiff | High-quality scans, multi-page documents |

Conversion Matrix (`/convert`)

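| From \ To | text | html | images | markdown | pdf |
|---|---|---|---|---|---|
| pdf | | | | | |
| docx | | | | | |
| html | | | | | |
| markdown | | | | | |
| image | ✓ (OCR) | | | | |
| txt | | | | | |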

15 conversion pairs total.

Extraction Matrix (`/extract`)

All file types support all three extraction formats:

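| File Type | text | chunks | structured | Notes |
|---|---|---|---|---|
| PDF | ✓ | ✓ | ✓ | OCR fallback for scanned documents |
| DOCX | ✓ | ✓ | ✓ | Full text extraction |
| HTML | ✓ | ✓ | ✓ | Tag stripping |
| Markdown | ✓ | ✓ | ✓ | Markdown to plain text |
| TXT | ✓ | ✓ | ✓ | Pass-through |
| Images | ✓ | ✓ | ✓ | OCR via Tesseract; works best with high-contrast printed text |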

Notes

  • PDF → Images: Renders each page as a PNG at 150 DPI. Single-page PDFs return a .png file. Multi-page PDFs return a .zip archive containing page-001.png, page-002.png, etc.
  • Image → Text: Uses Tesseract OCR. Works best with high-contrast, printed text. Supports PNG, JPEG, GIF, WebP, and TIFF.
  • PDF OCR Fallback: For scanned PDFs without a text layer, PARSEKIT automatically renders pages to images and runs OCR to extract text.
  • File size limit: 25 MB for all file types.
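
Because PDF → Images returns either a single .png or a .zip of pages, client code has to handle both shapes. The sketch below shows one way to do that, assuming the raw result bytes are already in memory. The helper name `unpack_pdf_images` is hypothetical, and detecting the result type by file signature is a choice made here, not documented PARSEKIT behavior.

```python
import io
import zipfile

# Standard 8-byte PNG file signature.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def unpack_pdf_images(payload: bytes) -> dict[str, bytes]:
    """Map page file names to PNG bytes for either result shape."""
    if payload.startswith(PNG_MAGIC):
        # Single-page PDF: the payload is the PNG itself.
        # "page-001.png" is a key chosen here for symmetry with the
        # multi-page case; it is not a name returned by the service.
        return {"page-001.png": payload}

    if zipfile.is_zipfile(io.BytesIO(payload)):
        # Multi-page PDF: a ZIP with page-001.png, page-002.png, etc.
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            names = sorted(
                name for name in archive.namelist()
                if name.lower().endswith(".png")
            )
            return {name: archive.read(name) for name in names}

    raise ValueError("payload is neither a PNG nor a ZIP archive")
```

Sorting by name yields page order because the page numbers are zero-padded to three digits. How pages are named beyond 999 is not specified, so that case is left open.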