Structured Extraction

Deep dive into the structured extraction response schema.

Structured Extraction Schema

When you set format: "structured" on a request to the /extract endpoint, PARSEKIT returns a JSON object with three top-level keys: document (title and sections), tables, and metadata.

Full Response Schema

{
  "document": {
    "title": "Q4 Financial Report",
    "sections": [
      {
        "heading": "Executive Summary",
        "content": "This report covers..."
      },
      {
        "heading": "Revenue",
        "content": "Revenue increased by 16%..."
      }
    ]
  },
  "tables": [
    {
      "rows": [
        ["Quarter", "Revenue", "Growth"],
        ["Q1", "$2.4M", "+12%"],
        ["Q2", "$2.8M", "+16%"]
      ]
    }
  ],
  "metadata": {
    "pages": 24,
    "language": "en",
    "file_type": "pdf"
  }
}
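
One way to consume this response in Python is to map it onto small typed containers. The sketch below is illustrative only: the names `Section`, `StructuredResult`, and `parse_structured` are hypothetical and not part of any PARSEKIT SDK, and the code assumes every field shown in the sample above is present in the response.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class Section:
    heading: str
    content: str


@dataclass
class StructuredResult:
    title: str
    sections: list[Section]
    tables: list[list[list[str]]]  # one list of rows per table
    pages: int
    language: str
    file_type: str


def parse_structured(payload: str) -> StructuredResult:
    """Parse a structured-extraction response body (JSON text) into typed objects.

    Assumes all keys from the documented sample are present; a missing key
    raises KeyError.
    """
    data = json.loads(payload)
    doc = data["document"]
    meta = data["metadata"]
    return StructuredResult(
        title=doc["title"],
        sections=[Section(s["heading"], s["content"]) for s in doc["sections"]],
        tables=[t["rows"] for t in data["tables"]],
        pages=meta["pages"],
        language=meta["language"],
        file_type=meta["file_type"],
    )


# Abbreviated copy of the sample response shown above.
SAMPLE = """
{
  "document": {
    "title": "Q4 Financial Report",
    "sections": [
      {"heading": "Executive Summary", "content": "This report covers..."},
      {"heading": "Revenue", "content": "Revenue increased by 16%..."}
    ]
  },
  "tables": [
    {"rows": [["Quarter", "Revenue", "Growth"],
              ["Q1", "$2.4M", "+12%"],
              ["Q2", "$2.8M", "+16%"]]}
  ],
  "metadata": {"pages": 24, "language": "en", "file_type": "pdf"}
}
"""

result = parse_structured(SAMPLE)
print(result.title, len(result.sections), result.pages)
```

Failing fast on a missing key is a deliberate choice here; if some fields turn out to be optional in practice, `dict.get` with defaults may be a better fit.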

Section Object

| Field | Type | Description |
| --- | --- | --- |
| heading | string | Section heading text |
| content | string | Full text content of the section |
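
Because each section carries only a heading and its content, looking one up by heading is a short loop. The helper below is a minimal sketch working on the already-decoded response; `find_section` is a hypothetical name, and the case-insensitive match is an assumption about what callers usually want, not documented PARSEKIT behavior.

```python
from typing import Optional


def find_section(response: dict, heading: str) -> Optional[str]:
    """Return the content of the first section whose heading matches, ignoring case."""
    for section in response.get("document", {}).get("sections", []):
        if section.get("heading", "").lower() == heading.lower():
            return section.get("content")
    return None  # no section with that heading


# Decoded response, trimmed to the document part of the sample above.
response = {
    "document": {
        "title": "Q4 Financial Report",
        "sections": [
            {"heading": "Executive Summary", "content": "This report covers..."},
            {"heading": "Revenue", "content": "Revenue increased by 16%..."},
        ],
    }
}

print(find_section(response, "revenue"))
```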

Table Object

| Field | Type | Description |
| --- | --- | --- |
| rows | string[][] | Row data as arrays of strings (first row is typically headers) |
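
When the first row does hold headers, the remaining rows can be turned into records keyed by column name. This is a sketch under that assumption (the schema only says the first row is typically headers); `table_to_records` is a hypothetical helper name.

```python
def table_to_records(table: dict) -> list:
    """Convert a table object into a list of dicts keyed by the first row.

    Assumes the first row is a header row. Rows shorter than the header are
    truncated by zip; extra cells beyond the header are dropped.
    """
    rows = table.get("rows", [])
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Table object from the sample response above.
table = {
    "rows": [
        ["Quarter", "Revenue", "Growth"],
        ["Q1", "$2.4M", "+12%"],
        ["Q2", "$2.8M", "+16%"],
    ]
}

records = table_to_records(table)
print(records[0]["Revenue"])
```

All cell values stay strings, as in the response; converting values such as "$2.4M" to numbers is left to the caller.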

Metadata Object

| Field | Type | Description |
| --- | --- | --- |
| pages | number | Total page count |
| language | string | Detected language (ISO 639-1) |
| file_type | string | Detected file type |

Supported File Types

Structured extraction works with every file type supported for extraction: PDF, DOCX, HTML, Markdown, TXT, and images (via OCR).