Structured Extraction

Deep dive into the structured extraction response schema.

Structured Extraction Schema

When a request to the `/extract` endpoint sets `format: "structured"`, PARSEKIT returns a JSON object that describes the document under three top-level keys: `document` (the title and an ordered list of sections), `tables`, and `metadata`.

Full Response Schema

```json
{
  "document": {
    "title": "Q4 Financial Report",
    "sections": [
      {
        "heading": "Executive Summary",
        "content": "This report covers..."
      },
      {
        "heading": "Revenue",
        "content": "Revenue increased by 16%..."
      }
    ]
  },
  "tables": [
    {
      "rows": [
        ["Quarter", "Revenue", "Growth"],
        ["Q1", "$2.4M", "+12%"],
        ["Q2", "$2.8M", "+16%"]
      ]
    }
  ],
  "metadata": {
    "pages": 24,
    "language": "en",
    "file_type": "pdf"
  }
}
```
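
As a rough illustration of consuming this response, the sketch below loads a response body into small Python dataclasses. The names `Section`, `StructuredResult`, and `parse_structured` are hypothetical and not part of PARSEKIT, and falling back to empty values for missing keys is an assumption rather than documented behavior.

```python
import json
from dataclasses import dataclass


@dataclass
class Section:
    heading: str
    content: str


@dataclass
class StructuredResult:
    title: str
    sections: list[Section]
    tables: list[list[list[str]]]  # one list of rows per table
    metadata: dict


def parse_structured(payload: str) -> StructuredResult:
    """Parse a structured-extraction response body given as a JSON string."""
    data = json.loads(payload)
    document = data.get("document", {})
    return StructuredResult(
        title=document.get("title", ""),
        sections=[
            Section(heading=s.get("heading", ""), content=s.get("content", ""))
            for s in document.get("sections", [])
        ],
        # Assumption: a table without "rows" is treated as empty.
        tables=[t.get("rows", []) for t in data.get("tables", [])],
        metadata=data.get("metadata", {}),
    )


# A trimmed copy of the sample response from this page.
SAMPLE = """
{
  "document": {
    "title": "Q4 Financial Report",
    "sections": [
      {"heading": "Executive Summary", "content": "This report covers..."},
      {"heading": "Revenue", "content": "Revenue increased by 16%..."}
    ]
  },
  "tables": [
    {"rows": [["Quarter", "Revenue", "Growth"],
              ["Q1", "$2.4M", "+12%"],
              ["Q2", "$2.8M", "+16%"]]}
  ],
  "metadata": {"pages": 24, "language": "en", "file_type": "pdf"}
}
"""

result = parse_structured(SAMPLE)
print(result.title, len(result.sections), result.metadata["pages"])
```

Keeping `metadata` as a plain dict means any fields not listed on this page pass through untouched.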

Section Object

| Field | Type | Description |
| --- | --- | --- |
| `heading` | string | Section heading text |
| `content` | string | Full text content of the section |
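
Because each section carries only a heading and its text, flattening a document back into readable plain text is straightforward. The helper below is a minimal sketch; `sections_to_text` is a hypothetical name, and skipping an empty heading is an assumption about how such sections should be rendered.

```python
def sections_to_text(sections: list[dict]) -> str:
    """Join sections into one string, each heading followed by its content."""
    blocks = []
    for section in sections:
        heading = section.get("heading", "").strip()
        content = section.get("content", "").strip()
        # Assumption: a section with an empty heading contributes its content only.
        blocks.append(f"{heading}\n{content}" if heading else content)
    # A blank line separates consecutive sections.
    return "\n\n".join(blocks)


sections = [
    {"heading": "Executive Summary", "content": "This report covers..."},
    {"heading": "Revenue", "content": "Revenue increased by 16%..."},
]
print(sections_to_text(sections))
```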

Table Object

| Field | Type | Description |
| --- | --- | --- |
| `rows` | string[][] | Row data as arrays of strings (first row is typically headers) |
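
When the first row does hold headers, the remaining rows can be turned into keyed records. This is a sketch under that assumption; `table_to_records` is a hypothetical helper, and it should not be used on tables whose first row is data.

```python
def table_to_records(table: dict) -> list[dict[str, str]]:
    """Convert a table object into one dict per data row, keyed by header."""
    rows = table.get("rows", [])
    if not rows:
        return []
    header, *body = rows
    # zip() stops at the shorter sequence, so cells beyond the header
    # width are dropped and short rows simply yield fewer keys.
    return [dict(zip(header, row)) for row in body]


table = {
    "rows": [
        ["Quarter", "Revenue", "Growth"],
        ["Q1", "$2.4M", "+12%"],
        ["Q2", "$2.8M", "+16%"],
    ]
}
records = table_to_records(table)
print(records[0]["Revenue"])
```

All cell values stay strings, as in the response; converting values such as "$2.4M" to numbers is left to the caller.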

Metadata Object

| Field | Type | Description |
| --- | --- | --- |
| `pages` | number | Total page count |
| `language` | string | Detected language (ISO 639-1) |
| `file_type` | string | Detected file type |
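
A client may want to sanity-check these fields before relying on them. The sketch below is one possible check, not something PARSEKIT provides: `validate_metadata` is a hypothetical name, and it assumes that `pages` is a non-negative integer and that `language` is a lowercase code, as in the sample. ISO 639-1 codes are two letters long, which is the only part of the check taken from the standard.

```python
def validate_metadata(metadata: dict) -> list[str]:
    """Return a list of problems; an empty list means the fields look well-formed."""
    problems = []

    pages = metadata.get("pages")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if not isinstance(pages, int) or isinstance(pages, bool) or pages < 0:
        problems.append("pages must be a non-negative integer")

    language = metadata.get("language")
    if not (
        isinstance(language, str)
        and len(language) == 2
        and language.isascii()
        and language.isalpha()
        and language.islower()
    ):
        problems.append("language must be a two-letter lowercase code")

    file_type = metadata.get("file_type")
    if not (isinstance(file_type, str) and file_type):
        problems.append("file_type must be a non-empty string")

    return problems


print(validate_metadata({"pages": 24, "language": "en", "file_type": "pdf"}))
```

The check only verifies the shape of the language code; it does not confirm that the code is actually assigned in ISO 639-1.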

Supported File Types

Structured extraction works with every file type that extraction supports: PDF, DOCX, HTML, Markdown, TXT, and images (via OCR).
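
A client could screen files locally before uploading them. The sketch below maps file extensions to the types listed above. The extension list is an assumption made for illustration, especially the image formats, which this page does not enumerate, and `supported_kind` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional

# Assumed extension-to-type mapping; this page names the types but not
# the extensions, and does not say which image formats are accepted.
SUPPORTED_EXTENSIONS = {
    ".pdf": "PDF",
    ".docx": "DOCX",
    ".html": "HTML",
    ".htm": "HTML",
    ".md": "Markdown",
    ".txt": "TXT",
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
}


def supported_kind(filename: str) -> Optional[str]:
    """Return the listed file type for a filename, or None if it is not recognized."""
    return SUPPORTED_EXTENSIONS.get(Path(filename).suffix.lower())


print(supported_kind("report.PDF"))
print(supported_kind("notes.xlsx"))
```

An extension check is only a convenience; the `file_type` field in the response metadata reports what was actually detected.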
