Structured Extraction
Deep dive into the structured extraction response schema.
Structured Extraction Schema
When you call the `/extract` endpoint with `format: "structured"`, PARSEKIT returns a JSON object that mirrors the full document hierarchy: the document title and its sections, any tables found in the file, and file-level metadata.
Full Response Schema
```json
{
  "document": {
    "title": "Q4 Financial Report",
    "sections": [
      {
        "heading": "Executive Summary",
        "content": "This report covers..."
      },
      {
        "heading": "Revenue",
        "content": "Revenue increased by 16%..."
      }
    ]
  },
  "tables": [
    {
      "rows": [
        ["Quarter", "Revenue", "Growth"],
        ["Q1", "$2.4M", "+12%"],
        ["Q2", "$2.8M", "+16%"]
      ]
    }
  ],
  "metadata": {
    "pages": 24,
    "language": "en",
    "file_type": "pdf"
  }
}
```
Section Object
| Field | Type | Description |
|---|---|---|
| `heading` | string | Section heading text |
| `content` | string | Full text content of the section |
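Because `document.sections` is an ordered array of `heading`/`content` pairs, looking up one section by its heading is a simple linear scan. The sketch below assumes the response body has already been decoded into a Python `dict` with `json.loads`; the `find_section` helper is hypothetical and not part of any official PARSEKIT client.

```python
import json
from typing import Optional


def find_section(response: dict, heading: str) -> Optional[str]:
    """Return the content of the first section whose heading matches,
    ignoring case and surrounding whitespace."""
    wanted = heading.strip().casefold()
    for section in response.get("document", {}).get("sections", []):
        if section.get("heading", "").strip().casefold() == wanted:
            return section.get("content")
    return None  # no section with that heading


# Abbreviated response in the documented shape
raw = """
{"document": {"title": "Q4 Financial Report", "sections": [
  {"heading": "Executive Summary", "content": "This report covers..."},
  {"heading": "Revenue", "content": "Revenue increased by 16%..."}
]}}
"""
response = json.loads(raw)
print(find_section(response, "revenue"))  # prints: Revenue increased by 16%...
```

Returning the first match keeps the helper predictable if a document happens to repeat a heading; collect all matches instead if you need every occurrence.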
Table Object
| Field | Type | Description |
|---|---|---|
| `rows` | string[][] | Row data as arrays of strings; the first row typically holds the column headers |
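Since `rows` is a plain array of string arrays, a common next step is to turn it into one record per data row, keyed by the header row. This is a minimal sketch that assumes the first row really contains headers (the table above says this is typical, not guaranteed); `rows_to_records` is a hypothetical helper name.

```python
from typing import Dict, List


def rows_to_records(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Treat the first row as headers and map each later row to a dict.

    Assumes the first row holds column headers; verify this per table.
    """
    if not rows:
        return []
    headers, *body = rows
    records = []
    for row in body:
        # Pad short rows with empty strings so every header gets a value;
        # cells beyond the header count are dropped by zip().
        padded = row + [""] * (len(headers) - len(row))
        records.append(dict(zip(headers, padded)))
    return records


table = {
    "rows": [
        ["Quarter", "Revenue", "Growth"],
        ["Q1", "$2.4M", "+12%"],
        ["Q2", "$2.8M", "+16%"],
    ]
}
for record in rows_to_records(table["rows"]):
    print(record)
# {'Quarter': 'Q1', 'Revenue': '$2.4M', 'Growth': '+12%'}
# {'Quarter': 'Q2', 'Revenue': '$2.8M', 'Growth': '+16%'}
```

Values stay strings such as `"$2.4M"`, exactly as extracted; any numeric parsing is left to the caller. If a table repeats a header name, later columns overwrite earlier ones in the dict, so keep the raw `rows` for such tables.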
Metadata Object
| Field | Type | Description |
|---|---|---|
| `pages` | number | Total number of pages in the document |
| `language` | string | Detected language as an ISO 639-1 code (e.g. `en`) |
| `file_type` | string | Detected file type (e.g. `pdf`) |
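Taken together, the object tables describe the whole response. As a sketch, the Python dataclasses below mirror that schema so downstream code can use typed attributes instead of raw dictionaries. The class names and the `parse_structured` helper are hypothetical, not part of an official PARSEKIT client. The parser is deliberately strict: it raises `KeyError` if a documented field is missing and silently ignores any extra fields.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Section:
    heading: str
    content: str


@dataclass
class Table:
    rows: List[List[str]]  # first row typically holds headers


@dataclass
class Metadata:
    pages: int
    language: str  # ISO 639-1 code, e.g. "en"
    file_type: str


@dataclass
class StructuredResult:
    title: str
    sections: List[Section]
    tables: List[Table]
    metadata: Metadata


def parse_structured(payload: str) -> StructuredResult:
    """Decode a structured-extraction response body into typed objects."""
    data = json.loads(payload)
    doc = data["document"]
    meta = data["metadata"]
    return StructuredResult(
        title=doc["title"],
        # Pick fields explicitly so unexpected extra keys are ignored
        sections=[Section(heading=s["heading"], content=s["content"])
                  for s in doc["sections"]],
        tables=[Table(rows=t["rows"]) for t in data["tables"]],
        metadata=Metadata(pages=meta["pages"],
                          language=meta["language"],
                          file_type=meta["file_type"]),
    )


sample = """
{"document": {"title": "Q4 Financial Report", "sections": [
   {"heading": "Revenue", "content": "Revenue increased by 16%..."}]},
 "tables": [{"rows": [["Quarter", "Revenue"], ["Q1", "$2.4M"]]}],
 "metadata": {"pages": 24, "language": "en", "file_type": "pdf"}}
"""
result = parse_structured(sample)
print(result.title, result.metadata.pages)  # prints: Q4 Financial Report 24
```

Failing fast on a missing field surfaces schema changes early; switch the lookups to `.get()` with defaults if you would rather tolerate partial responses.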
Supported File Types
Structured extraction works with every file type that the `/extract` endpoint supports: PDF, DOCX, HTML, Markdown, TXT, and images (via OCR).
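A client may want to reject obviously unsupported files before uploading them. The sketch below maps file extensions to the formats listed above. The extension map is an assumption: this page names formats, not extensions, and the specific image formats accepted for OCR (`.png`, `.jpg`, `.jpeg`, `.tiff` here) are placeholders. The `detect_format` helper is hypothetical, and the `file_type` that PARSEKIT reports in `metadata` remains the authoritative answer.

```python
from pathlib import Path
from typing import Optional

# Hypothetical extension map: the docs list formats but not extensions,
# and the image formats accepted for OCR are assumed here.
SUPPORTED_EXTENSIONS = {
    ".pdf": "PDF",
    ".docx": "DOCX",
    ".html": "HTML",
    ".htm": "HTML",
    ".md": "Markdown",
    ".txt": "TXT",
    ".png": "image (OCR)",
    ".jpg": "image (OCR)",
    ".jpeg": "image (OCR)",
    ".tiff": "image (OCR)",
}


def detect_format(filename: str) -> Optional[str]:
    """Return the expected format for a filename, or None if it looks unsupported."""
    # Path.suffix returns only the last extension, e.g. ".PDF" for "report.PDF"
    return SUPPORTED_EXTENSIONS.get(Path(filename).suffix.lower())


for name in ["report.PDF", "notes.md", "scan.jpeg", "data.xlsx"]:
    print(name, "->", detect_format(name))
```

Matching on the lowercased suffix keeps the check case-insensitive; because it inspects only the filename, a mislabeled file will still pass and should be handled by the API's own error response.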