JSON Schema Guide
This guide shows you how to write JSON schemas that generate structured outputs reliably. Follow these guidelines to ensure your schemas work seamlessly.
Core Principles
Always use "type": "object" at the root level. Wrap arrays or other types in an object property to ensure compatibility across all LLM providers. The root schema must always be an object.
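The wrapping rule can be sketched in Python as follows. The helper name wrap_root and the property name "keywords" are illustrative assumptions, not part of any provider's API.

```python
def wrap_root(schema: dict, property_name: str = "result") -> dict:
    """Return a schema whose root is an object; wrap any other root type."""
    if schema.get("type") == "object":
        return schema  # already an object at the root, nothing to do
    return {
        "type": "object",
        "properties": {property_name: schema},
        "additionalProperties": False,
    }


# A bare array at the root is moved into a named property of an object.
array_schema = {"type": "array", "items": {"type": ["string", "null"]}}
wrapped = wrap_root(array_schema, "keywords")
```

The extracted list is then read from the "keywords" property of the response instead of from the root.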
Always include null in your type arrays. This prevents errors when the LLM cannot find the information in the source document. Without null, the LLM may fail to generate any response at all if it cannot determine a field's value. Inside a type array, null is written as the quoted type name "null":
{
  "customer_name": {"type": ["string", "null"]},
  "email": {"type": ["string", "null"]},
  "discount": {"type": ["number", "null"]},
  "payment_method": {"type": ["string", "null"]}
}
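If schemas are assembled in code, the null rule could be applied mechanically. The following is a minimal sketch; make_nullable is a hypothetical helper name introduced here for illustration.

```python
def make_nullable(field: dict) -> dict:
    """Return a copy of a field schema whose "type" includes "null"."""
    declared = field.get("type")
    if declared is None:
        return dict(field)  # no "type" keyword (for example a $ref): unchanged
    types = [declared] if isinstance(declared, str) else list(declared)
    if "null" not in types:
        types.append("null")
    return {**field, "type": types}


email = make_nullable({"type": "string"})
discount = make_nullable({"type": ["number", "null"]})
```

The helper is meant for fields only; the root schema stays a plain "type": "object".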
Always set additionalProperties to false for objects. This ensures the LLM only returns the fields you explicitly define in your schema and prevents unexpected extra properties.
{
  "type": "object",
  "properties": {
    "name": {"type": ["string", "null"]}
  },
  "additionalProperties": false
}
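Nested objects are easy to miss when applying this rule by hand. The sketch below walks a schema and closes every object it finds; close_objects is a hypothetical helper, and it only descends into "properties", "items", and "$defs".

```python
import copy


def close_objects(schema: dict) -> dict:
    """Return a deep copy with additionalProperties set to false on every object."""
    result = copy.deepcopy(schema)
    _close(result)
    return result


def _close(node: dict) -> None:
    declared = node.get("type")
    types = [declared] if isinstance(declared, str) else (declared or [])
    if "object" in types:
        node["additionalProperties"] = False
    # Recurse into the places where nested schemas can appear.
    for child in node.get("properties", {}).values():
        _close(child)
    for child in node.get("$defs", {}).values():
        _close(child)
    if isinstance(node.get("items"), dict):
        _close(node["items"])


draft = {
    "type": "object",
    "properties": {
        "invoice_items": {
            "type": ["array", "null"],
            "items": {
                "type": "object",
                "properties": {"price": {"type": ["number", "null"]}},
            },
        }
    },
}
closed = close_objects(draft)
```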
Keep schemas simple and focused. Avoid deep nesting (maximum 5 levels) and complex validation rules. Simple schemas produce more reliable outputs.
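The nesting limit can be checked before a schema is used. This guide does not define how levels are counted, so the sketch below assumes that every object schema counts as one level, that arrays add none, and that $ref targets are not followed; nesting_depth is a hypothetical helper name.

```python
def nesting_depth(node: dict) -> int:
    """Count nested object levels, with the root object as level 1."""
    children = list(node.get("properties", {}).values())
    if isinstance(node.get("items"), dict):
        children.append(node["items"])
    deepest_child = max((nesting_depth(child) for child in children), default=0)
    declared = node.get("type")
    types = [declared] if isinstance(declared, str) else (declared or [])
    own_level = 1 if "object" in types else 0
    return own_level + deepest_child


schema = {
    "type": "object",
    "properties": {
        "customer": {
            "type": ["object", "null"],
            "properties": {
                "address": {
                    "type": "object",
                    "properties": {"city": {"type": ["string", "null"]}},
                }
            },
        }
    },
}
depth = nesting_depth(schema)  # root -> customer -> address
```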
Supported JSON Schema Features
Basic Data Types
{
  "type": "object",
  "properties": {
    "text_field": {"type": ["string", "null"]},
    "number_field": {"type": ["number", "null"]},
    "integer_field": {"type": ["integer", "null"]},
    "boolean_field": {"type": ["boolean", "null"]},
    "array_field": {
      "type": ["array", "null"],
      "items": {"type": ["string", "null"]}
    }
  },
  "additionalProperties": false
}
Multiple Types and Nullable Fields
When a field can accept multiple types, use an array of types and always include null:
{
  "flexible_field": {"type": ["string", "number", "null"]},
  "maybe_array": {"type": ["array", "null"], "items": {"type": ["string", "null"]}}
}
Enums for Fixed Values
Constrain field values to specific options and always include null:
{
  "status": {
    "type": ["string", "null"],
    "enum": ["pending", "approved", "rejected", null]
  },
  "priority": {
    "type": ["string", "null"],
    "enum": ["low", "medium", "high", null]
  }
}
Arrays
Define arrays for extracting lists of items from documents:
{
  "invoice_items": {
    "type": ["array", "null"],
    "items": {
      "type": "object",
      "properties": {
        "article_code": {"type": ["string", "null"]},
        "description": {"type": ["string", "null"]},
        "price": {"type": ["number", "null"]}
      },
      "additionalProperties": false
    }
  }
}
Internal References
Use $ref for reusing schema definitions within the same document:
{
  "$defs": {
    "Address": {
      "type": "object",
      "properties": {
        "street": {"type": ["string", "null"]},
        "city": {"type": ["string", "null"]},
        "country": {"type": ["string", "null"]}
      },
      "additionalProperties": false
    }
  },
  "type": "object",
  "properties": {
    "billing_address": {"$ref": "#/$defs/Address"},
    "shipping_address": {"$ref": "#/$defs/Address"}
  },
  "additionalProperties": false
}
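An internal reference is a JSON Pointer into the same document, so it can be looked up with a short loop. The sketch below shows one way to do that; resolve_ref is a hypothetical helper, and it deliberately rejects anything that is not an internal "#/..." reference.

```python
def resolve_ref(root: dict, ref: str) -> dict:
    """Look up an internal reference such as "#/$defs/Address" in the root schema."""
    if not ref.startswith("#/"):
        raise ValueError(f"only internal references are supported: {ref!r}")
    node = root
    for part in ref[2:].split("/"):
        # Undo JSON Pointer escaping: "~1" stands for "/", "~0" for "~".
        part = part.replace("~1", "/").replace("~0", "~")
        node = node[part]
    return node


schema = {
    "$defs": {
        "Address": {
            "type": "object",
            "properties": {"city": {"type": ["string", "null"]}},
            "additionalProperties": False,
        }
    },
    "type": "object",
    "properties": {"billing_address": {"$ref": "#/$defs/Address"}},
    "additionalProperties": False,
}
billing = resolve_ref(schema, schema["properties"]["billing_address"]["$ref"])
```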
Invoice Extraction Example
Let's extract structured data from an invoice document. We want to capture the invoice header information (number, date, customer details), a list of line items with their details, and the invoice totals. Some fields might be missing or unclear in the source document, so we'll handle those as nullable.
JSON Schema:
{
  "$defs": {
    "address": {
      "type": "object",
      "properties": {
        "street": {"type": ["string", "null"]},
        "city": {"type": ["string", "null"]},
        "postal_code": {"type": ["string", "null"]},
        "country": {"type": ["string", "null"]}
      },
      "additionalProperties": false
    },
    "line_item": {
      "type": "object",
      "properties": {
        "article_code": {"type": ["string", "null"]},
        "description": {"type": ["string", "null"]},
        "quantity": {"type": ["number", "null"]},
        "unit_price": {"type": ["number", "null"]},
        "total_price": {"type": ["number", "null"]},
        "tax_rate": {"type": ["number", "null"]}
      },
      "additionalProperties": false
    }
  },
  "type": "object",
  "properties": {
    "invoice_number": {"type": ["string", "null"]},
    "invoice_date": {"type": ["string", "null"]},
    "due_date": {"type": ["string", "null"]},
    "customer": {
      "type": ["object", "null"],
      "properties": {
        "name": {"type": ["string", "null"]},
        "company": {"type": ["string", "null"]},
        "address": {"$ref": "#/$defs/address"},
        "tax_id": {"type": ["string", "null"]}
      },
      "additionalProperties": false
    },
    "items": {
      "type": ["array", "null"],
      "items": {"$ref": "#/$defs/line_item"}
    },
    "currency": {"type": ["string", "null"]},
    "subtotal": {"type": ["number", "null"]},
    "tax_amount": {"type": ["number", "null"]},
    "total_amount": {"type": ["number", "null"]},
    "payment_status": {
      "type": ["string", "null"],
      "enum": ["paid", "pending", "overdue", "cancelled", null]
    },
    "notes": {"type": ["string", "null"]}
  },
  "additionalProperties": false
}
Example of expected JSON Output:
{
  "invoice_number": "INV-2024-001",
  "invoice_date": "2024-03-15",
  "due_date": "2024-04-15",
  "customer": {
    "name": "John Smith",
    "company": "Smith Construction Ltd",
    "address": {
      "street": "123 Main Street",
      "city": "New York",
      "postal_code": "10001",
      "country": "USA"
    },
    "tax_id": "US123456789"
  },
  "items": [
    {
      "article_code": "CONC-001",
      "description": "Concrete blocks 20x20cm",
      "quantity": 50,
      "unit_price": 12.50,
      "total_price": 625.00,
      "tax_rate": 0.08
    },
    {
      "article_code": null,
      "description": "Delivery service",
      "quantity": 1,
      "unit_price": 75.00,
      "total_price": 75.00,
      "tax_rate": null
    }
  ],
  "currency": "USD",
  "subtotal": 700.00,
  "tax_amount": 50.00,
  "total_amount": 750.00,
  "payment_status": "pending",
  "notes": "Net 30 payment terms"
}
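A returned document can be sanity-checked against its schema before it is passed downstream. The following is a minimal sketch using only the Python standard library, not a full JSON Schema validator: it covers type arrays, properties, additionalProperties, items, and internal $ref, ignores every other keyword (including enum), and treats only Python ints as integers. The names matches_type and check are hypothetical, and the schema is a trimmed version of the invoice schema above.

```python
PYTHON_TYPES = {
    "string": str, "number": (int, float), "integer": int,
    "boolean": bool, "object": dict, "array": list,
}


def matches_type(value, type_name: str) -> bool:
    if type_name == "null":
        return value is None
    if isinstance(value, bool):
        return type_name == "boolean"  # bool is a subclass of int in Python
    return isinstance(value, PYTHON_TYPES.get(type_name, ()))


def check(value, schema: dict, root: dict, path: str = "$") -> list:
    """Return a list of problems; an empty list means none were found."""
    if "$ref" in schema:
        target = root
        for part in schema["$ref"][2:].split("/"):  # assumes an internal "#/..." ref
            target = target[part]
        return check(value, target, root, path)
    declared = schema.get("type", [])
    types = [declared] if isinstance(declared, str) else declared
    if types and not any(matches_type(value, t) for t in types):
        return [f"{path}: expected {types}, got {type(value).__name__}"]
    problems = []
    if isinstance(value, dict):
        props = schema.get("properties", {})
        for key, item in value.items():
            if key in props:
                problems += check(item, props[key], root, f"{path}.{key}")
            elif schema.get("additionalProperties") is False:
                problems.append(f"{path}.{key}: unexpected property")
    elif isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            problems += check(item, schema["items"], root, f"{path}[{index}]")
    return problems


schema = {
    "$defs": {
        "line_item": {
            "type": "object",
            "properties": {
                "description": {"type": ["string", "null"]},
                "quantity": {"type": ["number", "null"]},
            },
            "additionalProperties": False,
        }
    },
    "type": "object",
    "properties": {
        "invoice_number": {"type": ["string", "null"]},
        "items": {"type": ["array", "null"], "items": {"$ref": "#/$defs/line_item"}},
    },
    "additionalProperties": False,
}
good = {"invoice_number": "INV-2024-001",
        "items": [{"description": "Delivery service", "quantity": 1}]}
bad = {"invoice_number": 42, "vendor": "ACME"}

good_problems = check(good, schema, schema)
bad_problems = check(bad, schema, schema)
```

For production use, a complete validator library is the safer choice; the point of the sketch is only to show how the schema rules in this guide translate into checks on the output.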
Unsupported Features
Avoid these JSON Schema features as they may cause compatibility issues:
- nullable: use a type array such as ["string", "null"] instead
- anyOf, oneOf, allOf: use type arrays for multiple types
- format constraints (email, date-time, etc.): use plain strings
- String validation (minLength, maxLength, pattern)
- Numeric constraints (minimum, maximum)
- default values
- $schema declarations
- External $ref references
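The list above can be turned into a simple lint step. The sketch below reports the location of each unsupported keyword it finds; find_unsupported is a hypothetical helper, and it only descends into "properties", "$defs", and "items".

```python
UNSUPPORTED_KEYWORDS = {
    "nullable", "anyOf", "oneOf", "allOf", "format",
    "minLength", "maxLength", "pattern",
    "minimum", "maximum", "default", "$schema",
}


def find_unsupported(schema: dict, path: str = "#") -> list:
    """Return the locations of keywords from the unsupported list."""
    findings = []
    for keyword, value in schema.items():
        here = f"{path}/{keyword}"
        if keyword in UNSUPPORTED_KEYWORDS:
            findings.append(here)
        elif keyword == "$ref" and not value.startswith("#"):
            findings.append(f"{here} (external reference)")
        elif keyword in ("properties", "$defs"):
            # Keys at this level are property or definition names, not keywords,
            # so a property that happens to be called "format" is not flagged.
            for name, child in value.items():
                findings += find_unsupported(child, f"{here}/{name}")
        elif keyword == "items" and isinstance(value, dict):
            findings += find_unsupported(value, here)
    return findings


schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "email": {"type": "string", "format": "email"},
        "format": {"type": ["string", "null"]},
        "age": {"type": ["integer", "null"], "minimum": 0},
    },
    "additionalProperties": False,
}
findings = find_unsupported(schema)
```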
Best Practices
Design for clarity: Use descriptive property names and include description fields when the purpose isn't obvious.
Test with simple prompts: Verify your schema works by testing with straightforward requests before using complex prompts.
Keep nesting shallow: Limit object nesting to at most 5 levels for better reliability.
Use enums liberally: When you know the possible values, constrain them with enums rather than relying on the model to choose appropriate values.
Handle null values explicitly: Always include null in your type arrays to prevent extraction failures when information is missing from the source document.
This approach ensures your JSON schemas generate consistent, structured outputs regardless of which LLM provider processes your requests.