JSON Schema Guide

This guide shows you how to write JSON schemas that generate structured outputs reliably. Follow these guidelines to ensure your schemas work seamlessly.

Core Principles

Always use"type": "object" at the root level. Wrap arrays or other types in an object property to ensure compatibility across all LLM providers. The root schema must always be an object.

Always includenull in your type arrays. This prevents errors when the LLM cannot find the information in the source document. Without null, the LLM may fail to generate any response at all if it cannot determine a field's value. In the JSON schema, null should always be used in quotes: "null"

{
  "customer_name": {"type": ["string", "null"]},
  "email": {"type": ["string", "null"]},
  "discount": {"type": ["number", "null"]},
  "payment_method": {"type": ["string", "null"]}
}

Always setadditionalProperties to false for objects. This ensures the LLM only returns the fields you explicitly define in your schema and prevents unexpected extra properties.

{
  "type": "object",
  "properties": {
    "name": {"type": ["string", "null"]}
  },
  "additionalProperties": false
}

Keep schemas simple and focused. Avoid deep nesting (maximum 5 levels) and complex validation rules. Simple schemas produce more reliable outputs.

Supported JSON Schema Features

Basic Data Types

{
  "type": "object",
  "properties": {
    "text_field": {"type": ["string", "null"]},
    "number_field": {"type": ["number", "null"]},
    "integer_field": {"type": ["integer", "null"]},
    "boolean_field": {"type": ["boolean", "null"]},
    "array_field": {
      "type": ["array", "null"],
      "items": {"type": ["string", "null"]}
    }
  },
  "additionalProperties": false
}

Multiple Types and Nullable Fields

When a field can accept multiple types, use an array of types and always include null:

{
  "flexible_field": {"type": ["string", "number", "null"]},
  "maybe_array": {"type": ["array", "null"], "items": {"type": ["string", "null"]}}
}

Enums for Fixed Values

Constrain field values to specific options and always include null:

{
  "status": {
    "type": ["string", "null"],
    "enum": ["pending", "approved", "rejected", "null"]
  },
  "priority": {
    "type": ["string", "null"],
    "enum": ["low", "medium", "high", "null"]
  }
}

Arrays

Define arrays for extracting lists of items from documents:

{
  "invoice_items": {
    "type": ["array", "null"],
    "items": {
      "type": "object",
      "properties": {
        "article_code": {"type": ["string", "null"]},
        "description": {"type": ["string", "null"]},
        "price": {"type": ["number", "null"]}
      }
    }
  }
}

Internal References

Use $ref for reusing schema definitions within the same document:

{
  "$defs": {
    "Address": {
      "type": "object",
      "properties": {
        "street": {"type": ["string", "null"]},
        "city": {"type": ["string", "null"]},
        "country": {"type": ["string", "null"]}
      },
      "additionalProperties": false
    }
  },
  "type": "object",
  "properties": {
    "billing_address": {"$ref": "#/$defs/Address"},
    "shipping_address": {"$ref": "#/$defs/Address"}
  },
  "additionalProperties": false
}

Invoice Extraction Example

Let's extract structured data from an invoice document. We want to capture the invoice header information (number, date, customer details), a list of line items with their details, and calculate totals. Some fields might be missing or unclear in the source document, so we'll handle those as nullable.

JSON Schema:

{
  "$defs": {
    "address": {
      "type": "object",
      "properties": {
        "street": {"type": ["string", "null"]},
        "city": {"type": ["string", "null"]},
        "postal_code": {"type": ["string", "null"]},
        "country": {"type": ["string", "null"]}
      },
      "additionalProperties": false
    },
    "line_item": {
      "type": "object",
      "properties": {
        "article_code": {"type": ["string", "null"]},
        "description": {"type": ["string", "null"]},
        "quantity": {"type": ["number", "null"]},
        "unit_price": {"type": ["number", "null"]},
        "total_price": {"type": ["number", "null"]},
        "tax_rate": {"type": ["number", "null"]}
      },
      "additionalProperties": false
    }
  },
  "type": "object",
  "properties": {
    "invoice_number": {"type": ["string", "null"]},
    "invoice_date": {"type": ["string", "null"]},
    "due_date": {"type": ["string", "null"]},
    "customer": {
      "type": ["object", "null"],
      "properties": {
        "name": {"type": ["string", "null"]},
        "company": {"type": ["string", "null"]},
        "address": {"$ref": "#/$defs/address"},
        "tax_id": {"type": ["string", "null"]}
      },
      "additionalProperties": false
    },
    "items": {
      "type": ["array", "null"],
      "items": {"$ref": "#/$defs/line_item"}
    },
    "currency": {"type": ["string", "null"]},
    "subtotal": {"type": ["number", "null"]},
    "tax_amount": {"type": ["number", "null"]},
    "total_amount": {"type": ["number", "null"]},
    "payment_status": {
      "type": ["string", "null"],
      "enum": ["paid", "pending", "overdue", "cancelled", "null"]
    },
    "notes": {"type": ["string", "null"]}
  },
  "additionalProperties": false
}

Example of expected JSON Output:

{
  "invoice_number": "INV-2024-001",
  "invoice_date": "2024-03-15",
  "due_date": "2024-04-15",
  "customer": {
    "name": "John Smith",
    "company": "Smith Construction Ltd",
    "address": {
      "street": "123 Main Street",
      "city": "New York",
      "postal_code": "10001",
      "country": "USA"
    },
    "tax_id": "US123456789"
  },
  "items": [
    {
      "article_code": "CONC-001",
      "description": "Concrete blocks 20x20cm",
      "quantity": 50,
      "unit_price": 12.50,
      "total_price": 625.00,
      "tax_rate": 0.08
    },
    {
      "article_code": null,
      "description": "Delivery service",
      "quantity": 1,
      "unit_price": 75.00,
      "total_price": 75.00,
      "tax_rate": null
    }
  ],
  "currency": "USD",
  "subtotal": 700.00,
  "tax_amount": 50.00,
  "total_amount": 750.00,
  "payment_status": "pending",
  "notes": "Net 30 payment terms"
}

Unsupported Features

Avoid these JSON Schema features as they may cause compatibility issues:

  • nullable - Use ["type", "null"] arrays instead
  • anyOf, oneOf, allOf - Use type arrays for multiple types
  • format constraints (email, date-time, etc.) - Use plain strings
  • String validation (minLength, maxLength, pattern)
  • Numeric constraints (minimum, maximum)
  • default values
  • $schema declarations
  • External $ref references

Best Practices

Design for clarity: Use descriptive property names and include description fields when the purpose isn't obvious.

Test with simple prompts: Verify your schema works by testing with straightforward requests before using complex prompts.

Keep nesting shallow: Limit object nesting to 4-5 levels maximum for better reliability.

Use enums liberally: When you know the possible values, constrain them with enums rather than relying on the model to choose appropriate values.

Handle null values explicitly: Always include null in your type arrays to prevent extraction failures when information is missing from the source document.

This approach ensures your JSON schemas generate consistent, structured outputs regardless of which LLM provider processes your requests.