Setup guide

🕒

Estimated setup time: 15–20 minutes, or about 10 minutes if you keep all the defaults.


Step 1 — Activate the template

  1. Open the Phacet template gallery
  2. Select "Invoice Data Extraction"
  3. Click Create a new project from the template

Your project is now pre-configured with all six processing stages, column definitions, and agent instructions. No need to build anything from scratch.

Step 2 — Review extraction formatting rules

The template ships with formatting rules tuned for French and European invoices. Review these against your actual invoice mix before running your first batch.

  1. Open the core field extraction configuration (Stage 1)
  2. Review the default formatting rules:
    • Dates: DD/MM/YYYY for invoice date, DD-MM-YY for delivery date, DD-MM for due date
    • Amounts: comma as decimal separator, two decimal places, space before € symbol (e.g., 1 234,50 €)
    • Addresses: [Number] [Street type] [Street name], [Postal code] [CITY]
  3. If your invoices use different conventions (e.g., US date format MM/DD/YYYY, period as decimal separator, $ instead of €), adjust the formatting rules in the agent instructions for the relevant fields
  4. If you process invoices in multiple currencies, add currency detection logic to the amount extraction instructions
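
As a rough aid when reviewing output against the default rules above, the following Python sketch reproduces the three date formats and the amount format outside Phacet. The field keys (`invoice_date`, `delivery_date`, `due_date`) and the function names are assumptions made for illustration, and the sketch assumes a regular space as the thousands separator. The template itself applies these rules through the agent instructions, not through code like this.

```python
import re
from datetime import date
from decimal import Decimal, ROUND_HALF_UP

# Default output formats from the list above, written as strftime patterns.
# The dictionary keys are hypothetical field names.
DATE_FORMATS = {
    "invoice_date": "%d/%m/%Y",   # DD/MM/YYYY
    "delivery_date": "%d-%m-%y",  # DD-MM-YY
    "due_date": "%d-%m",          # DD-MM
}

# Space-grouped digits, comma decimal separator, two decimals, space, euro sign.
AMOUNT_PATTERN = re.compile(r"^-?\d{1,3}(?: \d{3})*,\d{2} €$")


def format_date(field: str, value: date) -> str:
    """Render a date in the default format for the given field."""
    return value.strftime(DATE_FORMATS[field])


def format_amount_eur(value: Decimal) -> str:
    """Render an amount in the default style, e.g. 1 234,50 €."""
    rounded = value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    # Python groups with "," and uses "." for decimals, so swap both.
    text = f"{rounded:,.2f}".replace(",", " ").replace(".", ",")
    return f"{text} €"


def is_default_amount(text: str) -> bool:
    """Check whether an extracted amount string follows the default style."""
    return AMOUNT_PATTERN.match(text) is not None
```

A check such as `is_default_amount` can flag values like `1,234.50 $` in a test batch, which is a hint that the formatting rules need adjusting for that supplier or currency.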

Step 3 — Review payment method options

The template recognizes eight payment method categories.

  1. Open the payment method configuration (Stage 1)
  2. Review the default options: Bank transfer, Check, Credit card, Cash, Direct debit, Payment at 30 days, Payment at 60 days, Not specified
  3. If your suppliers use payment terms not covered (e.g., Payment at 45 days, Letter of credit), add them to the single-select list and provide detection criteria in the agent instructions

Step 4 — Review industry classification categories

The template classifies suppliers into 19 industry sectors.

  1. Open the industry classification configuration (Stage 4)
  2. Review the default categories: Agriculture, Manufacturing, Construction, Retail, Transportation, Food Service, IT, Finance, Real Estate, Professional Services, Administration, Education, Healthcare, Leisure, Other Services, Energy, Crafts, Telecommunications, Other
  3. If your company uses a different industry taxonomy internally, rename or reorganize the categories to match. This makes the output directly usable in your reporting

Step 5 (optional) — Add company-specific extraction rules

The template handles standard invoice layouts out of the box. If some suppliers use unusual formats, add rules for them.

  1. Open the supplier name extraction configuration (Stage 1) and the relevant field configurations
  2. For suppliers whose PDF layout is non-standard, add extraction hints. For example: if a supplier puts their company name in the footer rather than the header, note this in the agent instructions
  3. If certain suppliers use abbreviations or trade names that differ from their legal name, add mapping rules so the agent normalizes consistently
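
To make the idea of a mapping rule concrete, here is a minimal Python sketch of the normalization the agent is being asked to perform. The supplier names and the `SUPPLIER_ALIASES` table are invented for illustration. In the template, the equivalent rule is written in plain language in the agent instructions.

```python
# Hypothetical alias table: trade names or abbreviations mapped to the legal name.
SUPPLIER_ALIASES = {
    "example logistics": "Example Logistics SAS",
    "ex. log.": "Example Logistics SAS",
}


def normalize_supplier(raw_name: str) -> str:
    """Return the legal name for a known alias, otherwise the cleaned input."""
    # Lowercase and collapse repeated whitespace so lookups are consistent.
    key = " ".join(raw_name.lower().split())
    return SUPPLIER_ALIASES.get(key, raw_name.strip())
```

Unknown names pass through unchanged, so the rule only affects suppliers you have listed explicitly.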

Step 6 — Run a test batch

  1. Upload 10–20 test invoices into the project (PDF upload or push via automation)
  2. Let the agent process the full batch
  3. Review results against this checklist:
    • ✅ Supplier names extracted correctly (including legal form: SARL, SAS, Ltd, etc.)?
    • ✅ Invoice numbers preserved in their exact original format?
    • ✅ Dates formatted correctly and pointing to the right date on the invoice?
    • ✅ Amounts match the PDF: total incl. tax and tax amount are correct, and the totals reconcile (total excl. tax + tax = total incl. tax)?
    • ✅ Addresses standardized consistently?
    • ✅ Payment methods identified correctly?
    • ✅ Line items complete: all rows from the invoice table captured?
    • ✅ Overall AI scores above 90% for clean, standard invoices?
    • ✅ AI scoring rationale shows "Ok" for invoices you've verified manually?
  4. Open 5–6 results in the Review interface and check the citations and the export:
    • Does the agent point to the correct PDF section for each extracted field?
    • Does the JSON export contain all fields with correct values?
  5. Compare the Invoice JSON output against your accounting system's expected input format
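
Part of this checklist can be automated once the JSON is exported. The sketch below checks each invoice for empty fields and verifies that the totals reconcile. All field names (`total_excl_tax`, `tax_amount`, and so on) are assumptions, since the actual keys depend on your project's column definitions, and the sketch assumes amounts are exported as plain numbers or numeric strings.

```python
from decimal import Decimal

# Hypothetical field names: replace them with the keys in your own Invoice JSON export.
REQUIRED_FIELDS = [
    "supplier_name", "invoice_number", "invoice_date",
    "total_excl_tax", "tax_amount", "total_incl_tax",
]
EMPTY_VALUES = (None, "", "Not specified")


def check_invoice(invoice: dict, tolerance: Decimal = Decimal("0.01")) -> list:
    """Return a list of problems found; an empty list means every check passed."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if invoice.get(name) in EMPTY_VALUES
    ]
    try:
        # Assumes amounts are exported as plain numbers or numeric strings.
        net = Decimal(str(invoice["total_excl_tax"]))
        tax = Decimal(str(invoice["tax_amount"]))
        gross = Decimal(str(invoice["total_incl_tax"]))
    except (KeyError, ArithmeticError):
        problems.append("amounts missing or not numeric: math check skipped")
        return problems
    # Allow a one-cent tolerance for rounding on the invoice itself.
    if abs(net + tax - gross) > tolerance:
        problems.append(f"totals do not add up: {net} + {tax} != {gross}")
    return problems
```

Running `check_invoice` over every invoice in the test batch gives a quick list of candidates to open in the Review interface first.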

If extraction accuracy is low:

  • Review the formatting rules (Step 2) and any company-specific rules (Step 5)
  • Check PDF quality: ask suppliers for native PDFs where possible

If AI scores are consistently low: check whether the double extraction is catching real errors or formatting differences the scoring rules should tolerate.
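
One way to tell the two cases apart during review is to compare the two extracted values after stripping presentation details. The sketch below is only an illustration of that triage, not a description of how Phacet computes its scores. The function names are hypothetical, and the normalization is deliberately simple (it does not handle a comma used as a thousands separator).

```python
import re


def normalize(value: str) -> str:
    """Strip case, whitespace, and currency symbols; unify the decimal separator."""
    text = value.casefold()
    text = re.sub(r"[\s€$]", "", text)
    return text.replace(",", ".")


def compare_extractions(first: str, second: str) -> str:
    """Classify a pair of extracted values for the same field."""
    if first == second:
        return "match"
    if normalize(first) == normalize(second):
        return "formatting difference"
    return "real mismatch"
```

If most low scores fall into the "formatting difference" bucket, the scoring rules are probably too strict. If they are real mismatches, the extraction instructions need attention.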

Step 7 — Go live

  1. Establish your processing routine: batch upload after receiving invoices, continuous feed via automation (see the Integration Guide), or forwarding from another Phacet template (e.g., AI Inbox)
  2. Set your quality threshold: decide the minimum Overall AI Score at which invoices flow through without review (e.g., 95%)
  3. Assign team members to the Review queue for low-scoring and incomplete extractions
  4. Connect the JSON output to your downstream system: ERP import, accounting tool, or automation workflow
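
If the downstream automation applies the quality threshold itself, the routing logic can be as small as the following sketch. The function name, the `is_complete` flag, and the assumption that the score is a percentage from 0 to 100 are all illustrative and not taken from the Phacet API.

```python
def route_invoice(overall_ai_score: float, is_complete: bool, threshold: float = 95.0) -> str:
    """Return "auto" for invoices that can skip review, otherwise "review"."""
    # Incomplete extractions always go to the Review queue, whatever the score.
    if not is_complete:
        return "review"
    return "auto" if overall_ai_score >= threshold else "review"
```

Starting with a high threshold and lowering it once the review queue shows few real errors keeps the risk of passing a bad extraction low during the first weeks.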

Troubleshooting

| Problem | Likely cause | Fix |
| --- | --- | --- |
| Supplier name missing legal form (SARL, SAS, etc.) | The legal form appears in a different section than the main name on the invoice | Add extraction hints to Stage 1 for the affected supplier's layout. |
| Amounts show wrong decimal separator | Invoice uses a format different from the default (comma separator, € symbol) | Adjust the amount formatting rules in Stage 1 to match your invoice conventions. |
| Dates extracted in wrong format | Invoice date, delivery date, and due date each use a different output format by design | Check which date field is wrong: each has its own format rule (DD/MM/YYYY, DD-MM-YY, DD-MM). Adjust the rule for the affected field. |
| Line items incomplete / rows missing | Invoice uses a multi-page table, merged cells, or unconventional layout | Check the PDF quality. For complex table layouts, add extraction hints to Stage 2. If the table spans pages, verify the agent is reading all pages. |
| "Not specified" on payment method | No IBAN, payment terms, or card terminal reference found on the invoice | This is correct behavior: the agent doesn't guess. If you know the payment method from context, add it manually or add a supplier-specific rule. |
| JSON export missing fields | A required field returned "Not specified" or was empty | The JSON consolidates extracted fields. If a source field is empty, the JSON field will be too. Fix the extraction first, and the JSON will follow. |
| Supplier enrichment returns wrong company | Multiple companies with similar names in Pappers.fr | The agent picks the closest match by name. If it's consistently wrong for a supplier, add the exact legal name or SIREN to the enrichment search criteria. |