| Availability |
Odoo Online
Odoo.sh
On Premise
|
| Odoo Apps Dependencies |
•
Contacts (contacts)
• Discuss (mail) • Invoicing (account) • Purchase (purchase) • Sales (sale_management) |
| Lines of code | 10779 |
| Technical Name |
bb_ai_document_ocr |
| License | LGPL-3 |
| Website | https://bbtech.ae |
| Availability |
Odoo Online
Odoo.sh
On Premise
|
| Odoo Apps Dependencies |
•
Contacts (contacts)
• Discuss (mail) • Invoicing (account) • Purchase (purchase) • Sales (sale_management) |
| Lines of code | 10779 |
| Technical Name |
bb_ai_document_ocr |
| License | LGPL-3 |
| Website | https://bbtech.ae |
AI Document OCR
Scan • Extract • Review • Create
Turn PDFs and images into draft sales quotations, RFQs, customer invoices, and vendor bills with AI-powered OCR, partner matching, product matching, tax mapping, and full attachment traceability.
& Enterprise
What This App Does
AI Document OCR converts scanned PDFs and images into structured Odoo records. The original file is always preserved as an attachment, extracted data is reviewed before any record is created, and every generated Odoo document stays in draft so your team can validate before posting.
Attach & Trace
The uploaded PDF/image is stored on the OCR record and on the created Odoo document (chatter + smart button) so auditors can always see the source.
Review Before Create
Every extracted document lands in Needs Review with header, partner, lines and totals. Nothing is created automatically unless you opt-in.
Draft Records Only
Created sale orders, RFQs, customer invoices and vendor bills are always drafts. Confirmed or posted documents are never modified automatically.
Partner & Product Matching
Partners are matched by VAT, email, phone, name and fuzzy logic. Products are matched by barcode, SKU, vendor code, name and fuzzy logic.
Tax & Currency Mapping
Taxes match by rate, name, settings and product defaults. Currencies are detected from the document, fallback to company currency when needed.
Duplicate Detection
Built-in deduplication across OCR records, sale orders, purchase orders, invoices and bills warns before creating a document twice.
Supported Document Types
Each document type is mapped to the correct Odoo record, with the correct partner role and the correct journal/tax defaults.
🛒 Sales Quotation / Customer PO
Upload a customer's purchase order. Partner = the customer issuing the PO.
💾 Purchase RFQ / Supplier Quotation
Upload a supplier's quotation. Partner = the supplier issuing the quotation.
🧾 Customer Invoice
Upload a customer invoice you need to record. Partner = the customer.
📊 Vendor Bill
Upload a supplier tax invoice. Partner = the supplier issuing the bill (never the "Invoice To" block).
Scan → Extract → Review → Create
Four clean steps. The user is always in control between extraction and Odoo record creation.
Scan / Upload
Upload PDF, JPG, JPEG, PNG or WEBP from the Upload wizard, or click the OCR button on a sale order, purchase order, invoice or bill form.
Extract
The chosen OCR engine returns raw text and structured JSON. Header, partner, dates, currency, totals, taxes and line items are normalized to a single schema.
Review
Review extracted fields, lines and confidence in the OCR document form. Match partner / products manually, or accept the auto-matches. Validation warnings are explicit.
Create
Click Create Draft Document. A draft sale order, RFQ, customer invoice or vendor bill is created with the same lines and the original file attached.
OCR & AI Engine Options
Plug in the engine that fits your budget, privacy posture and document mix. The module abstracts every provider behind a single schema so you can switch any time.
OpenAI (GPT-4o Vision)
Native vision on PDFs and images. Strong table understanding, excellent for multi-language invoices.
Google Gemini
Native PDF and image ingestion with high recall on tables. Cost-efficient at scale.
Azure (OpenAI & Document Intelligence)
Use Azure OpenAI for vision-LLM, or Azure Document Intelligence prebuilt invoice models for structured extraction.
Tesseract (On-Device)
100% on-premises raw text extraction with image preprocessing (CLAHE, deskew, threshold). Pair with "Structure with AI" for best results.
Mock / Demo Provider
Returns realistic deterministic JSON, perfect for demos, training and tests with no API keys.
Built-In Regex Fallback
Layout-aware deterministic parser: row-major, column-major, invoice tax-table and chunked-row strategies, with strict header/footer table isolation.
Key Features
💾 Universal File Support
- PDF (text-based & scanned)
- JPG, JPEG, PNG, WEBP
- Automatic image preprocessing
- Best-effort PDF rasterization (PyMuPDF / pdf2image)
- Embedded PDF text extraction (pypdf)
🤖 AI & OCR Engines
- OpenAI (GPT-4o family) with vision
- Google Gemini (native PDF + image)
- Azure OpenAI & Document Intelligence
- Local Tesseract for on-premises OCR
- Mock provider for demos & tests
👥 Smart Partner Matching
- Match by VAT / TRN / GST
- Match by email, phone, website
- Exact + fuzzy name matching
- Document-type aware (customer / vendor)
- Auto-create partner only when confidence is high
🛒 Product Matching
- Barcode & default code (SKU)
- Vendor-specific code
- Exact + fuzzy name matching
- Auto-create product when not found
- UOM, quantity, price and discount preserved
📈 Tax & Currency Mapping
- Tax matching by rate + name + scope
- Configurable default sales / purchase tax
- Currency detection (ISO codes, symbols)
- Fallback to company currency
- Per-line tax amount validation
📣 Duplicate Detection
- Cross-check OCR records
- Cross-check sale & purchase orders
- Cross-check customer invoices & vendor bills
- Soft warning during testing, strict on creation
- Override available for OCR Managers
📝 Draft-Safe Creation
- Always creates draft sale.order
- Always creates draft purchase.order
- Always creates draft out_invoice / in_invoice
- Original file attached to record & chatter
- Confirmed records never modified
🌐 Multi-Language
- English documents
- Arabic / RTL documents
- Mixed English + Arabic content
- UTF-8 raw text preserved
- Provider prompts hint language
🛡 Audit & Compliance
- Provider call logs & timing
- Raw OCR text, JSON & normalized JSON kept
- Parser debug panel on every record
- Optional Enterprise Documents integration
- Per-company record rules
Professional OWL Dashboard
A modern OWL + Chart.js dashboard with KPI cards, trends, document-type splits, provider performance and confidence distribution. Every card is clickable and drills down to the matching records.
KPI Cards • Documents Trend • Documents by Type • Documents by State • Provider Performance
Confidence Distribution & Review Bottlenecks
Partner & Product Matching, Amount by Document Type
Recent Documents, Needs Review & Top Warnings Tables
Dashboard Highlights
- Filter by date range, company, type, state, provider, user
- 10+ KPI cards (total, needs review, created, failed, average confidence…)
- 6+ Chart.js visualizations (line, area, doughnut, bar, stacked)
- Three tables: Recent, Needs Review, Top Warnings
- Click-through into filtered list and form views
- Responsive grid, no scroll-blocking
Upload, Process & Review
Clean Kanban / List / Form views with the same data layout across every document type.

Upload Document Wizard - Pick Document Type, Provider & File

Color-Coded Kanban Cards Grouped by State

List View with Confidence, Amounts & Status

OCR Document Form - Header, Partner, Amounts & Lines

Action Buttons - Re-Run OCR, Parse Raw Text, Match Partner, Match Products

Raw OCR Text Tab - Full Text Extracted from PDF/Image

Extracted JSON Tab - Raw Provider Payload & Normalized Schema

Logs & Parser Debug - Strategy, Sums Validation, Rejected Lines

Failed Document - Clear Error Details & Recovery Buttons
🛒 Sales OCR Workflow
Drop a customer PO directly into Sales. The module reads it, creates a draft quotation with the same lines, and attaches the original file.

Scan Customer PO Button on Any Sale Order

Draft Quotation Created From the Extracted PO
💾 Purchase OCR Workflow
Scan a supplier quotation directly into the Purchase app. A draft RFQ is created with the supplier already matched as partner.

Scan Supplier Quotation Button on Any Purchase Order

Draft RFQ Created From the Extracted Supplier Quotation
🧾 Customer Invoice OCR
Scan customer invoices that need to be recorded. The customer partner is identified, tax lines are mapped, and a draft customer invoice is created in the chosen journal.

Customer Invoice Workflow - Scan, Review & Create Draft Invoice
📊 Vendor Bill OCR
Scan supplier tax invoices. The vendor (issuer) is selected as the partner - never the "Invoice To" or "Ship To" block - and a draft vendor bill is created in the correct journal.

Vendor Bill Workflow - Multi-Column Tax Invoice Parsing
Partner & Product Matching
Partners and products are matched against your existing Odoo data first. New records are only auto-created when confidence is high and the parser health checks pass.

Auto-Created Product From the OCR Line - Editable Before Confirming
OCR Provider Configuration
All five providers ship pre-created. Set API keys, choose models and test the connection in one click.

OCR Providers List - All 5 Engines Available Out of the Box

Provider Form - API Key, Model, Endpoint, Temperature, Schema Override

Connection Logs - Every API Call Stored With Duration, Status & Response Snippet
Configurable Behaviour
Choose how aggressive the automation should be. All defaults are deliberately safe (no auto-create of final documents).

Settings - Auto Match Partner / Products, Min Confidence Threshold

Settings - Default Journals, Taxes, Salesperson & Purchase Rep

Security Groups - OCR User, OCR Manager (per-group record rules included)
Installation in 5 Steps
Copy Module
Copy the bb_ai_document_ocr folder into your Odoo addons directory.
Update Apps List
Apps → Update Apps List, then search for "AI Document OCR".
Install
Click Install on the "AI Document OCR" application card.
Install Python Libraries
pip install requests pillow pymupdf pypdf — only those you actually need.
Restart Odoo & Configure Provider
Restart the Odoo service, then open AI Document OCR → Configuration → OCR Providers and set your default provider (Mock provider works out of the box).
How to Configure OCR Providers
| Provider | What to enter | Where to get credentials |
|---|---|---|
| OpenAI | API key, model (e.g. gpt-4o-mini), temperature, max tokens |
platform.openai.com → API keys |
| Google Gemini | API key, model (e.g. gemini-1.5-flash / gemini-1.5-pro) |
aistudio.google.com → Get API key |
| Azure OpenAI | API key, endpoint, deployment name, API version | Azure Portal → Azure OpenAI resource → Keys & Endpoint |
| Azure Document Intelligence | API key, endpoint, prebuilt-invoice or layout model | Azure Portal → AI Services → Document Intelligence |
| Tesseract | Path to tesseract binary (when not on $PATH) |
Install locally: apt / brew / Tesseract installer for Windows |
| Mock Provider | Nothing - works out of the box | Perfect for demos & tests with no API keys |
Three Ways to Scan a Document
1. From the OCR Menu
AI Document OCR → Upload Document. Pick a document type, attach the file and click Upload & Process OCR.
Best for: bulk OCR sessions where the operator processes multiple documents in a row.
2. From a Sales / Purchase Doc
Open any sale order / purchase order / invoice / bill and click the Scan header button.
Best for: enriching an existing draft with a customer PO or supplier quotation file.
3. From the Sales/Purchase Menu
Dedicated menu entries: Sales → Scan Customer PO, Purchase → Scan Supplier Quotation, Accounting → Scan Customer Invoice / Vendor Bill.
Best for: users who live in Sales/Purchase/Accounting all day.
Technical Architecture
A modular layout that's easy to extend — add a new OCR provider, a new matcher, or a new layout-specific parser without touching the rest.
📁 Module Layout
bb_ai_document_ocr/ ├─ controllers/ HTTP routes (dashboard data) ├─ data/ Sequences, providers, cron, mail ├─ models/ OCR records, providers, settings ├─ providers/ OCR / AI provider implementations ├─ report/ QWeb extraction summary report ├─ security/ Groups, ACLs, record rules ├─ services/ Matchers, preprocessors, JSON schema ├─ static/ SCSS / OWL dashboard / placeholders ├─ views/ All XML views & menus └─ wizard/ Upload & matching wizards
🤗 Provider Abstraction
Each provider extends BaseProvider and implements process(ocr_document) returning a schema-validated dict (services/ai_json_schema.py). Switching providers is one click - the rest of the pipeline is identical.
- Single normalized JSON schema across providers
- Tolerant JSON repair for noisy LLM output
- Provider call logging with timing & payload
- Per-provider model / temperature / endpoint overrides
- Optional JSON schema override on each provider
Optional Python Dependencies
The module degrades gracefully if these are missing. Install only the ones you need.
| Library | Used for | Install |
|---|---|---|
| requests | OpenAI / Gemini / Azure HTTP calls | pip install requests |
| Pillow | Image normalization, Tesseract image input | pip install pillow |
| pymupdf | Best-effort PDF rasterization (preferred) | pip install pymupdf |
| pdf2image | Alternative PDF rasterization (needs poppler) | pip install pdf2image |
| pypdf | Embedded PDF text extraction | pip install pypdf |
| pytesseract | Local Tesseract bridge | pip install pytesseract |
The Mock provider needs none of the above.
Security, Multi-Company & Attachments
Per-Group Record Rules
Two security groups: OCR User (own records) and OCR Manager (all records). Standard Odoo ACL conventions.
Multi-Company Aware
Every OCR record carries a company_id; providers can be company-restricted; dashboard respects the user's allowed companies.
Attachment Retention
The original file is kept on the OCR record AND on the created Odoo document (chatter + smart button). Optional Enterprise Documents integration creates a documents.document too.
Known Limitations & Best Practices
⚠ Tesseract Returns Raw Text Only
On-device Tesseract cannot reconstruct complex tables. Pair it with the "Structure with AI" button or pick Gemini / OpenAI / Azure Document Intelligence for the best result on scanned invoices.
⚠ PDF Rasterization
Without PyMuPDF or pdf2image, OpenAI / Azure OpenAI providers fall back to PDF text extraction, which may not work on scanned PDFs. Install pymupdf for the best experience.
✓ Confirmed Records Are Safe
Confirmed sale orders, posted invoices and bills are intentionally never modified by the module. Re-scanning a document always creates a new draft.
✓ Best Practice: Verify Totals
The parser-health gate blocks Create Draft Document when line subtotals don't match the header total. Always verify the line table before clicking the button.
Screenshot Gallery


























Technical Specifications
| Specification | Details |
|---|---|
| Required Modules | base, mail, contacts, product, sale_management, purchase, account, web |
| Enterprise-Only Dependency | None — works on Community and Enterprise |
| License | LGPL-3 |
| Frontend Framework | OWL (Odoo Web Library) + Chart.js |
| Supported Files | PDF, JPG, JPEG, PNG, WEBP |
| Created Records | sale.order, purchase.order, account.move (out_invoice / in_invoice) — always draft |
| Languages | English, Arabic, mixed-language content |
| Multi-Company | Full multi-company support with per-group record rules |
| Optional Documents Integration | Enterprise Documents app supported, but never required |
| Deployment | Odoo.sh, On-Premises, Community & Enterprise Edition |
Binary Bridge Technology Services
Expert Odoo development, customization, and implementation services. Trusted by 100+ clients across 36 industries worldwide.
Please log in to comment on this module