| Availability | Odoo Online, Odoo.sh, On Premise |
| Odoo Apps Dependencies | Discuss (mail) |
| Lines of code | 4632 |
| Technical Name | llm_knowledge |
| License | LGPL-3 |
| Website | https://github.com/apexive/odoo-llm |
| Versions | 16.0, 18.0 |
Your Company Knowledge Is Trapped
Scattered across PDFs, Word docs, spreadsheets, images, text files, and folders...
Endless Searching
"Where was that policy document? Was it in SharePoint? Email? The old server?"
Format Chaos
PDFs, .docx, .txt, .xlsx, images - each needs different tools to search
Wasted Hours
Your team spends hours each week hunting for information they KNOW exists
Information Silos
Knowledge locked in different systems - no single place to find answers
Result: Slower decisions, frustrated teams, and knowledge that might as well not exist
There's a Better Way
Transform ALL your documents into unified, AI-searchable knowledge
Cut 'Where Is That Document?' Time to Zero
Your employees spend hours each week hunting through files, emails, and SharePoint. Let AI search your entire knowledge base for them. Just ask a question in plain English.
Powered by Retrieval-Augmented Generation (RAG) - the technology that makes AI truly understand your business
What is RAG and Why Should You Care?
Transform your AI from a know-it-all to a know-YOUR-business expert
Think of RAG as giving your AI assistant a perfect memory of YOUR company's knowledge. Instead of getting generic AI answers, your assistant reads your actual documents, policies, and procedures before responding—like having an expert employee who never forgets anything and can instantly access every document you've ever created.
Transform How Your Team Works
Instant Expert Answers
Get accurate responses based on YOUR company's documentation, not generic internet knowledge.
Save Hours Daily
Stop digging through folders. Ask questions naturally and get the exact information with sources.
Always Up-to-Date
Upload new docs anytime. Your AI instantly knows about changes. No retraining needed.
100% Data Privacy
Your documents stay in YOUR Odoo. Complete data sovereignty, no external sharing.
See the Difference in Action
Real scenarios where RAG transforms daily operations
Employee Onboarding
New sales rep spends days reading manuals, repeatedly asks colleagues about product specs and policies
Asks AI: "What are our Pro subscription features?" Gets instant, accurate answers from training materials
"Based on Product Guide v2.3: Pro includes unlimited users, API access, priority support..."
Customer Support
Agent puts customer on hold, searches multiple systems, might give incomplete answer
Types question, instantly gets comprehensive answer from manuals and past tickets
"Per Troubleshooting Guide section 4.2 and ticket #1823: Reset procedure is..."
Policy Questions
Manager emails HR, waits for response, might get outdated information
Asks: "Remote work policy for international employees?" Gets immediate, accurate answer
"HR Handbook 2024, page 47: International remote work requires VP approval..."
Simple as 1-2-3-4-5
Get intelligent answers from your documents in minutes
Upload Documents
Add PDFs, manuals, policies to organized collections
Auto Processing
System chunks and indexes content intelligently
Ask Questions
Type questions in plain language naturally
Get Answers
AI provides comprehensive answers with sources
Keep Learning
Add new docs anytime to expand knowledge
Complete Setup Guide
From zero to production RAG in 6 steps with real examples
Setup Vector Store
Install and configure Qdrant for vector storage
Install Vector Store Module
Choose a vector store to enable semantic search capabilities. This guide uses llm_qdrant; any of the following will work:
- llm_qdrant - High-performance vector database (shown in this guide)
- llm_pgvector - PostgreSQL extension (if you prefer SQL)
- llm_chroma - Lightweight embedded database
Navigate to LLM → Stores to view configured vector stores
Configure Qdrant connection: host, port, API key, and collection name
docker run -p 6333:6333 qdrant/qdrant
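Before moving on, you can confirm the container is reachable. A minimal sketch using the requests package (already a dependency of this module); the localhost address matches the docker command above:

import requests

# Qdrant's REST API lists existing collections at /collections;
# a fresh instance returns an empty list
resp = requests.get("http://localhost:6333/collections", timeout=5)
resp.raise_for_status()
print(resp.json())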
Configure Embedding Provider
Setup OpenAI for text embeddings
Install Module: llm_openai
Configure OpenAI provider and select the text-embedding-3-small model for efficient, cost-effective embeddings.
Select text-embedding-3-small model for document embeddings
Alternatives: llm_mistral for Mistral embeddings, or llm_ollama for local embeddings
BONUS (Optional): Parse Images with AI
Extract text from images, receipts, screenshots with Mistral OCR
Install Module: llm_knowledge_mistral
Want to extract text from images, receipts, screenshots, or handwritten notes? Install the Mistral parser extension to unlock OCR capabilities!
Available Mistral models with OCR/vision capabilities
Sample image: Handwritten grocery list
Perfect parsing! Mistral extracted all text from the handwritten image
What Can You Parse?
- Receipts & Invoices: Extract line items, prices, totals
- Handwritten Notes: Convert handwriting to searchable text
- Screenshots: Pull text from app screenshots, charts
- Scanned Documents: OCR any scanned paper document
- Product Labels: Read ingredients, specifications, warnings
How It Works:
- Upload image files (.jpg, .png, .webp, .gif) to your collection
- Select Mistral OCR model as parser
- Process resources - Mistral extracts all text automatically
- Text is chunked, embedded, and made searchable like any other document (see the sketch below)
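For scripted imports, the same workflow can be driven from the Odoo shell. A minimal sketch using the llm.resource fields shown in the usage examples further down; the file path is a placeholder and collection refers to an existing knowledge collection record:

import base64

# Upload an image as an attachment (placeholder path)
attachment = env['ir.attachment'].create({
    'name': 'receipt.jpg',
    'datas': base64.b64encode(open('/path/to/receipt.jpg', 'rb').read()),
})

# With llm_knowledge_mistral installed, the OCR-capable parser
# extracts the image's text during processing
resource = env['llm.resource'].create({
    'name': 'Grocery Receipt',
    'resource_type': 'attachment',
    'attachment_id': attachment.id,
    'collection_ids': [(4, collection.id)],
})
resource.process_resource()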
Create Knowledge Collection
Organize documents into searchable collections
Create a collection that links your vector store (Qdrant) with your embedding model (OpenAI). Collections organize related documents together.
Create collection: Select vector store and embedding model
Key Settings:
- Vector Store: Qdrant (from Step 1)
- Embedding Model: text-embedding-3-small (from Step 2)
- Chunk Size: 512-1024 tokens (adjustable)
Upload & Process Documents
The complete RAG pipeline in action
Step 4a: Upload Documents
Add documents to your collection in multiple ways:
- Upload Files: PDFs, text files, markdown, images with text
- External URLs: Provide links to download files from external sources
- Website Links: Fetch and parse content directly from web pages
Upload documents via file picker or URL
Step 4b: View Uploaded Resources
See all documents in your collection and trigger processing manually or automatically.
Resources list showing uploaded documents ready for processing
Step 4c: Processing Pipeline
Watch the magic happen: Parse → Chunk → Embed
Real-time processing status: parsing text, chunking content, generating embeddings
Step 4d: View Chunks
Inspect individual chunks with their embeddings, ready for semantic search.
Chunk list showing embedded text pieces with metadata
Use with LLM Assistants
Query your knowledge directly in Odoo
Install Module: llm_tool_knowledge
Enable your LLM assistants to search your knowledge collections. Create an assistant with knowledge retrieval tools.
Chat with your LLM assistant that can search and cite your documents
Example Queries:
- "What are our company's remote work policies?"
- "Show me the product specifications for Model X"
- "How do I reset a customer's password?"
Connect External Apps
Use with Claude Desktop, Cursor, Ollama, and more
Install Module: llm_mcp_server
The Model Context Protocol (MCP) server exposes your knowledge collections to any MCP-compatible application.
Claude Desktop app querying your Odoo knowledge base with source citations
Compatible Apps: Cursor, Claude Code, Claude Desktop, Codex CLI, and many more MCP-compatible applications
What's Next?
Auto-Sync
Install llm_knowledge_automation to automatically keep collections synced with your data
Alternative Stores
Try llm_pgvector (PostgreSQL) or llm_chroma for different vector storage options
Advanced Parsing
Use llm_mistral for image/OCR parsing or llm_knowledge_llama for LlamaIndex chunking
Enterprise-Ready Knowledge Management
Powerful features designed for real business needs
Organized Collections
Create separate knowledge bases for different departments or topics
- HR Policies
- Product Documentation
- Technical Manuals
- Sales Playbooks
Intelligent Chunking
Documents automatically split into optimal pieces for accurate retrieval
Preserves context while maximizing search accuracy
Semantic Search
Finds relevant content even when exact words don't match
Ask "vacation policy" → finds "PTO guidelines"
Multiple Formats
Process various document types seamlessly
- PDF documents
- Text files
- Markdown files
- Web pages (HTTP)
Flexible Storage
Choose your preferred vector database
- PostgreSQL pgvector
- ChromaDB
- Qdrant
- More coming soon
Source Citations
Every answer includes references to source documents
Build trust with transparent, verifiable responses
Seamless Integration
Works perfectly with the entire Odoo LLM ecosystem
AI Assistant Integration
Enhance your LLM assistants with your company's knowledge base
Works with:
- LLM Assistant - Create knowledge-aware AI assistants
- LLM Thread - Chat with document context
- LLM Tool Knowledge - RAG tools for any model
Provider Flexibility
Use any LLM provider for embeddings and generation
Supported providers:
- OpenAI - GPT models and embeddings
- Mistral - European AI models
- Ollama - Local, private models
- More providers available
Transform Your AI Into a Company Expert
Stop settling for generic AI answers. Give your team instant access to YOUR knowledge.
Documentation & Support
Everything you need to succeed
Complete Documentation
Step-by-step guides and best practices included
GitHub Repository:
github.com/apexive/odoo-llm
Full documentation, examples, and tutorials
Open Source
100% open source with active community support
License: LGPL-3
Customize, extend, or contribute back to the community
LLM Knowledge
RAG (Retrieval-Augmented Generation) system for Odoo. Document processing, vector storage, and semantic search with multiple vector store backends.
What is RAG?
Retrieval-Augmented Generation enhances LLM responses by retrieving relevant context from your documents before generating answers. Instead of relying solely on the model's training data, RAG:
- Retrieves relevant document chunks using semantic search
- Augments the LLM prompt with retrieved context
- Generates responses grounded in your actual documents
This enables LLMs to answer questions about your specific data with source citations, reducing hallucinations and providing verifiable information.
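The three steps map onto a short script. A schematic sketch, not the module's exact internals: it uses the retriever and thread models documented below, and assumes an assistant record like the one created in the usage examples:

query = "How do I reset my password?"

# 1. Retrieve: semantic search over your document chunks
context = env['llm.knowledge.retriever'].get_relevant_context(query, max_tokens=2000)

# 2. Augment: prepend the retrieved context to the user's question
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# 3. Generate: a chat model produces an answer grounded in that context
thread = env['llm.thread'].create({'assistant_id': assistant.id})
thread.generate_response(prompt)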
Installation
Prerequisites
- Odoo: 18.0 or higher
- Required Modules: llm, llm_store
- Python Packages: requests, markdownify, PyMuPDF, numpy
- Vector Store: One of llm_qdrant, llm_pgvector, or llm_chroma
Install Steps
Install Python dependencies:
pip install requests markdownify PyMuPDF numpy
Install required Odoo modules:
# Install base modules first
odoo-bin -d your_database -i llm,llm_store

# Install a vector store (choose one)
odoo-bin -d your_database -i llm_qdrant  # or llm_pgvector or llm_chroma

# Install llm_knowledge
odoo-bin -d your_database -i llm_knowledge
Restart Odoo server
Quick Start (5 Minutes)
1. Create a Knowledge Collection
# Access Odoo shell
odoo-bin shell -d your_database

# Create collection
collection = env['llm.knowledge.collection'].create({
    'name': 'Product Documentation',
    'store_id': env['llm.store'].search([('store_type', '=', 'qdrant')], limit=1).id,
    'embedding_model_id': env['llm.model'].search([('name', '=', 'text-embedding-3-small')], limit=1).id,
    'chunk_size': 1000,
    'chunk_overlap': 200,
})

# Create vector collection in store
collection.create_vector_collection()
2. Add a Document
# Add from URL
resource = env['llm.resource'].create({
    'name': 'Product Manual',
    'url': 'https://example.com/manual.pdf',
    'resource_type': 'url',
    'collection_ids': [(4, collection.id)],
})

# Process the document
resource.process_resource()  # retrieve → parse → chunk → embed
3. Search Your Knowledge
# Search collection
results = env['llm.knowledge.chunk'].search(
    args=[('embedding', '=', 'How do I reset my password?')],
    limit=5,
    collection_id=collection.id,
)

# Print results
for chunk in results:
    print(f"Score: {chunk.similarity:.2f}")
    print(f"Source: {chunk.resource_id.name}")
    print(f"Content: {chunk.content[:200]}...")
    print("---")
Usage Examples
Add Document from File Upload
import base64

# Via attachment
attachment = env['ir.attachment'].create({
    'name': 'company_policy.pdf',
    'datas': base64.b64encode(open('/path/to/file.pdf', 'rb').read()),
})
resource = env['llm.resource'].create({
    'name': 'Company Policy',
    'resource_type': 'attachment',
    'attachment_id': attachment.id,
    'collection_ids': [(4, collection.id)],
})
resource.process_resource()
Add Web Page Content
# Fetch and parse web page
resource = env['llm.resource'].create({
    'name': 'Technical Documentation',
    'url': 'https://docs.example.com/api',
    'resource_type': 'url',
    'collection_ids': [(4, collection.id)],
})
resource.process_resource()
Batch Process Multiple Documents
# Create multiple resources
urls = [
    'https://example.com/doc1.pdf',
    'https://example.com/doc2.pdf',
    'https://example.com/doc3.pdf',
]
resources = env['llm.resource'].create([
    {
        'name': f'Document {i+1}',
        'url': url,
        'resource_type': 'url',
        'collection_ids': [(4, collection.id)],
    }
    for i, url in enumerate(urls)
])

# Process all at once
collection.process_all_resources()
Use with LLM Assistant
# Create RAG-enabled assistant
assistant = env['llm.assistant'].create({
    'name': 'Product Support Bot',
    'provider_id': env.ref('llm_openai.provider_openai').id,
    'model_id': env['llm.model'].search([('name', '=', 'gpt-4')], limit=1).id,
})

# Add knowledge tool
tool = env['llm.tool'].create({
    'name': 'search_product_docs',
    'implementation': 'knowledge_retriever',
    'collection_ids': [(4, collection.id)],
})
assistant.tool_ids = [(4, tool.id)]

# Chat with knowledge-enhanced assistant
thread = env['llm.thread'].create({'assistant_id': assistant.id})
thread.generate_response("How do I configure the API endpoint?")
Screenshots Tutorial
Visual step-by-step guide showing the complete RAG setup:
1. Setup Vector Store
Navigate to LLM → Stores to configure Qdrant, pgvector, or Chroma
Configure connection: host, port, API key
2. Configure Embedding Model
Select text-embedding-3-small or other embedding model
3. Create Collection
Link vector store and embedding model
4. Upload & Process Documents
Upload files, provide URLs, or fetch from web pages
View uploaded documents and trigger processing
Watch the pipeline: Parse → Chunk → Embed
Inspect generated chunks with embeddings
5. Query with Assistant
Chat with knowledge-enhanced assistant
6. Use with External Apps
Query from Claude Desktop, Cursor, or other MCP-compatible apps
Configuration
Collection Settings
# Chunking configuration
collection.chunk_size = 1000    # Max tokens per chunk
collection.chunk_overlap = 200  # Overlap between chunks

# Embedding configuration
collection.embedding_model_id = embedding_model  # Which model to use
collection.store_id = vector_store               # Which vector store
Processing Pipeline States
| State | Description | Next Action |
|---|---|---|
| draft | Initial state | retrieve() |
| retrieved | Content fetched | parse() |
| parsed | Converted to markdown | chunk() |
| chunked | Split into segments | embed() |
| ready | Embeddings stored | Search/Query |
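The same pipeline can be driven one state at a time instead of via process_resource(). A sketch using the llm.resource methods from the Models Reference below:

# Step through the states manually
resource.retrieve()   # draft     → retrieved
resource.parse()      # retrieved → parsed
resource.chunk()      # parsed    → chunked
resource.embed()      # chunked   → ready

# Re-run from an earlier stage, e.g. after changing chunk_size
resource.reset_to_state('parsed')
resource.chunk()
resource.embed()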
Environment Variables
# Optional: Configure default settings
ODOO_LLM_CHUNK_SIZE=1000
ODOO_LLM_CHUNK_OVERLAP=200
ODOO_LLM_DEFAULT_EMBEDDING_MODEL=text-embedding-3-small
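Such variables would typically be read at startup with standard environment lookups. An illustrative sketch only; the module's actual lookup code is not shown here:

import os

# Fall back to the documented defaults when a variable is unset
chunk_size = int(os.environ.get("ODOO_LLM_CHUNK_SIZE", 1000))
chunk_overlap = int(os.environ.get("ODOO_LLM_CHUNK_OVERLAP", 200))
default_model = os.environ.get("ODOO_LLM_DEFAULT_EMBEDDING_MODEL", "text-embedding-3-small")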
Models Reference
llm.knowledge.collection
Responsibility: Manages knowledge collections, coordinates document processing, and handles vector store integration.
Key Methods:
- create_vector_collection() - Create corresponding collection in vector store
- process_all_resources() - Process all documents in collection
- search_content(query, limit=10) - Search collection with semantic query
- add_resources(resource_ids) - Add resources to collection
- remove_resources(resource_ids) - Remove resources from collection
- get_statistics() - Get collection statistics (resource count, chunk count, etc.)
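Taken together, a routine maintenance session might look like this. A sketch built from the methods above; the resource IDs are placeholders:

# Add and remove documents, then reprocess
collection.add_resources(resource_ids=[101, 102])  # placeholder IDs
collection.remove_resources(resource_ids=[55])     # placeholder ID
collection.process_all_resources()

# Query and inspect the collection
hits = collection.search_content("warranty terms", limit=5)
stats = collection.get_statistics()
print(stats)  # resource count, chunk count, etc.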
llm.resource
Responsibility: Handles document retrieval, parsing, state management, and content extraction from various sources.
Key Methods:
- process_resource() - Run complete pipeline (retrieve → parse → chunk → embed)
- retrieve() - Fetch content from source (URL, attachment, text)
- parse() - Convert raw content to markdown
- chunk() - Split parsed content into chunks
- embed() - Generate embeddings for chunks
- reset_to_state(state) - Reset resource to specific processing state
- lock_resource() - Lock resource during processing
- unlock_resource() - Unlock resource after processing
Fields:
- state - Processing state (draft/retrieved/parsed/chunked/ready)
- resource_type - Source type (url/attachment/text)
- content - Raw retrieved content
- markdown_content - Parsed markdown content
- collection_ids - Associated collections
llm.knowledge.chunk
Responsibility: Represents individual text chunks with embeddings and metadata for vector search.
Key Methods:
- generate_embedding() - Generate and store embedding for chunk
- search(query, limit, collection_id) - Semantic search (overridden search method)
Fields:
- content - Chunk text content
- chunk_index - Position in original document
- vector_id - ID in vector store
- embedding_model_id - Model used for embedding
- token_count - Number of tokens in chunk
- similarity - Search similarity score (computed)
llm.resource.parser
Responsibility: Converts various document formats (PDF, HTML, DOCX) to clean markdown.
Key Methods:
- parse_to_markdown(content, content_type) - Main parsing entry point
- parse_pdf(pdf_bytes) - Extract and parse PDF content
- parse_html(html_content) - Convert HTML to markdown
- clean_markdown(markdown) - Clean up markdown formatting
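The parser can also be called directly, bypassing the resource pipeline. A sketch based on the signatures above; the file path is a placeholder and content-type handling may differ:

# Convert a PDF byte string straight to clean markdown
with open('/path/to/manual.pdf', 'rb') as f:  # placeholder path
    pdf_bytes = f.read()

parser = env['llm.resource.parser']
markdown = parser.parse_to_markdown(pdf_bytes, 'application/pdf')
markdown = parser.clean_markdown(markdown)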
llm.knowledge.retriever
Responsibility: Handles semantic search and context retrieval for RAG operations.
Key Methods:
- search_knowledge(query, collection_ids, limit, min_score) - Search across collections
- get_relevant_context(query, max_tokens) - Get context for RAG generation
- rank_results(results) - Re-rank search results
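For RAG calls outside the assistant flow, the retriever can be used directly. A sketch based on the signatures above; the min_score threshold is an arbitrary example value:

retriever = env['llm.knowledge.retriever']

# Cross-collection search with a relevance floor
results = retriever.search_knowledge(
    query="data retention policy",
    collection_ids=[collection.id],
    limit=10,
    min_score=0.7,  # arbitrary example threshold
)
results = retriever.rank_results(results)

# Or fetch a ready-to-use context window for generation
context = retriever.get_relevant_context("data retention policy", max_tokens=2000)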
Optional Extensions
Image Parsing with Mistral OCR
Install llm_knowledge_mistral to extract text from images, receipts, and handwritten notes:
odoo-bin -d your_database -i llm_knowledge_mistral
Capabilities:
- Handwritten notes
- Receipts and invoices
- Screenshots
- Scanned documents
- Product labels
Automated Knowledge Sync
Install llm_knowledge_automation for automatic document updates:
odoo-bin -d your_database -i llm_knowledge_automation
Alternative Vector Stores
- llm_pgvector - PostgreSQL with pgvector extension (SQL-based)
- llm_chroma - Lightweight embedded vector database
- llm_qdrant - High-performance vector search engine
Documentation
- Full Documentation: doc/index.rst
- GitHub Repository: https://github.com/apexive/odoo-llm
- Module Manifest: __manifest__.py
License
This module is licensed under LGPL-3.
---
© 2025 Apexive Solutions LLC. All rights reserved.