| Availability | Odoo Online, Odoo.sh, On Premise |
| Odoo Apps Dependencies | Discuss (mail) |
| Lines of code | 4632 |
| Technical Name | llm_knowledge |
| License | LGPL-3 |
| Website | https://github.com/apexive/odoo-llm |
| Versions | 16.0, 18.0 |
Your Company Knowledge Is Trapped
Scattered across PDFs, Word docs, spreadsheets, images, text files, and folders...
Endless Searching
"Where was that policy document? Was it in SharePoint? Email? The old server?"
Format Chaos
PDFs, .docx, .txt, .xlsx, images - each needs different tools to search
Wasted Hours
Your team spends hours each week hunting for information they KNOW exists
Information Silos
Knowledge locked in different systems - no single place to find answers
Result: Slower decisions, frustrated teams, and knowledge that might as well not exist
There's a Better Way
Transform ALL your documents into unified, AI-searchable knowledge
Cut 'Where Is That Document?' Time to Zero
Your employees spend hours each week hunting through files, emails, and SharePoint. Let AI search your entire knowledge base for them. Just ask a question in plain English.
Powered by Retrieval-Augmented Generation (RAG) - the technology that makes AI truly understand your business
What is RAG and Why Should You Care?
Transform your AI from a know-it-all to a know-YOUR-business expert
Think of RAG as giving your AI assistant a perfect memory of YOUR company's knowledge. Instead of getting generic AI answers, your assistant reads your actual documents, policies, and procedures before responding—like having an expert employee who never forgets anything and can instantly access every document you've ever created.
Transform How Your Team Works
Instant Expert Answers
Get accurate responses based on YOUR company's documentation, not generic internet knowledge.
Save Hours Daily
Stop digging through folders. Ask questions naturally and get the exact information with sources.
Always Up-to-Date
Upload new docs anytime. Your AI instantly knows about changes. No retraining needed.
100% Data Privacy
Your documents stay in YOUR Odoo. Complete data sovereignty, no external sharing.
See the Difference in Action
Real scenarios where RAG transforms daily operations
Employee Onboarding
New sales rep spends days reading manuals, repeatedly asks colleagues about product specs and policies
Asks AI: "What are our Pro subscription features?" Gets instant, accurate answers from training materials
"Based on Product Guide v2.3: Pro includes unlimited users, API access, priority support..."
Customer Support
Agent puts customer on hold, searches multiple systems, might give incomplete answer
Types question, instantly gets comprehensive answer from manuals and past tickets
"Per Troubleshooting Guide section 4.2 and ticket #1823: Reset procedure is..."
Policy Questions
Manager emails HR, waits for response, might get outdated information
Asks: "Remote work policy for international employees?" Gets immediate, accurate answer
"HR Handbook 2024, page 47: International remote work requires VP approval..."
Simple as 1-2-3-4-5
Get intelligent answers from your documents in minutes
Upload Documents
Add PDFs, manuals, policies to organized collections
Auto Processing
System chunks and indexes content intelligently
Ask Questions
Type questions in plain language naturally
Get Answers
AI provides comprehensive answers with sources
Keep Learning
Add new docs anytime to expand knowledge
Complete Setup Guide
From zero to production RAG in 6 steps with real examples
Setup Vector Store
Install and configure Qdrant for vector storage
Install Vector Store Module
Choose a vector store to enable semantic search capabilities. This guide uses llm_qdrant; any of the following will work:
- llm_qdrant - High-performance vector database (shown in this guide)
- llm_pgvector - PostgreSQL extension (if you prefer SQL)
- llm_chroma - Lightweight embedded database
Navigate to LLM → Stores to view configured vector stores
Configure Qdrant connection: host, port, API key, and collection name
docker run -p 6333:6333 qdrant/qdrant
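Before moving on, you can confirm the container is reachable. A minimal sketch using the requests package (already a dependency of this module); the localhost address matches the docker command above:

import requests

# Qdrant's REST API lists existing collections at /collections;
# a fresh instance returns an empty list
resp = requests.get("http://localhost:6333/collections", timeout=5)
resp.raise_for_status()
print(resp.json())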
Configure Embedding Provider
Setup OpenAI for text embeddings
Install Module: llm_openai
Configure OpenAI provider and select the text-embedding-3-small model for efficient, cost-effective embeddings.
Select text-embedding-3-small model for document embeddings
Alternatives: llm_mistral for Mistral embeddings, or llm_ollama for local embeddings
BONUS (Optional): Parse Images with AI
Extract text from images, receipts, screenshots with Mistral OCR
Install Module: llm_knowledge_mistral
Want to extract text from images, receipts, screenshots, or handwritten notes? Install the Mistral parser extension to unlock OCR capabilities!
Available Mistral models with OCR/vision capabilities
Sample image: Handwritten grocery list
Perfect parsing! Mistral extracted all text from the handwritten image
What Can You Parse?
- Receipts & Invoices: Extract line items, prices, totals
- Handwritten Notes: Convert handwriting to searchable text
- Screenshots: Pull text from app screenshots, charts
- Scanned Documents: OCR any scanned paper document
- Product Labels: Read ingredients, specifications, warnings
How It Works:
- Upload image files (.jpg, .png, .webp, .gif) to your collection
- Select Mistral OCR model as parser
- Process resources - Mistral extracts all text automatically
- Text is chunked, embedded, and made searchable like any other document (see the sketch below)
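For scripted imports, the same workflow can be driven from the Odoo shell. A minimal sketch using the llm.resource fields shown in the usage examples further down; the file path is a placeholder and collection refers to an existing knowledge collection record:

import base64

# Upload an image as an attachment (placeholder path)
attachment = env['ir.attachment'].create({
    'name': 'receipt.jpg',
    'datas': base64.b64encode(open('/path/to/receipt.jpg', 'rb').read()),
})

# With llm_knowledge_mistral installed, the OCR-capable parser
# extracts the image's text during processing
resource = env['llm.resource'].create({
    'name': 'Grocery Receipt',
    'resource_type': 'attachment',
    'attachment_id': attachment.id,
    'collection_ids': [(4, collection.id)],
})
resource.process_resource()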
Create Knowledge Collection
Organize documents into searchable collections
Create a collection that links your vector store (Qdrant) with your embedding model (OpenAI). Collections organize related documents together.
Create collection: Select vector store and embedding model
Key Settings:
- Vector Store: Qdrant (from Step 1)
- Embedding Model: text-embedding-3-small (from Step 2)
- Chunk Size: 512-1024 tokens (adjustable)
Upload & Process Documents
The complete RAG pipeline in action
Step 4a: Upload Documents
Add documents to your collection in multiple ways:
- Upload Files: PDFs, text files, markdown, images with text
- External URLs: Provide links to download files from external sources
- Website Links: Fetch and parse content directly from web pages
Upload documents via file picker or URL
Step 4b: View Uploaded Resources
See all documents in your collection and trigger processing manually or automatically.
Resources list showing uploaded documents ready for processing
Step 4c: Processing Pipeline
Watch the magic happen: Parse → Chunk → Embed
Real-time processing status: parsing text, chunking content, generating embeddings
Step 4d: View Chunks
Inspect individual chunks with their embeddings, ready for semantic search.
Chunk list showing embedded text pieces with metadata
Use with LLM Assistants
Query your knowledge directly in Odoo
Install Module: llm_tool_knowledge
Enable your LLM assistants to search your knowledge collections. Create an assistant with knowledge retrieval tools.
Chat with your LLM assistant that can search and cite your documents
Example Queries:
- "What are our company's remote work policies?"
- "Show me the product specifications for Model X"
- "How do I reset a customer's password?"
Connect External Apps
Use with Claude Desktop, Cursor, Ollama, and more
Install Module: llm_mcp_server
The Model Context Protocol (MCP) server exposes your knowledge collections to any MCP-compatible application.
Claude Desktop app querying your Odoo knowledge base with source citations
Compatible Apps: Cursor, Claude Code, Claude Desktop, Codex CLI, and many more MCP-compatible applications
What's Next?
Auto-Sync
Install llm_knowledge_automation to automatically keep collections synced with your data
Alternative Stores
Try llm_pgvector (PostgreSQL) or llm_chroma for different vector storage options
Advanced Parsing
Use llm_mistral for image/OCR parsing or llm_knowledge_llama for LlamaIndex chunking
Enterprise-Ready Knowledge Management
Powerful features designed for real business needs
Organized Collections
Create separate knowledge bases for different departments or topics
- HR Policies
- Product Documentation
- Technical Manuals
- Sales Playbooks
Intelligent Chunking
Documents automatically split into optimal pieces for accurate retrieval
Preserves context while maximizing search accuracy
Semantic Search
Finds relevant content even when exact words don't match
Ask "vacation policy" → finds "PTO guidelines"
Multiple Formats
Process various document types seamlessly
- PDF documents
- Text files
- Markdown files
- Web pages (HTTP)
Flexible Storage
Choose your preferred vector database
- PostgreSQL pgvector
- ChromaDB
- Qdrant
- More coming soon
Source Citations
Every answer includes references to source documents
Build trust with transparent, verifiable responses
Seamless Integration
Works perfectly with the entire Odoo LLM ecosystem
AI Assistant Integration
Enhance your LLM assistants with your company's knowledge base
Works with:
- LLM Assistant - Create knowledge-aware AI assistants
- LLM Thread - Chat with document context
- LLM Tool Knowledge - RAG tools for any model
Provider Flexibility
Use any LLM provider for embeddings and generation
Supported providers:
- OpenAI - GPT models and embeddings
- Mistral - European AI models
- Ollama - Local, private models
- More providers available
Transform Your AI Into a Company Expert
Stop settling for generic AI answers. Give your team instant access to YOUR knowledge.
Documentation & Support
Everything you need to succeed
Complete Documentation
Step-by-step guides and best practices included
GitHub Repository:
github.com/apexive/odoo-llm
Full documentation, examples, and tutorials
Open Source
100% open source with active community support
License: LGPL-3
Customize, extend, or contribute back to the community
LLM Knowledge
RAG (Retrieval-Augmented Generation) system for Odoo. Document processing, vector storage, and semantic search with multiple vector store backends.
What is RAG?
Retrieval-Augmented Generation enhances LLM responses by retrieving relevant context from your documents before generating answers. Instead of relying solely on the model's training data, RAG:
- Retrieves relevant document chunks using semantic search
- Augments the LLM prompt with retrieved context
- Generates responses grounded in your actual documents
This enables LLMs to answer questions about your specific data with source citations, reducing hallucinations and providing verifiable information.
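The three steps map onto a short script. A schematic sketch, not the module's exact internals: it uses the retriever and thread models documented below, and assumes an assistant record like the one created in the usage examples:

query = "How do I reset my password?"

# 1. Retrieve: semantic search over your document chunks
context = env['llm.knowledge.retriever'].get_relevant_context(query, max_tokens=2000)

# 2. Augment: prepend the retrieved context to the user's question
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# 3. Generate: a chat model produces an answer grounded in that context
thread = env['llm.thread'].create({'assistant_id': assistant.id})
thread.generate_response(prompt)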
Installation
Prerequisites
- Odoo: 18.0 or higher
- Required Modules: llm, llm_store
- Python Packages: requests, markdownify, PyMuPDF, numpy
- Vector Store: One of llm_qdrant, llm_pgvector, or llm_chroma
Install Steps
Install Python dependencies:
pip install requests markdownify PyMuPDF numpy
Install required Odoo modules:
# Install base modules first
odoo-bin -d your_database -i llm,llm_store

# Install a vector store (choose one)
odoo-bin -d your_database -i llm_qdrant  # or llm_pgvector or llm_chroma

# Install llm_knowledge
odoo-bin -d your_database -i llm_knowledge
Restart Odoo server
Quick Start (5 Minutes)
1. Create a Knowledge Collection
# Access Odoo shell
odoo-bin shell -d your_database

# Create collection
collection = env['llm.knowledge.collection'].create({
    'name': 'Product Documentation',
    'store_id': env['llm.store'].search([('store_type', '=', 'qdrant')], limit=1).id,
    'embedding_model_id': env['llm.model'].search([('name', '=', 'text-embedding-3-small')], limit=1).id,
    'chunk_size': 1000,
    'chunk_overlap': 200,
})

# Create vector collection in store
collection.create_vector_collection()
2. Add a Document
# Add from URL
resource = env['llm.resource'].create({
    'name': 'Product Manual',
    'url': 'https://example.com/manual.pdf',
    'resource_type': 'url',
    'collection_ids': [(4, collection.id)],
})

# Process the document
resource.process_resource()  # retrieve → parse → chunk → embed
3. Search Your Knowledge
# Search collection
results = env['llm.knowledge.chunk'].search(
    args=[('embedding', '=', 'How do I reset my password?')],
    limit=5,
    collection_id=collection.id,
)

# Print results
for chunk in results:
    print(f"Score: {chunk.similarity:.2f}")
    print(f"Source: {chunk.resource_id.name}")
    print(f"Content: {chunk.content[:200]}...")
    print("---")
Usage Examples
Add Document from File Upload
import base64

# Via attachment
attachment = env['ir.attachment'].create({
    'name': 'company_policy.pdf',
    'datas': base64.b64encode(open('/path/to/file.pdf', 'rb').read()),
})
resource = env['llm.resource'].create({
    'name': 'Company Policy',
    'resource_type': 'attachment',
    'attachment_id': attachment.id,
    'collection_ids': [(4, collection.id)],
})
resource.process_resource()
Add Web Page Content
# Fetch and parse web page
resource = env['llm.resource'].create({
    'name': 'Technical Documentation',
    'url': 'https://docs.example.com/api',
    'resource_type': 'url',
    'collection_ids': [(4, collection.id)],
})
resource.process_resource()
Batch Process Multiple Documents
# Create multiple resources
urls = [
    'https://example.com/doc1.pdf',
    'https://example.com/doc2.pdf',
    'https://example.com/doc3.pdf',
]
resources = env['llm.resource'].create([
    {
        'name': f'Document {i+1}',
        'url': url,
        'resource_type': 'url',
        'collection_ids': [(4, collection.id)],
    }
    for i, url in enumerate(urls)
])

# Process all at once
collection.process_all_resources()
Use with LLM Assistant
# Create RAG-enabled assistant
assistant = env['llm.assistant'].create({
    'name': 'Product Support Bot',
    'provider_id': env.ref('llm_openai.provider_openai').id,
    'model_id': env['llm.model'].search([('name', '=', 'gpt-4')], limit=1).id,
})

# Add knowledge tool
tool = env['llm.tool'].create({
    'name': 'search_product_docs',
    'implementation': 'knowledge_retriever',
    'collection_ids': [(4, collection.id)],
})
assistant.tool_ids = [(4, tool.id)]

# Chat with knowledge-enhanced assistant
thread = env['llm.thread'].create({'assistant_id': assistant.id})
thread.generate_response("How do I configure the API endpoint?")
Screenshots Tutorial
Visual step-by-step guide showing the complete RAG setup:
1. Setup Vector Store
Navigate to LLM → Stores to configure Qdrant, pgvector, or Chroma
Configure connection: host, port, API key
2. Configure Embedding Model
Select text-embedding-3-small or other embedding model
3. Create Collection
Link vector store and embedding model
4. Upload & Process Documents
Upload files, provide URLs, or fetch from web pages
View uploaded documents and trigger processing
Watch the pipeline: Parse → Chunk → Embed
Inspect generated chunks with embeddings
5. Query with Assistant
Chat with knowledge-enhanced assistant
6. Use with External Apps
Query from Claude Desktop, Cursor, or other MCP-compatible apps
Configuration
Collection Settings
# Chunking configuration
collection.chunk_size = 1000    # Max tokens per chunk
collection.chunk_overlap = 200  # Overlap between chunks

# Embedding configuration
collection.embedding_model_id = embedding_model  # Which model to use
collection.store_id = vector_store               # Which vector store
Processing Pipeline States
| State | Description | Next Action |
|---|---|---|
| draft | Initial state | retrieve() |
| retrieved | Content fetched | parse() |
| parsed | Converted to markdown | chunk() |
| chunked | Split into segments | embed() |
| ready | Embeddings stored | Search/Query |
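The same pipeline can be driven one state at a time instead of via process_resource(). A sketch using the llm.resource methods from the Models Reference below:

# Step through the states manually
resource.retrieve()   # draft     → retrieved
resource.parse()      # retrieved → parsed
resource.chunk()      # parsed    → chunked
resource.embed()      # chunked   → ready

# Re-run from an earlier stage, e.g. after changing chunk_size
resource.reset_to_state('parsed')
resource.chunk()
resource.embed()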
Environment Variables
# Optional: Configure default settings
ODOO_LLM_CHUNK_SIZE=1000
ODOO_LLM_CHUNK_OVERLAP=200
ODOO_LLM_DEFAULT_EMBEDDING_MODEL=text-embedding-3-small
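Such variables would typically be read at startup with standard environment lookups. An illustrative sketch only; the module's actual lookup code is not shown here:

import os

# Fall back to the documented defaults when a variable is unset
chunk_size = int(os.environ.get("ODOO_LLM_CHUNK_SIZE", 1000))
chunk_overlap = int(os.environ.get("ODOO_LLM_CHUNK_OVERLAP", 200))
default_model = os.environ.get("ODOO_LLM_DEFAULT_EMBEDDING_MODEL", "text-embedding-3-small")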
Models Reference
llm.knowledge.collection
Responsibility: Manages knowledge collections, coordinates document processing, and handles vector store integration.
Key Methods:
- create_vector_collection() - Create corresponding collection in vector store
- process_all_resources() - Process all documents in collection
- search_content(query, limit=10) - Search collection with semantic query
- add_resources(resource_ids) - Add resources to collection
- remove_resources(resource_ids) - Remove resources from collection
- get_statistics() - Get collection statistics (resource count, chunk count, etc.)
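Taken together, a routine maintenance session might look like this. A sketch built from the methods above; the resource IDs are placeholders:

# Add and remove documents, then reprocess
collection.add_resources(resource_ids=[101, 102])  # placeholder IDs
collection.remove_resources(resource_ids=[55])     # placeholder ID
collection.process_all_resources()

# Query and inspect the collection
hits = collection.search_content("warranty terms", limit=5)
stats = collection.get_statistics()
print(stats)  # resource count, chunk count, etc.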
llm.resource
Responsibility: Handles document retrieval, parsing, state management, and content extraction from various sources.
Key Methods:
- process_resource() - Run complete pipeline (retrieve → parse → chunk → embed)
- retrieve() - Fetch content from source (URL, attachment, text)
- parse() - Convert raw content to markdown
- chunk() - Split parsed content into chunks
- embed() - Generate embeddings for chunks
- reset_to_state(state) - Reset resource to specific processing state
- lock_resource() - Lock resource during processing
- unlock_resource() - Unlock resource after processing
Fields:
- state - Processing state (draft/retrieved/parsed/chunked/ready)
- resource_type - Source type (url/attachment/text)
- content - Raw retrieved content
- markdown_content - Parsed markdown content
- collection_ids - Associated collections
llm.knowledge.chunk
Responsibility: Represents individual text chunks with embeddings and metadata for vector search.
Key Methods:
- generate_embedding() - Generate and store embedding for chunk
- search(query, limit, collection_id) - Semantic search (overridden search method)
Fields:
- content - Chunk text content
- chunk_index - Position in original document
- vector_id - ID in vector store
- embedding_model_id - Model used for embedding
- token_count - Number of tokens in chunk
- similarity - Search similarity score (computed)
llm.resource.parser
Responsibility: Converts various document formats (PDF, HTML, DOCX) to clean markdown.
Key Methods:
- parse_to_markdown(content, content_type) - Main parsing entry point
- parse_pdf(pdf_bytes) - Extract and parse PDF content
- parse_html(html_content) - Convert HTML to markdown
- clean_markdown(markdown) - Clean up markdown formatting
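The parser can also be called directly, bypassing the resource pipeline. A sketch based on the signatures above; the file path is a placeholder and content-type handling may differ:

# Convert a PDF byte string straight to clean markdown
with open('/path/to/manual.pdf', 'rb') as f:  # placeholder path
    pdf_bytes = f.read()

parser = env['llm.resource.parser']
markdown = parser.parse_to_markdown(pdf_bytes, 'application/pdf')
markdown = parser.clean_markdown(markdown)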
llm.knowledge.retriever
Responsibility: Handles semantic search and context retrieval for RAG operations.
Key Methods:
- search_knowledge(query, collection_ids, limit, min_score) - Search across collections
- get_relevant_context(query, max_tokens) - Get context for RAG generation
- rank_results(results) - Re-rank search results
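For RAG calls outside the assistant flow, the retriever can be used directly. A sketch based on the signatures above; the min_score threshold is an arbitrary example value:

retriever = env['llm.knowledge.retriever']

# Cross-collection search with a relevance floor
results = retriever.search_knowledge(
    query="data retention policy",
    collection_ids=[collection.id],
    limit=10,
    min_score=0.7,  # arbitrary example threshold
)
results = retriever.rank_results(results)

# Or fetch a ready-to-use context window for generation
context = retriever.get_relevant_context("data retention policy", max_tokens=2000)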
Optional Extensions
Image Parsing with Mistral OCR
Install llm_knowledge_mistral to extract text from images, receipts, and handwritten notes:
odoo-bin -d your_database -i llm_knowledge_mistral
Capabilities:
- Handwritten notes
- Receipts and invoices
- Screenshots
- Scanned documents
- Product labels
Automated Knowledge Sync
Install llm_knowledge_automation for automatic document updates:
odoo-bin -d your_database -i llm_knowledge_automation
Alternative Vector Stores
- llm_pgvector - PostgreSQL with pgvector extension (SQL-based)
- llm_chroma - Lightweight embedded vector database
- llm_qdrant - High-performance vector search engine
Documentation
- Full Documentation: doc/index.rst
- GitHub Repository: https://github.com/apexive/odoo-llm
- Module Manifest: __manifest__.py
License
This module is licensed under LGPL-3.
---
© 2025 Apexive Solutions LLC. All rights reserved.