Document Indexing using external tools

by
Odoo

112.76

v 11.0 v 12.0 Third Party 1
Technical Name: document-indexing
License: See License tab
Also available in version v 12.0

Better document indexing using external tools

This module replaces the default attachment indexing logic with calls to external tools. This lets Odoo index many more file types for full-text search, and it makes the indexing of .pdf files far more reliable.

To use this module, you first have to install libreoffice and poppler-utils on your operating system.

Make sure you can run "soffice" and "pdftotext" from the command line.
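
One way to confirm that both tools are reachable is to look them up on the PATH of the user that runs Odoo. The following is a minimal sketch; the helper name find_tools is made up for illustration and is not part of the module.

```python
import shutil

# The two executables the module description says must be runnable.
REQUIRED_TOOLS = ("soffice", "pdftotext")


def find_tools(names=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool}: {path or 'NOT FOUND'}")
```

Run it as the same operating system user as the Odoo server, since PATH can differ between users.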

The module indexes these document types:

  • .doc,
  • .docx,
  • .pdf,
  • .xls,
  • .xlsx,
  • .odp,
  • .ods,
  • .odt,
  • .wps (MS Works),
  • .rtf,
  • .ppt,
  • .pptx

Other types are passed on to the default processing of the ir.attachment model.

The indexing works in two steps. First, the document is converted to .pdf using the "soffice --headless" command. Then the text is extracted from the .pdf using "pdftotext" from poppler-utils.
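
The two steps can be sketched in Python roughly as follows. This is not the module's actual code: the function names, the timeout, the UTF-8 output encoding and the shortcut for files that are already .pdf are all assumptions made for illustration. Only the "soffice --headless" and "pdftotext" commands themselves come from the description above.

```python
import subprocess
import tempfile
from pathlib import Path


def build_convert_cmd(source, outdir):
    """Command for step 1: convert an office document to .pdf with LibreOffice."""
    return ["soffice", "--headless", "--convert-to", "pdf",
            "--outdir", str(outdir), str(source)]


def build_extract_cmd(pdf_path):
    """Command for step 2: the trailing "-" makes pdftotext write to stdout."""
    return ["pdftotext", "-enc", "UTF-8", str(pdf_path), "-"]


def extract_text(source, timeout=120):
    """Return the plain text of a document (timeout value is an assumption)."""
    source = Path(source)
    # A private working directory is removed on exit, even if a step fails.
    with tempfile.TemporaryDirectory() as workdir:
        if source.suffix.lower() == ".pdf":
            pdf_path = source  # assumption: a .pdf input skips step 1
        else:
            subprocess.run(build_convert_cmd(source, workdir),
                           check=True, timeout=timeout)
            # soffice names the output after the input file, with a .pdf suffix.
            pdf_path = Path(workdir) / (source.stem + ".pdf")
        result = subprocess.run(build_extract_cmd(pdf_path), check=True,
                                capture_output=True, timeout=timeout)
        return result.stdout.decode("utf-8", errors="replace")
```

Calling extract_text("/path/to/file.docx") requires both tools to be installed. With check=True, a failing step raises subprocess.CalledProcessError instead of silently returning empty text.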

Please check the /tmp directory from time to time. Orphan files can remain there if the conversion to .pdf crashes for some reason, and you may want to delete them manually.
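
To help with that periodic check, a small script can list files in the temporary directory that have not been modified for a while. This is a sketch under stated assumptions: the description does not say how the module names its temporary files, so the hypothetical helper below selects by age only and deliberately deletes nothing. Review the list before removing anything.

```python
import os
import tempfile
import time


def find_stale_files(directory=None, max_age_hours=24.0, now=None):
    """Return regular files in `directory` not modified for `max_age_hours`."""
    directory = directory or tempfile.gettempdir()
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    stale = []
    with os.scandir(directory) as entries:
        for entry in entries:
            try:
                is_file = entry.is_file(follow_symlinks=False)
                mtime = entry.stat(follow_symlinks=False).st_mtime
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
            if is_file and mtime < cutoff:
                stale.append(entry.path)
    return sorted(stale)


if __name__ == "__main__":
    for path in find_stale_files():
        print(path)
```

The 24-hour threshold is arbitrary. Other programs also write to /tmp, so an old file is not necessarily left over from a failed conversion.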

Installation

How to install in a Docker container running Odoo 11

  1. download and run odoo image
  2. docker exec -it -u root odoo11 /bin/bash
  3. apt-get update
  4. apt-get install libreoffice poppler-utils
  5. cd to Odoo addons path
  6. unpack the module
  7. in Odoo, update modules list and install the module
  8. done - test it

How to install on an on-premise Odoo installation under Linux

  1. apt-get install libreoffice poppler-utils
  2. cd to Odoo addons path
  3. unpack the module
  4. in Odoo, update modules list and install the module
  5. done - test it

How to install on an on-premise Odoo installation under Windows

...no idea, just try to make sure that you have everything needed to run the "soffice" and "pdftotext" commands from the command line.

Credits: the icon is named Search and its author is Igé Maulana from the Noun Project

MIT License

Copyright (c) 2018 Jan B. Krejčí

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

by
felipe vallejo
on 4/11/20, 10:14 AM Confirmed Purchase

I am sorry, I just needed to use Studio to be able to see the Indexed Content field in my database. And now that I am able to see it, it is clear that it works (actually does not work) as you showed in your video. Just one question: any comments on whether your app affects the speed of the program or the storage capacity required? I will ask my Odoo partner to install the app. I will buy it on Monday. Thank you

Re:
by
Jan B. Krejčí
on 4/12/20, 1:58 AM Author

Thanks for your reply. To answer your question: the indexing is performed just once, when the attachment is saved to the database. During that, some command-line tools are executed, which does take some computing power and memory for a short moment. Therefore, if you try to index a large number of attachments at once, it can take some time, depending on the attachment count and size. And yes, some storage capacity in the database is also consumed, corresponding to the size of the text extracted from the attached document. The textual data is usually much smaller than the attachment itself. Regards


by
felipe vallejo
on 4/9/20, 10:48 AM Confirmed Purchase

I do not see any indexed content in either old or new attachments. These are the fields to be filled in when creating an attachment:

  • Name: test
  • Folder: Products
  • Tags:
  • File Content: Carta a Proveedores COVID-19.pdf
  • Owner: LUIS FELIPE VALLEJO GONZALEZ
  • Contact: LUIS FELIPE VALLEJO GONZALEZ
  • File Size: 242,935
  • Type: File
  • Created on: 04/09/2020 09:22:38
  • Created by: LUIS FELIPE VALLEJO GONZALEZ
  • Company: PQUIM SAS.
  • Mime Type: application/pdf

Re:
by
Jan B. Krejčí
on 4/10/20, 1:27 PM Author

I am not sure I understand your post. Could you please be more specific? Regards...


by
felipe vallejo
on 4/9/20, 10:10 AM Confirmed Purchase

Thank you. I am the end user. I will send this video to my Odoo partner to see if we can give it a try next week. Greetings


by
felipe vallejo
on 4/8/20, 5:48 PM Confirmed Purchase

Would you mind making a video for me on how the PDF indexing works? And just to clarify: is the apt-get install git libreoffice poppler-utils run on the server and not on just one computer? I just want to be sure that it is what I am looking for.

Re:
by
Jan B. Krejčí
on 4/9/20, 5:36 AM Author

https://1drv.ms/v/s!AjL5pWFelUm5g4kDvsmmCGgi4KLLQw?e=85zg0V


Any possibility to have it for v12?
by
felipe vallejo
on 4/4/20, 4:24 AM Confirmed Purchase
Re: Any possibility to have it for v12?
by
Jan B. Krejčí
on 4/8/20, 5:44 AM Author

Hi, I have just tested it and it works fine with v12.


by
Pradip Rangwani
on 8/27/18, 4:21 AM

Can I get the demo link?