This extension performs full-text analysis and OCR (optical character recognition) on documents (currently PDF, DOC and DOCX) and generates keywords for search. It also generates a thumbnail of the first page of each document.
It uses an external web service, so it works without any additional software packages installed on your hosting server. Ghostscript is not required.
Our web service receives your document files, extracts all the text it can find, and generates a list of keywords.
This process is secure and private – we do not store or collect any information about the indexed contents.
- PDF and Microsoft Word to text conversion
- Search text in scanned PDF documents
- Generate first page thumbnails
- Extract keywords from images, figures and diagrams
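
To give a rough idea of what generating keywords from extracted text can involve, here is a minimal Python sketch. It is not the method our web service uses; it only illustrates a simple frequency-based approach. The function name `extract_keywords` and the short stop-word list are hypothetical and chosen purely for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list (illustration only).
STOP_WORDS = {"the", "and", "a", "an", "of", "to", "in", "is", "it", "for", "on", "with"}


def extract_keywords(text, limit=5):
    """Return up to `limit` of the most frequent words in `text`.

    Words of one or two letters and words in STOP_WORDS are ignored.
    """
    # Lower-case the text and keep only runs of ASCII letters.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(limit)]
```

A call such as `extract_keywords(document_text, limit=10)` would return the ten most frequent remaining words. A real indexing service would typically also handle other languages, word stems and text recovered by OCR.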
WP-Filebase Pro comes with a built-in indexing mechanism using your server resources. Install this extension in these cases:
- The search results are not accurate or do not include all documents
- Ghostscript is not available on your hosting server or it is at version 9.04 or earlier
- You want text within images to be indexed (e.g. scanned documents)
- PDF thumbnails do not display correctly
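
If you have shell access to your server, the Ghostscript condition above can be checked with a short script. The following Python sketch assumes the Ghostscript binary is named `gs` (typical on Linux hosts, but an assumption here) and that `gs --version` prints a plain version number such as `9.04`. The function names are hypothetical.

```python
import shutil
import subprocess


def parse_version(text):
    """Turn a version string such as '9.04' into a tuple of ints, e.g. (9, 4)."""
    return tuple(int(part) for part in text.strip().split("."))


def ghostscript_too_old(version_text, newest_unsupported=(9, 4)):
    """Return True if Ghostscript is missing or is version 9.04 or earlier."""
    if not version_text:
        return True
    # Compare only major and minor numbers against the 9.04 threshold.
    return parse_version(version_text)[:2] <= newest_unsupported


def installed_ghostscript_version(binary="gs"):
    """Return the output of `gs --version`, or None if the binary is not found."""
    path = shutil.which(binary)  # assumption: the binary is called "gs"
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return result.stdout.strip() or None
```

Calling `ghostscript_too_old(installed_ghostscript_version())` returns `True` when Ghostscript is missing or too old, which is one of the cases where installing this extension makes sense.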
In our demo, you can see how the WordPress search finds the document “File and Category Management”, which contains the words “stylesheet” and “ftp”. To test it yourself, upload your own PDF file and then use the WordPress search (note that scanning can take up to 5 minutes).
If you find a document that is not properly indexed, please open a ticket with the document file attached.