PDF, DOCX Keyword Indexing not working

In some cases, keyword indexing of PDF, DOCX or ODT files needs a special server configuration in order to work.


Before you start worrying about keyword indexing, take a look at the File Info box below the form when editing a file. It shows the keywords that will be used for searches. If you don’t find a list of keywords there, this article gives some insights into requirements and troubleshooting.

PDF Requirements

PDF indexing relies on Ghostscript, a command-line tool (gs) that needs to be installed on the server. The minimum required version is 9.05; older versions do not support text extraction from PDF files. WP-Filebase shows a notice on the settings page if the installed version is too old. If you are unsure whether gs is available or how to install it, ask your hosting provider. It might be necessary to run PHP in CGI mode in order to run external programs.
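If you have shell access, a quick way to check the Ghostscript requirement yourself is a small script like the following. This is only a sketch and not part of WP-Filebase: the function names and the status strings are made up for illustration, and it assumes the binary is called gs and prints its version number when run with --version.

```python
import re
import shutil
import subprocess

MIN_VERSION = (9, 5)  # 9.05, the minimum version stated in the article


def parse_version(text):
    """Turn a version string such as '9.05' into a comparable tuple (9, 5)."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def ghostscript_status(binary="gs"):
    """Return 'missing', 'unknown', 'too old' or 'ok' for the given binary."""
    path = shutil.which(binary)
    if path is None:
        return "missing"
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    version = parse_version(result.stdout)
    if not version:
        return "unknown"  # the binary ran but printed no version number
    return "ok" if version >= MIN_VERSION else "too old"


if __name__ == "__main__":
    print("Ghostscript:", ghostscript_status())
```

Note that this checks the shell environment only. PHP may run under a different user or with a restricted PATH, so the result can differ from what the plugin sees.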

Apart from that, WP-Filebase Pro also includes an experimental text extraction method that works even without Ghostscript. It tries to find GZIP-compressed text blocks in the PDF file and extracts them.
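To give a rough idea of what such an approach can look like, here is a minimal sketch. It is not the plugin's actual code; the function name and the regular expressions are assumptions for illustration. It relies on the fact that PDF streams compressed with the FlateDecode filter use zlib compression, and it only picks up plain literal strings in parentheses, so PDFs with custom font encodings or hex strings will yield little or nothing.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of a PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# Literal strings in parentheses, allowing backslash-escaped characters.
TEXT_RE = re.compile(rb"\((?:\\.|[^\\()])*\)")


def extract_compressed_text(pdf_bytes):
    """Collect literal text strings from zlib-compressed streams in a PDF."""
    found = []
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the line break left after the stream data
            data = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not a zlib stream (for example an image), skip it
        for literal in TEXT_RE.findall(data):
            text = literal[1:-1].decode("latin-1")
            if text.strip():
                found.append(text)
    return found
```

Calling extract_compressed_text(open("file.pdf", "rb").read()) returns a list of text fragments, which could then be split into keywords.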


Indexing of Microsoft Word .docx and OpenOffice .odt documents requires the unzip_file function to be available. If keywords are missing for these files, ask your hosting provider to enable the PHP ZIP functions.
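ZIP support matters because both formats are ZIP archives with the text stored in an XML file inside: word/document.xml for .docx and content.xml for .odt. The following sketch illustrates the idea. It is not how WP-Filebase does it; the function name is hypothetical, and the tag stripping is deliberately crude (XML entities are not decoded).

```python
import io
import re
import zipfile

# Archive member that holds the main document text for each format.
TEXT_PARTS = {".docx": "word/document.xml", ".odt": "content.xml"}


def extract_keywords(source, extension):
    """Return the sorted, lower-cased words found in a .docx or .odt file.

    `source` is a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        xml = archive.read(TEXT_PARTS[extension]).decode("utf-8")
    text = re.sub(r"<[^>]+>", " ", xml)  # drop the XML tags, keep the text
    return sorted({word.lower() for word in re.findall(r"\w+", text)})
```

If the archive cannot be opened at all, as is the case for the plugin when ZIP functions are unavailable on the server, no text can be read and therefore no keywords are produced.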



One thought on “PDF, DOCX Keyword Indexing not working”

  1. Steffen says:

    I think I figured out why full-text search is not working in my PDFs.
    In your sandbox demo I can see the whole contents in the “ID3 Tags” field.
    In my “ID3 Tags” fields I see only these errors (below the list of available templates):

    “Keywords used for search: pdf application num_pages pagefailed_1 pagefailed_2 pagefailed_3 pagefailed_4 pagefailed_5 pagefailed_6 pagefailed_7 pagefailed_8 pagefailed_9 pagefailed_10 pagefailed_11 pagefailed_12 pagefailed_13 pagefailed_14 pagefailed_15 pagefailed_16 pagefailed_17 pagefailed_18 pagefailed_19 pagefailed_20 pagefailed_21 pagefailed_22 pagefailed_23 pagefailed_24 pagefailed_25 pagefailed_26 pagefailed_27 pagefailed_28 pagefailed_29 pagefailed_30 pagefailed_31 pagefailed_32 pagefailed_33 pagefailed_34 pagefailed_35 pagefailed_36 pagefailed_37 pagefailed_38 pagefailed_39 pagefailed_40 pagefailed_41 pagefailed_42 pagefailed_43 pagefailed_44 pagefailed_45 pagefailed_46 pagefailed_47 pagefailed_48 pagefailed_49 pagefailed_50 pagefailed_51 pagefailed_52 pagefailed_53 pagefailed_54 pagefailed_55 pagefailed_56 pagefailed_57 pagefailed_58 pagefailed_59 pagefailed_60 pagefailed_61 pagefailed_62 pagefailed_63 pagefailed_64 pagefailed_65 pagefailed_66 pagefailed_67 pagefailed_68 pagefailed_69 pagefailed_70 pagefailed_71 pagefailed_72 pagefailed_73 pagefailed_74 pagefailed_75 pagefailed_76 pagefailed_77 pagefailed_78 pagefailed_79 pagefailed_80 pagefailed_81 pagefailed_82 pagefailed_83 pagefailed_84 pagefailed_85 pagefailed_86 pagefailed_87 pagefailed_88 pagefailed_89 pagefailed_90 pagefailed_91 pagefailed_92 pagefailed_93 pagefailed_94 pagefailed_95 pagefailed_96 pagefailed_97 pagefailed_98 pagefailed_99 pagefailed_100 analyze pdf_2text timings ”

    Does “analyze pdf_2text timings” want to tell me anything?
