Tika parseToString returns an empty string for a Word document

If the parseToString method returns an empty string (and does not throw an exception), the usual cause is that tika-parsers-0.9.jar is missing from the classpath.

As soon as you add it to the classpath, together with its dependencies (poi-3.7, poi-ooxml-3.7, poi-ooxml-schemas-3.7, dom4j-1.6.1, geronimo-stax-api_1.0, xmlbeans-2.3, poi-scratchpad and commons-compress), you will get the text.

The ugly part about this problem is that your program compiles and runs just fine when you only reference tika-core-0.9.jar. But when you try to parse a document, Tika instantiates an EmptyParser, because no parser that supports your MIME type is available. The EmptyParser returns an empty string and never throws an exception, so nothing tells you that a jar is missing.
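
Since the failure is silent, it can help to check the classpath yourself before parsing. The following is only a minimal sketch: the class TikaClasspathCheck and its methods are hypothetical helpers, not part of Tika, and the class names I picked as representatives for each jar are an assumption that may need adjusting for your Tika and POI versions. It uses nothing but Class.forName from the JDK.

```java
import java.util.ArrayList;
import java.util.List;

public class TikaClasspathCheck {

    // Assumed representative classes, one per jar; adjust for your versions.
    static final String[] REQUIRED_CLASSES = {
        "org.apache.tika.Tika",                               // tika-core
        "org.apache.tika.parser.microsoft.OfficeParser",      // tika-parsers (.doc)
        "org.apache.tika.parser.microsoft.ooxml.OOXMLParser", // tika-parsers (.docx)
        "org.apache.poi.hwpf.HWPFDocument",                   // poi-scratchpad
        "org.apache.poi.xwpf.usermodel.XWPFDocument"          // poi-ooxml
    };

    // Returns true if the class can be loaded, without running its static initializers.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, TikaClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Returns the subset of the given class names that cannot be loaded.
    public static List<String> missingClasses(String[] classNames) {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            if (!isOnClasspath(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingClasses(REQUIRED_CLASSES);
        if (missing.isEmpty()) {
            System.out.println("All checked classes were found.");
        } else {
            // A missing parser class is the likely reason for an empty parseToString result.
            System.out.println("Missing from classpath: " + missing);
        }
    }
}
```

Run it with the same classpath as your application. If any of the parser or POI classes are reported missing, that is a likely reason why parseToString gives you an empty string.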
