[support] How to parse files using Drupal 7, Search API and Tika?

Ted ted-drupalists at webfirst.com
Wed Dec 7 19:31:04 UTC 2011


I believe I got this working simply by following the README. Do you have
apache-tika-0.9-src.zip and tika-app-0.9.jar in
sites/all/libraries/tika? If you have full control over your (presumably
Linux) server, you can run strace on the Apache processes and grep the
output for tika to see whether the JAR is ever executed. Beyond that,
xdebug with breakpoints at the relevant lines in
search_api_attachments/includes/callback_attachments_settings.inc (i.e.
the shell_exec invocation) should get you there.
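
A minimal sketch of the file check, assuming a POSIX shell and that it is
run from the Drupal root. The function name check_tika_libs is made up for
illustration; the file names and default path are just the ones mentioned
above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any module): report which of the
# expected Tika files are present and readable in a library directory.
# Returns 0 if both are found, 1 if at least one is missing.
check_tika_libs() {
    # Default path is relative to the Drupal root (assumption).
    dir="${1:-sites/all/libraries/tika}"
    missing=0
    for f in apache-tika-0.9-src.zip tika-app-0.9.jar; do
        if [ -r "$dir/$f" ]; then
            echo "found:   $dir/$f"
        else
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}
```

Call it as "check_tika_libs sites/all/libraries/tika", ideally as the web
server user, since that is the account that has to read the files. For the
strace step, something along these lines should show any exec of Tika,
where <pid> is a placeholder for the process ID of an Apache worker:

    strace -f -e trace=execve -p <pid> 2>&1 | grep -i tika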

Ted

On 12/7/2011 2:19 PM, Florian Auer wrote:
> Hey guys,
>
> I'm trying to make documents searchable using Search API [1], Search API Attachments [2], Search API DB [3] and a local Tika [4] installation. I want to save the index data in the database for now; there is a Solr server ready for later integration.
>
> I downloaded the Tika 1.0 runnable JAR file and successfully parsed a PDF and a DOC file from the command line using "java -jar /path/to/tika-1.0.jar --text" as user www-data.
>
> I can index regular nodes and File nodes, but it doesn't parse the file's contents when I execute cron or trigger indexing manually. To me it seems as if Tika is never executed at all...
>
> The author of the Search API Attachments module says the module is "based on Apache Solr attachments", but it's not marked as required in the info file. So I'm assuming by "based on" he means "I borrowed some code"...
>
> Is there someone who successfully got Drupal 7, Search API and Tika working together?
>
> Any hints appreciated!
>


