[support] How to parse files using Drupal 7, Search API and Tika?

Florian Auer lists at floeschie.org
Wed Dec 7 19:19:58 UTC 2011


Hey guys,

I'm trying to make documents searchable using Search API [1], Search API Attachments [2], Search API DB [3] and a local Tika [4] installation. For now I want to store the index data in the database; a Solr server is ready for later integration.

I downloaded the Tika 1.0 runnable JAR file and successfully parsed a PDF and a DOC file from the command line using "java -jar /path/to/tika-1.0.jar --text" as user www-data.
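To make the manual check above repeatable, it could be wrapped in a small script and run as www-data. This is only a sketch: the helper names build_tika_command and extract_text are hypothetical, the file to parse is assumed to be passed as the last argument after "--text", and the output is assumed to be UTF-8. It says nothing about how the Search API Attachments module itself calls Tika.

```python
import subprocess


def build_tika_command(jar_path, file_path):
    """Return the argument list for the Tika command-line text extraction.

    Mirrors the manual call "java -jar /path/to/tika-1.0.jar --text",
    with the file to parse appended as the final argument (assumption).
    """
    return ["java", "-jar", jar_path, "--text", file_path]


def extract_text(jar_path, file_path, timeout=60):
    """Run Tika on one file and return the extracted text.

    Raises subprocess.CalledProcessError if java exits with a non-zero
    status, and subprocess.TimeoutExpired if it runs longer than `timeout`.
    """
    result = subprocess.run(
        build_tika_command(jar_path, file_path),
        capture_output=True,
        timeout=timeout,
        check=True,
    )
    # Assumption: Tika writes the extracted text to stdout as UTF-8.
    return result.stdout.decode("utf-8", errors="replace")
```

Calling extract_text("/path/to/tika-1.0.jar", "/path/to/some.pdf") as www-data should return the same text as the manual command. If it raises an error instead, the problem lies with Java, the JAR path or file permissions rather than with Drupal.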

I can index regular nodes and File nodes, but the files' contents are not parsed when I run cron or trigger indexing manually. It looks to me as if Tika is never executed at all...

The author of the Search API Attachments module says the module is "based on Apache Solr attachments", but that module is not listed as a dependency in the info file. So I'm assuming that by "based on" he means "I borrowed some code"...

Has anyone successfully got Drupal 7, Search API and Tika working together?

Any hints appreciated!

-- 
Cheers,

Florian


[1] http://drupal.org/project/search_api
[2] http://drupal.org/project/search_api_attachments
[3] http://drupal.org/project/search_api_db
[4] http://tika.apache.org/download.html

System: Debian Squeeze 64 bit

