[support] How to parse files using Drupal 7, Search API and Tika?

Fri Dec 9 09:46:09 UTC 2011

First, create the text for the index: extract the text from the files using
any mechanism (Tika, a Unix shell command, etc.). Filtering down to unique
words and dropping small words (3 characters) will reduce the text content.
Second, attach the extracted text to the node (in a CCK field or directly in
the body) so that it is indexed by the Drupal search index.
After that you can use Solr, Sphinx, or another external index storage. I
think a custom solution is better than a lot of installed modules.
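
The extract-and-reduce step could be sketched roughly like this. It is only a minimal sketch in Python, not a Drupal module: the path to the Tika jar, the function names, and the reading of "small word" as "shorter than 3 characters" are all assumptions. Attaching the result to the node is left out.

```python
import re
import subprocess


def extract_text(path, tika_jar="tika-app.jar"):
    """Extract plain text from a file with the Tika command-line app.

    tika_jar is a hypothetical location; point it at your own tika-app jar.
    """
    result = subprocess.run(
        ["java", "-jar", tika_jar, "--text", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def reduce_text(text, min_len=3):
    """Keep each word once, in first-seen order, and drop short words.

    Assumption: "small" means fewer than min_len characters.
    """
    seen = set()
    kept = []
    for word in re.findall(r"\w+", text.lower()):
        if len(word) >= min_len and word not in seen:
            seen.add(word)
            kept.append(word)
    return " ".join(kept)
```

The output of reduce_text(extract_text("report.pdf")) is the string you would store in the CCK field or body, so the indexed text stays small.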

Liviu.

On Fri, Dec 9, 2011 at 10:28 AM, Florian Auer <lists at floeschie.org> wrote:
> Hi Liviu!
>
> On Thursday, 8 December 2011, 20:06:06, Liviu Nicolicioiu wrote:
>> What do you think about using the Unix pdf2txt command and putting the output in a CCK field?
>
> I need to parse Microsoft Office and other file formats, too. Furthermore, we want to use a Solr search engine for performance reasons. The Search API integrates both Tika and Solr very well (if you know how to do it ;]).
>
> --
> Cheers!
>
> Florian
> --
> [ Drupal support list | http://lists.drupal.org/ ]