Re: [support] How to parse files using Drupal 7, Search API and Tika?

9 Dec 2011


First, create the text for the index: extract the text from the files
using any mechanism (Tika, a Unix shell command, etc.). Applying a
unique-words filter and dropping small words (3 chars) will reduce the
text content.
Second, attach the extracted text to the node so that it is indexed by
the Drupal search index (in a CCK field or directly in the body).
After that you can use Solr, Sphinx, or some other external index storage.
I think a custom solution is better than a lot of installed modules.
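
A minimal sketch of the text-reduction step (unique-words filter plus
dropping small words), in Python only for illustration. The function
name reduce_text and the min_len parameter are hypothetical, and reading
"small word 3 char" as "drop words of 3 characters or fewer" is an
assumption. The input is plain text that some extractor has already
produced.

```python
import re

def reduce_text(text, min_len=4):
    """Keep each word once, in first-seen order, dropping short words.

    min_len=4 assumes that words of 3 characters or fewer are discarded.
    """
    seen = set()
    kept = []
    # Lowercase so that "Index" and "index" count as the same word.
    for word in re.findall(r"\w+", text.lower()):
        if len(word) >= min_len and word not in seen:
            seen.add(word)
            kept.append(word)
    return " ".join(kept)
```

The reduced string is what would then be stored on the node (CCK field
or body). Note that it only supports keyword matching, since word order
and repetition are lost.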
Liviu.
On Fri, Dec 9, 2011 at 10:28 AM, Florian Auer lists@floeschie.org wrote:
...
Hi Liviu!
On Thursday, 8 December 2011, 20:06:06, Liviu Nicolicioiu wrote:
...
What do you think about using the Unix pdf2txt tool and putting the output in a CCK field?
I need to parse Microsoft Office and other file formats, too. Furthermore, we want to use a Solr search engine for performance reasons. The Search API integrates both Tika and Solr very well (if you know how to do it ;]).
--
Cheers!
Florian
[ Drupal support list | http://lists.drupal.org/ ]
