What do you think about running the Unix pdf2txt tool and storing its output in a CCK field?
On Thu, Dec 8, 2011 at 6:15 PM, Florian Auer lists@floeschie.org wrote:
On Wednesday, 7 December 2011, 20:19:58, Florian Auer wrote:
Has anyone successfully got Drupal 7, Search API and Tika working together?
I finally got it working. The documentation is somewhat incomplete, so here's what I did to get Drupal 7, Search API and Tika running on Debian Squeeze:
== 1. Download Tika source archive ==
Go to [1], copy the link URL of the archive on your favourite mirror, and download it using wget:
$ wget [URL]
== 2. Extract Tika source archive to /opt ==
$ cd /opt
# unzip /path/to/apache-tika-X.Y-src.zip
== 3. Install maven2 package ==
# apt-get install maven2
== 4. Compile Tika using Maven ==
$ cd tika-X.Y
# MAVEN_OPTS=-Xmx256m mvn clean install
(This might take a while…)
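Before wiring Tika into Drupal, it can help to confirm that the build produced a runnable jar. The following is a minimal sketch, not part of the original steps. It assumes the Maven build leaves the command-line app at tika-app/target/tika-app-X.Y.jar inside the source directory and that java is on your PATH. The function names tika_app_jar and tika_extract_text are made up for this example.

```shell
# Assumption: the build leaves the runnable jar at
# <source dir>/tika-app/target/tika-app-<version>.jar
tika_app_jar() {
    # $1 = Tika source directory, $2 = Tika version
    printf '%s/tika-app/target/tika-app-%s.jar\n' "$1" "$2"
}

# Extract plain text from a document with the Tika command-line app.
tika_extract_text() {
    # $1 = path to the jar, $2 = document to read
    [ -f "$1" ] || { echo "Tika jar not found: $1" >&2; return 1; }
    java -jar "$1" --text "$2"
}
```

For example, tika_extract_text "$(tika_app_jar /opt/tika-X.Y X.Y)" /path/to/some.pdf should print the text of the PDF if the build succeeded.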
== 5. Download and enable the required Drupal modules ==
$ drush dl search_api search_api_attachments search_api_db
$ drush en search_api search_api_attachments search_api_db
== 6. Configure Drupal to use Tika ==
- Log in to the Drupal admin backend
- Open Search API settings
- Create a new server (Database)
- Create a new index or use existing one
- In your index settings, switch to "Workflow" tab
- In "Data alterations" area enable "File attachments"
- Go to "Fields" tab
- Enable "File content" field for indexing
== 7. Edit Search API attachment module ==
Note: This is only needed if you use version 7.x-1.0; it should already be fixed in newer versions (see patch 3048482a89a1a587feab78f2d5ea92c4b5642898 on [2]).
- Go to the module's directory (if you used drush, this should be DRUPAL_HOME/sites/all/modules/search_api_attachments)
- Open file include/callback_attachments_settings.inc in your favourite editor
- Replace every occurrence of "entity_type" with "item_type" (see issue on [3])
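If you would rather not do the replacement by hand, sed can do it in one pass. This is only a sketch of the manual edit described above. The function name fix_item_type is made up for this example, and it blindly rewrites every match, so compare the result with the .bak copy it leaves behind.

```shell
# Replace every "entity_type" with "item_type" in one file,
# keeping the untouched original next to it as <file>.bak.
fix_item_type() {
    # $1 = file to patch
    sed -i.bak 's/entity_type/item_type/g' "$1"
}
```

Run from the module's directory, this would be fix_item_type include/callback_attachments_settings.inc, followed by a diff against include/callback_attachments_settings.inc.bak to review the changes.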
== 8. Verify Tika is working and called by Drupal ==
- Open file include/callback_attachments_settings.inc again
- Add the following PHP code at the end of the file, right before the last return statement (line 141-ish):
syslog(LOG_INFO, 'Calling Tika: ' . $cmd);
- Save and close the file
- Tail your syslog (# tail -f /var/log/syslog)
- Go to Search API settings in Drupal backend
- Re-index your site
- You should see some messages telling you the Tika command and the file which is indexed
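On a busy server the syslog can be noisy, so it may be easier to filter for the marker string used in the syslog() call. A small sketch follows. The function name tika_log_lines is made up for this example, and /var/log/syslog is the Debian log path used in the steps above.

```shell
# Print only the log lines that contain the 'Calling Tika: ' marker.
tika_log_lines() {
    # $1 = log file to search (defaults to /var/log/syslog)
    grep -F 'Calling Tika: ' "${1:-/var/log/syslog}"
}
```

To watch live while re-indexing, the same filter can be attached to tail: tail -f /var/log/syslog | grep --line-buffered -F 'Calling Tika: ' (this assumes GNU grep).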
This is rather quick'n'dirty documentation, but I don't have time for more, and the git repo for Search API attachments isn't working properly, so I cannot create patches right now. If you have any questions, let me know!
-- Regards,
Florian
[1] http://www.apache.org/dyn/closer.cgi/tika/apache-tika-1.0-src.zip
[2] http://drupalcode.org/project/search_api_attachments.git/patch/3048482a89a1a...
[3] http://drupal.org/node/1253824
--
[ Drupal support list | http://lists.drupal.org/ ]