Dear All,
I want to learn more about Apache Solr file attachments. My aim is to modify or add code so that it suits my requirements.
I have been struggling for quite some time with the following:
1. How to index a file
2. How to store minimal data during indexing and get minimal data during searching
3. How to get results in smaller chunks: for example, if my query matches 1000 records, I would like to receive only 100 results at a time, the next 100 in the next query, and so on
To elaborate on the above points:
*1. How to index a file*
Today, when a user uploads a document, I create a node programmatically, attach the document to the node, and Drupal indexes the file on the next cron run. When a user attaches a new document, I programmatically delete the old attachment and attach the new one, and Drupal again indexes the file on the next cron run. That is how I index files today.
I feel the node is not really needed here; I am creating it only to get the file indexed. What is the best way to index a file without creating a node?
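A file does not have to belong to a node for Solr to index it: Solr's ExtractingRequestHandler ("Solr Cell", which uses Apache Tika) accepts a posted file and turns it into a Solr document. The sketch below only builds the request URL. It assumes a hypothetical core at `http://localhost:8983/solr/drupal` with the handler mapped at `/update/extract`, that `id` is the schema's unique key, and that the schema has an integer field named `is_uid` (an assumed name; adjust to your schema). If your schema marks other fields as required, they would need `literal.` values too. Documents indexed this way bypass Drupal, so Drupal's own search pages may not know how to display them.

```python
from urllib.parse import urlencode

# Hypothetical Solr core URL; adjust to your installation.
SOLR_CORE = "http://localhost:8983/solr/drupal"


def extract_url(doc_id: str, uid: int, commit: bool = True) -> str:
    """Build a Solr Cell (/update/extract) URL for indexing one file.

    Each literal.<field> parameter sets that field on the Solr document
    created from the file. "is_uid" is an assumed field name.
    """
    params = {
        "literal.id": doc_id,        # unique key of the file's Solr document
        "literal.is_uid": str(uid),  # owner's Drupal UID (assumed field)
        "commit": "true" if commit else "false",  # make it searchable now
    }
    return f"{SOLR_CORE}/update/extract?{urlencode(params)}"


print(extract_url("file-42", 7))
# http://localhost:8983/solr/drupal/update/extract?literal.id=file-42&literal.is_uid=7&commit=true
```

The file itself goes in the POST body, for example `curl "<url>" -F "myfile=@resume.pdf"`. Posting again with the same `literal.id` replaces the earlier document, because Solr overwrites documents that share a unique key, which would cover the "user uploads a new document" case without a separate delete.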
*2. How to store minimal data during indexing and get minimal data during searching*
During file indexing, a lot of data is sent to the Apache Solr server and stored there. For my requirement I do not need the link, the snippet string, the snippet array, the fields array with 22 elements, and so on. I may need to store only the UID of the user associated with the file. How can I achieve this, or at least where can I find more information on it?
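On the Solr side this is controlled in two places. In schema.xml, a field declared with `stored="false"` can still be searched (if it is indexed) but its value is not kept for retrieval. At query time, the `fl` (field list) parameter restricts which stored fields come back, and `hl=false` turns off highlighting so no snippets are built. A minimal sketch of such a query URL, assuming a hypothetical core URL and an `is_uid` field holding the owner's UID:

```python
from urllib.parse import urlencode

# Hypothetical Solr core URL; adjust to your installation.
SOLR_CORE = "http://localhost:8983/solr/drupal"


def minimal_query_url(text: str, fields: tuple = ("id", "is_uid")) -> str:
    """Build a /select URL that returns only the listed stored fields.

    fl limits the fields in each returned document; hl=false disables
    highlighting so no snippets are generated. Field names are assumed.
    """
    params = {
        "q": text,
        "fl": ",".join(fields),
        "hl": "false",
        "wt": "json",  # JSON response writer
    }
    return f"{SOLR_CORE}/select?{urlencode(params)}"


print(minimal_query_url("solr drupal"))
# http://localhost:8983/solr/drupal/select?q=solr+drupal&fl=id%2Cis_uid&hl=false&wt=json
```

Inside Drupal, the same parameters would presumably be set on the query object in a query alter hook rather than by building URLs by hand.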
*3. How to get results in smaller chunks*
Say a query returns 1000 records. The first time, I may need only the first 100 records; when the user clicks a "more" link, I need the next 100, and so on. I am not sure whether the Apache Solr Attachments module offers such a facility.
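Solr handles this with two standard query parameters: `start` is the zero-based offset of the first result to return and `rows` is how many results to return, while `numFound` in the response gives the total hit count. A minimal sketch of the page arithmetic for the 100-per-page example above (the helper names are hypothetical):

```python
def page_params(page: int, page_size: int = 100) -> dict:
    """Return Solr start/rows parameters for a zero-based page number."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size > 0")
    return {"start": page * page_size, "rows": page_size}


def page_count(num_found: int, page_size: int = 100) -> int:
    """Number of pages needed for num_found hits (Solr's numFound)."""
    return -(-num_found // page_size)  # ceiling division


# 1000 hits in pages of 100: page 0 -> hits 0-99, page 1 -> hits 100-199, ...
print(page_params(0))    # {'start': 0, 'rows': 100}
print(page_params(1))    # {'start': 100, 'rows': 100}
print(page_count(1000))  # 10
```

A "more" link would then request the next page number, e.g. page 1 becomes `start=100&rows=100`.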
I would be really happy to get more information on the above three points. Any books or links that explain the details would be very helpful.
Any information in this regard is highly appreciated.
Thanks,
Austin
Dear All,
It looks like the whole list is silent on this (Apache Solr file attachments). If you can share whatever ideas you have on my questions, it would help me a great deal.
Thanks,
Austin
On Tue, Oct 16, 2012 at 6:22 AM, Austin Einter <austin.einter@gmail.com> wrote:
[...]
Austin,
I do not have an answer for you, but I can tell you that I have been struggling with fine-tuning my Solr results. Setting everything up was OK, but I wish there were more information on the Drupal side about how effective the different choices are when fine-tuning the results. I guess the next best thing is to read the Apache Solr docs.
Good luck, and I will be glad to read your findings.
Nestor :-)
On Tue, Oct 16, 2012 at 8:58 PM, Austin Einter <austin.einter@gmail.com> wrote:
[...]
-- [ Drupal support list | http://lists.drupal.org/ ]
We just installed Solr on two different sites. On one it indexes the attachments; on the other it misses most of them. We have not fully tracked it down, but it seems to have something to do with the method of attachment.
Nancy
Injustice anywhere is a threat to justice everywhere. -- Dr. Martin L. King, Jr.
Thanks, Cindy-Sue, Nestor, and Nancy, for your input.
There are two aspects here.
*1. Modifying/tuning Apache Solr*
*2. Getting Drupal to do precisely what we need*
For the first, I have reference books, and I am confident I will be able to manage it going forward; if not, I will post my questions to the Apache user mailing list, as Cindy-Sue pointed out.
For the second, "Getting Drupal to do precisely what we need", we probably need to work it out ourselves or put the questions to Drupal experts.
Overall, I am able to index attached documents (.txt, .doc, .docx, .pdf: all the types I need). But I do not like the way I am doing it, since I create a node and attach the document programmatically. Another option is to create a single node and attach a huge number of documents to it (say 50,000), but I am not sure what side effects and performance issues would follow. I am doing my best to find a way to index files without creating a node. It looks like the Media module together with the apachesolr_file module may help, but I could not get it working.
And if you go a level further, say by just printing the results in hook_apachesolr_process_results, you will see that a huge amount of information is associated with each result, while the most important piece (which user the file belongs to) is not available.
I believe we can achieve these things using hooks. My findings so far:
1. When sending the document to Solr for indexing, remove the information we do not need, using the removeParam function in an appropriate hook.
2. When querying for files containing certain words, tweak the query (probably in the query alter hook) so that the result contains exactly the information we need.
I still have a long way to go to get things in order.
Best regards,
Austin
On Thu, Oct 18, 2012 at 3:35 AM, Ms. Nancy Wichmann <nan_wich@bellsouth.net> wrote:
[...]