Issue status update for http://drupal.org/node/28159
Post a follow up: http://drupal.org/project/comments/add/28159

Project: Drupal
Version: cvs
Component: search.module
Category: feature requests
Priority: normal
Assigned to: Steven
Reported by: Steven
Updated by: Bèr Kessels
Status: patch (code needs review)

* My update failed. You had/have a problem in update.inc in the updates/callback array. I hand-modified the database and it works fine now.
* I am not happy with the 5-character default limit. It took me quite some time to find out why my words were not indexed. But that should not hold back this patch. It is more something to have a closer look at in the future. I am thinking of intelligent auto-blacklisting or so.
* When I do an advanced search, the results are printed in the box: very good! But my advanced form is empty. That is not good usability, imo. That form should represent what I was searching for.
* I still do not like the way results are returned by default: "weblink - Bèr Kessels - 05/08/2005 - 14:20 - 0 comments" is far too much data. And I even have "do not show data for foo" set in the theme settings! Can we please rethink the *default* styles for search results? Without all the CSS bloat, and without all the details? Why not go for a default teaser view?
    $node->title = $title; // the name, subject or title of the element
    $node->teaser = $content; // the nice highlighted data of the search result body
  That way it will be consistent with the rest of the site, save a *lot* of code and be better themeable too.
* "Your search yielded no results" should be a set_message. Yes, we discussed that it should be in the place where people look for the results. But I tried it, and it is just as visible above the search as below! Really. (See screenshot.)

Overall, I think introducing a complex search is very nice, and I think this is a great step forward. But I would much rather see this implemented with a MUCH easier API as well as MUCH easier hooks. Preprocessing as a single hook.
What /is/ preprocessing? Form, index, etc. in one hook? Why put preprocess in a single hook, but the others in one huge nodeapi-like construction? What do they do, these nodeapi things? Why do I need them and when do I need them? I say this because I spent hours and hours with trial-and-error methods to get some form of advanced flexinode search going. The current system is just too hard to grok for an average developer like me.

I know these things are easy to say, but very hard to implement. But as long as we cannot allow searching for the obvious data people see (I am sure I read that that search guru has winamp as favorite, so why can't I find his username when I search for winamp?), I think we should be careful with extending the search into advanced search. Can we not first think of a general solution, one that will fix ALL of Drupal's search problems, and then dive into advanced searching, which will make it even more complex?

Steven: I still think you did a marvelous job!

Bèr Kessels

Previous comments:
------------------------------------------------------------------------
Thu, 04 Aug 2005 00:46:49 +0000 : Steven

Attachment: http://drupal.org/files/issues/search_2.patch (37.05 KB)

Here's my promised search patch. It's not 100% commit ready yet, but it's time to solicit some feedback and get this tested ;).

Note that this patch requires a db update, which will wipe the search index. You will then need to call cron.php enough times for the site to be indexed completely again. This could take a while for large databases, but you can control the throttle and see the progress at admin/settings/search.

*Features*
* AND keyword matching by default ('all of the words'), instead of OR ('any of the words').
* OR support through keyword1 OR keyword2 OR ...
* Phrase searching through "quoted strings".
* Negative matching through -"minus prefix" -word.
* Restrict search by taxonomy or node type(s) using taxonomy:1,2 and type:blog,page.
The options are built into the keyword string through a Google-like syntax, but there is an expandable "advanced settings" form below the search box which acts as a 'query builder': This example will result in the following search string (of course not a practical example):

  test type:forum,story category:1 "tinky winky" OR "dipsy" -"uh oh" "teletubby bye bye"

On a different note, I removed the wildcard matching. An important reason is that there were significant performance problems with leading wildcards. Such queries were not able to use any indices, and the resulting full-table scan took a long time. Even Google does not have intra-word wildcards; theirs can only be used as placeholders for entire words in phrases.

Trailing wildcards, on the other hand, are usually used to accommodate grammatical variations on a word. But wildcards are not really the best tool for this, as they put a burden on the user. If you need this feature, you should instead tie in an algorithm like the Porter Stemmer through the search_preprocess hook. That way you can reduce related words to a single common root (e.g. "walker", "walking", "walked" to "walk"). The search system will then index and search on the reduced words. You will even benefit from a reduced database size because there are fewer unique words.

Because such algorithms are very language specific, I didn't build in any. But it should be trivial to make a Porter Stemmer module for Drupal search, which can be used on English sites.

*Database*
To implement the above searches, I added a 'search_dataset' table that is independent of the keyword index. Each dataset row contains the entire contents of the indexed item, but filtered, cleaned up and reduced to space-separated tokens (words, numbers, dates, ...). This table is used to resolve the exact conditions, which means the keyword index is not as essential anymore.
Because searches are AND by default, the OR method of search_index acts as an initial filter to eliminate the majority of items immediately. That subset is then further reduced through the search_dataset table. All of this means that the search_index table can now be indexed at a much higher minimum word length (e.g. 5), which means a reduced database size. Even with the new dataset table, the net database size shrinks slightly.

I also implemented the searching as two selects into temporary tables. This allows me to avoid doing a costly counting query for the pager and a range-limited query for the actual results. I added support for temporary tables to database.(my|pg)sql. The db api itself takes a normal SELECT and a table name, and turns it into an appropriate platform-specific temporary table query (CREATE TABLE ... AS, CREATE TABLE ... SELECT).

I still need to do detailed benchmarking, but at least for the same queries as before, this patch should be faster. Of course, pre-patch, all searches were OR, not AND, so a direct comparison needs to take this into account (the pre-patch query "drupal theme development" is now "drupal OR theme OR development").

One feature request that I did not do is date-based searching (before X/X/X, after X/X/X), mostly because we don't have a good date widget yet. I've been toying with making a simple in-page JS date picker, but it's not done yet and I think the patch is good enough already. Date restrictions can be added later without any problems.

------------------------------------------------------------------------
Thu, 04 Aug 2005 01:27:29 +0000 : Steven

Oh, and in case this wasn't clear, the syntax of putting extra conditions into the search keywords ("type:blog") means that each search result page can be linked to directly. They all have clean URLs: search/node/type:blog+keyword for example.
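
To make the keyword syntax described above concrete, here is a minimal sketch in Python of how such a search string could be split into AND groups, OR alternatives, negated terms and type/category filters, and then checked against a space-separated token string like a search_dataset row. This is not the patch's code (the patch is PHP and SQL); the names parse_query, matches and ParsedQuery, and the set of recognised filter prefixes, are assumptions made only for illustration.

```python
import re
from dataclasses import dataclass, field

# An optionally negated "quoted phrase", or an optionally negated bare token.
_TOKEN = re.compile(r'(-?)"([^"]*)"|(-?)(\S+)')


@dataclass
class ParsedQuery:
    # Each inner list is a group of OR alternatives; all groups must match (AND).
    required: list = field(default_factory=list)
    # Terms or phrases that must not occur.
    excluded: list = field(default_factory=list)
    # Filter name -> list of values, e.g. {"type": ["forum", "story"]}.
    filters: dict = field(default_factory=dict)


def parse_query(text, filter_names=("type", "category", "taxonomy")):
    """Split a search string into required groups, exclusions and filters.

    The filter names are an assumption taken from the examples in the thread.
    """
    query = ParsedQuery()
    join_with_previous = False  # True right after a bare OR keyword
    for m in _TOKEN.finditer(text):
        quoted = m.group(2) is not None
        negative = (m.group(1) if quoted else m.group(3)) == "-"
        term = m.group(2) if quoted else m.group(4)
        if not quoted and term == "OR":
            join_with_previous = True
            continue
        if not quoted and ":" in term:
            name, _, value = term.partition(":")
            if name in filter_names and value:
                query.filters.setdefault(name, []).extend(value.split(","))
                continue
        term = term.lower()
        if not term:
            continue
        if negative:
            query.excluded.append(term)
        elif join_with_previous and query.required:
            query.required[-1].append(term)  # alternative to the previous term
        else:
            query.required.append([term])  # new AND group
        join_with_previous = False
    return query


def matches(query, dataset_text):
    """Check the keyword conditions against space-separated tokens.

    Filters are ignored here; they would apply to node metadata instead.
    """
    haystack = " " + " ".join(dataset_text.lower().split()) + " "

    def has(term):
        return f" {term} " in haystack

    return (all(any(has(t) for t in group) for group in query.required)
            and not any(has(t) for t in query.excluded))
```

Padding the text and each term with spaces makes phrase matching respect word boundaries, which is why a flat, space-separated token string is convenient for resolving exact phrases and negations after a coarser keyword filter has narrowed the candidates.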
------------------------------------------------------------------------
Thu, 04 Aug 2005 02:34:38 +0000 : Steven

Attachment: http://drupal.org/files/issues/search_3.patch (37.05 KB)

Sorry, the patch was malformed because wincvs wrapped those really long preg classes :P. Fixed patch attached.

------------------------------------------------------------------------
Thu, 04 Aug 2005 13:28:14 +0000 : stevryn

This looks great, can't wait till it's fully ready. I tried trip_search, but couldn't get it to work, and the regular search definitely needed some better features! I would like to test it, but I have no idea how to apply a patch. Can you give me simple, for a Unix dummy, instructions on how to go about it?

Tx
T

------------------------------------------------------------------------
Thu, 04 Aug 2005 14:23:08 +0000 : webchick

Can you give me simple, for a Unix dummy, instructions on how to go about it?
I can help you there, I think.

Follow step 2 here if you don't already have a CVS version of Drupal up and running (you can't use this patch against 4.6.2, for example): http://www.planetsoc.com/node/164

Then, switch to your Drupal CVS root directory, for example:

  cd ~/drupal-cvs

Use wget to retrieve a copy of the most recent patch (in this case, search_3.patch):

  wget http://drupal.org/files/issues/search_3.patch

Execute the following command to apply the patch to your Drupal installation:

  patch -p0 -u < search_3.patch

This will patch all the files with the updated search. Then go through the normal steps you would go through to get a new Drupal system up and running. Step 3 of the aforementioned link has some info on how to get a table prefix going if you want to keep this test version separate from your "normal" Drupal installation.

My problem is that I've done all of this, but am still getting strange errors (even on a "normal" unpatched version of the search), so I need to figure out whether I have a problem on my end or what's going on.

------------------------------------------------------------------------
Thu, 04 Aug 2005 14:27:52 +0000 : killes@www.drop.org

@Jeremy: Sorting by two fields does not seem to work.

@Moshe: This code does not rely on the fact that wid is an auto_increment field in any way. Just some concerns did.

------------------------------------------------------------------------
Thu, 04 Aug 2005 14:29:28 +0000 : killes@www.drop.org

Oops, that comment should have been for another issue.

------------------------------------------------------------------------
Thu, 04 Aug 2005 15:49:36 +0000 : matt_paz

It would be nice to allow the ability to select which vocabularies and node types are (or aren't) available in the advanced search. Or to be able to turn them off altogether. It would also be nice to be able to display the total node count for each type/category in parens.

------------------------------------------------------------------------
Thu, 04 Aug 2005 15:50:11 +0000 : matt_paz

Nice addition! It seems to be working great.

It would be nice to allow the ability to select which vocabularies and node types are (or aren't) available in the advanced search. Or to be able to turn them off altogether. It would also be nice to be able to display the total node count for each type/category in parens.

------------------------------------------------------------------------
Thu, 04 Aug 2005 17:20:48 +0000 : stevryn

Thanks webchick! I have it working now. Great work Steven. My live site is 4.6.1; I haven't wanted to take the great leap and update, as the last time I did it was *not* pretty. I assume this will work with that version once it's completed and submitted? Seriously, the search functionality needs this sort of advanced features!! Tx for all your work!

------------------------------------------------------------------------
Fri, 05 Aug 2005 06:19:51 +0000 : Kobus

Hi!

Like the previous replier to this post, I can't apply patches myself, simply because I don't do it frequently enough and forgot how to do it, but I will catch up with this and test if I get a chance. In principle this is a great patch! Definite +1.

I have another feature (not sure it belongs to this thread, so if not, I apologize) that I would love to see in the search module. That is to provide hooks so that you can create a customized search form. For example, I need the following three field sets in a search form for a property website:

* Price range "Start price" -> "End price", using some MIN() and MAX() functions if the user selects the wrong way around (dropdowns with certain ranges).
* Area where the property is supposed to be located in (taxonomy category).
* Features that the property MUST have, for example, your requirement would be "TWO BATHROOMS" or "FOUR BEDROOMS".
  (Search through checkboxes, radios and text fields that were defined in the "property.module" file and database structure.) (This module was written by an amateur (me), which is why it is not contributed, but should there be interest in it, I will contribute it.)

These extra search forms would be way different for each different application, so I don't expect the search module to be able to actually do this, but at least to provide functionality so that a coder can write such an extension in his module. In other words, the queries that will define the search, and the form that the user will see, should be definable in the module, and called instead of the default search form if required.

Will this be possible at all?

Regards,
Kobus

------------------------------------------------------------------------
Fri, 05 Aug 2005 06:42:59 +0000 : lgarfiel

@Kobus: Actually you can do that now. See the Location module for an example of a fully custom search function.

------------------------------------------------------------------------
Fri, 05 Aug 2005 06:59:25 +0000 : Kobus

Hi!

Thanks lgarfiel. I have downloaded the module and will play around with it over the weekend.

Regards,
Kobus