Just changing the title here, because my original thread was about performance testing with a big test database and this has been hijacked by the Xapian discussion.
Michael Haggerty wrote:
For the database we are using to benchmark Xapian, we have indexed the content with both Drupal and Xapian. The MySQL database we used in testing stands at 1.1 GB and has about 100k records.
Within the MySQL database, the search index table is 451 MB. The Xapian database (which is some sort of flat file format) is 569 MB. So Xapian is a little bigger in terms of disk space, but nowhere near four times as large.
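For anyone who wants to check the arithmetic on those two figures, here is a minimal sketch in Python. The sizes are the ones quoted above; the function and variable names are made up for illustration and are not part of Drupal or the Xapian module.

```python
def size_ratio(candidate_mb: float, baseline_mb: float) -> float:
    """Return how many times larger the candidate is than the baseline."""
    if baseline_mb <= 0:
        raise ValueError("baseline size must be positive")
    return candidate_mb / baseline_mb


# Sizes reported in this thread, in megabytes.
DRUPAL_INDEX_MB = 451   # search index table inside the MySQL database
XAPIAN_INDEX_MB = 569   # Xapian database on disk

ratio = size_ratio(XAPIAN_INDEX_MB, DRUPAL_INDEX_MB)
extra_mb = XAPIAN_INDEX_MB - DRUPAL_INDEX_MB

print(f"Xapian index is {ratio:.2f}x the Drupal search index ({extra_mb} MB larger)")
```

That works out to roughly a quarter more disk space than the Drupal search index table, which is the sense in which it is "a little bigger".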
It should be noted that Xapian is a little more exact than Drupal's search, and regularly returns more records on term searches. This is because it supports things like stemming and logical operators. There is an indication of total results in the benchmarks I put up on trellon.com the other day.
M
On May 15, 2008, at 11:03 AM, Larry Garfield wrote:
What are the disk space requirements for Xapian? At least in my experience, the giant size of the index is more of an issue than runtime. The indexes are easily larger than the content in question, by a factor of four. (I'm about to disable search on one site to avoid getting the web host mad at me for database size.)
--Larry Garfield
On Thu, 15 May 2008 09:07:07 -0400, Doug Green <douggreen@douggreenconsulting.com> wrote:
AFAICT, Xapian replaces the indexing. I looked at the code because of this post to the devel list. To use Xapian, you have to patch core. We'd like to make this sort of thing easier. We discussed it some at the sprint. I believe that Earnest Berry started refactoring code into a search module, an indexing module and a UI module, where the indexing or UI modules could be replaced.
We need more indexing performance tuning, but on the plane ride home I came up with 3 small improvements (257910, 257912, 257916).
Earnie Boyd wrote:
Quoting Simon Lindsay <simon@iseek.biz>:
Doug Green wrote:
One product of the search sprint is this large database for testing... http://civicactions.s3.amazonaws.com/drupal6_100k.mysql.gz
Hello Doug,
You may not have seen it, but Trellon recently sponsored further development of the work we did with the Xapian search engine, integrating it into Drupal.
http://drupal.org/project/xapian
Michael has done some preliminary performance testing, with almost 100,000 records created with devel module, and posted the results here.
http://www.trellon.com/blog/xapian-search-drupal
Perhaps this may also be of interest for people looking into the Drupal search engine.
Is this Xapian module creating the search indexes as well, or just the UI functional pieces? While the UI is important, the actual parsing of the node data for search is in need of some performance tuning as well.
Earnie -- http://for-my-kids.com/ -- http://give-me-an-offer.com/
-- Doug Green douggreen@douggreenconsulting.com 904-583-3342
Bringing Ideas to Life with Software Artistry and Invention...