[development] Performance Testing with a Big Test Database

Michael Haggerty mhaggerty-lists at trellon.com
Thu May 15 16:10:30 UTC 2008


We have indexed the database we are using to benchmark Xapian with
both Drupal and Xapian. The MySQL database we used in testing is
1.1 GB and has about 100k records.

Within the MySQL database, the size of the search index table is 451
MB. The size of the Xapian database (which is some sort of a flat file
format) is 569 MB. So Xapian is a little bigger in terms of disk
space, but nowhere near four times as large.
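
As a rough check on that arithmetic, here is a minimal Python sketch that uses only the sizes quoted in this message. The variable names are made up for illustration, and treating 1.1 GB as 1100 MB is an assumption.

```python
# Sizes quoted in this message, in MB.
# Assumption: 1.1 GB is treated as 1100 MB for this rough comparison.
MYSQL_TOTAL_MB = 1100
DRUPAL_INDEX_MB = 451   # Drupal search index table inside MySQL
XAPIAN_DB_MB = 569      # Xapian database on disk

# How much bigger the Xapian database is than Drupal's index table.
xapian_vs_drupal = XAPIAN_DB_MB / DRUPAL_INDEX_MB

# Share of the whole MySQL database taken up by Drupal's index table.
drupal_index_share = DRUPAL_INDEX_MB / MYSQL_TOTAL_MB

print(f"Xapian / Drupal index: {xapian_vs_drupal:.2f}x")
print(f"Drupal index share of MySQL database: {drupal_index_share:.0%}")
```

On these figures the Xapian database comes out at roughly 1.26 times the size of Drupal's index table, well short of a factor of four.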

It should be noted that Xapian is a little more exact than Drupal's
search, and regularly returns more records on term searches. This is
because it supports things like stemming and logical operators. The
benchmarks I put up on trellon.com the other day include the total
result counts.

M

On May 15, 2008, at 11:03 AM, Larry Garfield wrote:
>
> What are the disk space requirements for Xapian?  At least in my  
> experience, the giant size of the index is more of an issue than  
> runtime.  The indexes are easily larger than the content in  
> question, by a factor of four.  (I'm about to disable search on one  
> site to avoid getting the web host mad at me for database size.)
>
> --Larry Garfield
>
> On Thu, 15 May 2008 09:07:07 -0400, Doug Green <douggreen at douggreenconsulting.com> wrote:
>> AFAICT, Xapian replaces the indexing.  I looked at the code because of
>> this post to the devel list.  To use Xapian, you have to patch core.
>> We'd like to make this sort of thing easier.  We discussed it some at
>> the sprint.  I believe that Earnest Berry started refactoring code into
>> a search module, an indexing module and a UI module where the indexing
>> or UI modules could be replaced.
>>
>> We need more indexing performance tuning, but on the plane ride home I
>> came up with 3 small improvements (257910, 257912, 257916).
>>
>> Earnie Boyd wrote:
>>> Quoting Simon Lindsay <simon at iseek.biz>:
>>>
>>>> Doug Green wrote:
>>>>> One product of the search sprint is this large database for testing...
>>>>> http://civicactions.s3.amazonaws.com/drupal6_100k.mysql.gz
>>>>
>>>> Hello Doug,
>>>>
>>>> You may not have seen it, but Trellon recently sponsored furthering
>>>> some development which we did with the Xapian search engine, and
>>>> integrating it in to Drupal.
>>>>
>>>> http://drupal.org/project/xapian
>>>>
>>>> Michael has done some preliminary performance testing, with almost
>>>> 100,000 records created with devel module, and posted the results
>>>> here.
>>>>
>>>> http://www.trellon.com/blog/xapian-search-drupal
>>>>
>>>> Perhaps this may also be of interest for people looking into the
>>>> drupal search engine.
>>>>
>>>
>>> Is this xapian module creating the search indexes as well, or just the
>>> UI functional pieces?  While the UI is important, the actual parsing
>>> of the node data to search is in need of some performance tuning as
>>> well.
>>>
>>> Earnie -- http://for-my-kids.com/
>>> -- http://give-me-an-offer.com/
>>>
>>>
>>
>>
>> --
>> Doug Green
>> douggreen at douggreenconsulting.com
>> 904-583-3342
>>
>> Bringing Ideas to Life with Software Artistry and Invention...
>


