[development] Xapian (was Performance Testing with a Big Test Database)

Doug Green douggreen at douggreenconsulting.com
Thu May 15 16:22:20 UTC 2008


Just changing the title here because my original thread was about
performance testing with a big test database, and this has been hijacked
by Xapian discussion.

Michael Haggerty wrote:
> For the database we are using to benchmark Xapian, we have indexed the
> database with both Drupal and Xapian. The size of the mysql database
> we used in testing stands at 1.1 GB and has about 100k records.
>
> Within the mysql database, the size of the search index table is 451
> MB. The size of the Xapian database (which is some sort of a flat file
> format) is 569 MB. So Xapian is a little bigger in terms of disk
> space, but nowhere near four times as large.
>
> It should be noted that Xapian is a little more exact than Drupal's
> search, and regularly returns more records on term searches. This is
> because it supports things like stemming and logical operators. There
> is an indication of total results in the benchmarks I put up on
> trellon.com the other day.
>
> M
>
> On May 15, 2008, at 11:03 AM, Larry Garfield wrote:
>>
>> What are the disk space requirements for Xapian?  At least in my
>> experience, the giant size of the index is more of an issue than
>> runtime.  The indexes are easily larger than the content in question,
>> by a factor of four.  (I'm about to disable search on one site to
>> avoid getting the web host mad at me for database size.)
>>
>> --Larry Garfield
>>
>> On Thu, 15 May 2008 09:07:07 -0400, Doug Green
>> <douggreen at douggreenconsulting.com> wrote:
>>> AFAICT, Xapian replaces the indexing.  I looked at the code because of
>>> this post to the devel list.  To use Xapian, you have to patch core.
>>> We'd like to make this sort of thing easier.  We discussed it some at
>>> the sprint.  I believe that Earnest Berry started refactoring code into
>>> a search module, an indexing module and a UI module where the indexing
>>> or UI modules could be replaced.
>>>
>>> We need more indexing performance tuning, but on the plane ride home I
>>> came up with 3 small improvements (257910, 257912, 257916)
>>>
>>> Earnie Boyd wrote:
>>>> Quoting Simon Lindsay <simon at iseek.biz>:
>>>>
>>>>> Doug Green wrote:
>>>>>> One product of the search sprint is this large database for
>>>>>> testing...
>>>>>> http://civicactions.s3.amazonaws.com/drupal6_100k.mysql.gz
>>>>>
>>>>> Hello Doug,
>>>>>
>>>>> You may not have seen it, but Trellon recently sponsored furthering
>>>>> some development which we did with the Xapian search engine, and
>>>>> integrating it into Drupal.
>>>>>
>>>>> http://drupal.org/project/xapian
>>>>>
>>>>> Michael has done some preliminary performance testing, with almost
>>>>> 100,000 records created with devel module, and posted the results
>>>>> here.
>>>>>
>>>>> http://www.trellon.com/blog/xapian-search-drupal
>>>>>
>>>>> Perhaps this may also be of interest for people looking into the
>>>>> drupal search engine.
>>>>>
>>>>
>>>> Is this xapian module creating the search indexes as well, or just the
>>>> UI functional pieces?  While the UI is important, the actual parsing
>>>> of the node data to search is in need of some performance tuning as
>>>> well.
>>>>
>>>> Earnie -- http://for-my-kids.com/
>>>> -- http://give-me-an-offer.com/
>>>>
>>>>
>>>
>>>
>>> -- 
>>> Doug Green
>>> douggreen at douggreenconsulting.com
>>> 904-583-3342
>>>
>>> Bringing Ideas to Life with Software Artistry and Invention...
>>
>
>


-- 
Doug Green
douggreen at douggreenconsulting.com
904-583-3342

Bringing Ideas to Life with Software Artistry and Invention...


