[development] Drupal could be 10x times faster with correct indexing

Michael Haggerty mhaggerty-lists at trellon.com
Wed Jul 9 01:28:08 UTC 2008


On Jun 30, 2008, at 10:05 AM, Jean-Michel Pouré wrote:
> In the above issue, I could speed-up a simple query by 10 times. You  
> may
> not notice this issue when using small Drupal sites (because the whole
> database executes in shared memory) or when using Drupal with caching.
> But ... on large installations like mine, you cannot cache everything
> without a huge amount of memory. And the result is SLOW.

I have implemented a large scale Drupal site that uses PostGres for  
the backend. We benchmarked it at around 4 million page views an hour  
with memcache installed and using a cache injection strategy. We had  
to perform extensive modifications to the database schema to get the  
core database performing well by adding indexes and primary keys to  
every table (which is extremely important when working in a  
clustered / replicated / federated environment, and which large scale  
sites also need to address).

Other people in this thread have commented on the lack of testing  
around PostGres, and the original poster pointed out the lack of  
proper indexing on contributed modules. I think these comments get to  
the heart of the matter, that the problems with PostGres support in  
Drupal is systemic and arises from an lack of resources within the  
community.

On the other hand, proper PostGres tuning is far more important than  
anything else you can do in terms of overall performance. There are a  
number of kernel level tuning steps specific to PostGres servers which  
should be taken in order to ensure maximum database throughput.  
PostGres is just different enough from MySQL to throw people for a  
serious loop if they do not know what they are getting into. We worked  
with a PostGres consulting firm called Command Prompt and they were  
truly excellent in helping us improve database performance, tweak the  
servers, and identify which tables were getting hit the hardest so we  
knew where to start with our enhancements to core.

We did do an EXPLAIN ANALYZE on every query in our installation (by  
exposing each one with devel) and this included a large number of  
contributed modules. The most problematic queries were the ones coming  
from user_points and views, which should come as no surprise, and  
there were a number of minor modifications we needed to make to other  
modules to get the most out of them. At no point did we see anything  
near a 10x improvement in query performance across the board, although  
we did manage to get some queries running that were locking up our  
servers under load.

This was a brute force tactic to improving system performance needed  
to get the project done, and a better approach to resolving PostGres  
issues within Drupal might be to use PostGres' query logger to get  
snapshots of database behavior under load and address the biggest  
outliers in terms of performance. While log files can quickly become  
huge, taking 60 second snapshots and sorting them for query  
performance time can be very instructive. Also, anyone dealing with  
issues of load needs to be aware of the actual number of queries  
running on each page, which is quite a lot even with most caching  
enabled. Either Jeremy Andrews or Gerhard wrote a nice patch to  
eliminate all queries from core for anonymous users, although I am not  
sure how useful this is for users in general (the point is it can be  
done).

The most significant problem we faced, outside of straight up bad  
queries, was repetitive queries. The userpoints module was challenging  
to work with and we did need to rewrite most of the data access there  
to make it suitable for a production environment. There are other  
modules we changed, but this one required the most effort. We had some  
pages that were hitting the points tables over 100 times per page  
load, even for anonymous users with memcache installed.

That said, for now, use MySQL if you can. : ) It is the swiss army  
knife to PostGres' blazing scimitar, there are a lot more clustering  
options available, it can perform most simple queries much faster than  
PostGres (read ahead memory adds a lot of overhead to queries with  
less than 3 joins compared to MySQL), the FOSS db admin tools  
available for it are more sophisticated, and Drupal certainly likes it  
more.

What's ironic about this discussion is the PostGres developer base is  
growing at a faster rate than MySQL and adoption is increasing for a  
number of other FOSS projects. At some point, I think demand for  
PostGres implementations is going to grow to the point where the  
systemic issues are going to have to go away for Drupal. These kinds  
of problems are not solved quickly or easily, but hopefully it helps  
knowing other people have faced the same challenges and been  
successful at achieving performance goals.

M




More information about the development mailing list