[development] Drupal could be 10x times faster with correct indexing
Michael Haggerty
mhaggerty-lists at trellon.com
Wed Jul 9 01:28:08 UTC 2008
On Jun 30, 2008, at 10:05 AM, Jean-Michel Pouré wrote:
> In the above issue, I could speed-up a simple query by 10 times. You
> may
> not notice this issue when using small Drupal sites (because the whole
> database executes in shared memory) or when using Drupal with caching.
> But ... on large installations like mine, you cannot cache everything
> without a huge amount of memory. And the result is SLOW.
I have implemented a large scale Drupal site that uses PostGres for
the backend. We benchmarked it at around 4 million page views an hour
with memcache installed and using a cache injection strategy. We had
to perform extensive modifications to the database schema to get the
core database performing well by adding indexes and primary keys to
every table (which is extremely important when working in a
clustered / replicated / federated environment, and which large scale
sites also need to address).
Other people in this thread have commented on the lack of testing
around PostGres, and the original poster pointed out the lack of
proper indexing on contributed modules. I think these comments get to
the heart of the matter, that the problems with PostGres support in
Drupal is systemic and arises from an lack of resources within the
community.
On the other hand, proper PostGres tuning is far more important than
anything else you can do in terms of overall performance. There are a
number of kernel level tuning steps specific to PostGres servers which
should be taken in order to ensure maximum database throughput.
PostGres is just different enough from MySQL to throw people for a
serious loop if they do not know what they are getting into. We worked
with a PostGres consulting firm called Command Prompt and they were
truly excellent in helping us improve database performance, tweak the
servers, and identify which tables were getting hit the hardest so we
knew where to start with our enhancements to core.
We did do an EXPLAIN ANALYZE on every query in our installation (by
exposing each one with devel) and this included a large number of
contributed modules. The most problematic queries were the ones coming
from user_points and views, which should come as no surprise, and
there were a number of minor modifications we needed to make to other
modules to get the most out of them. At no point did we see anything
near a 10x improvement in query performance across the board, although
we did manage to get some queries running that were locking up our
servers under load.
This was a brute force tactic to improving system performance needed
to get the project done, and a better approach to resolving PostGres
issues within Drupal might be to use PostGres' query logger to get
snapshots of database behavior under load and address the biggest
outliers in terms of performance. While log files can quickly become
huge, taking 60 second snapshots and sorting them for query
performance time can be very instructive. Also, anyone dealing with
issues of load needs to be aware of the actual number of queries
running on each page, which is quite a lot even with most caching
enabled. Either Jeremy Andrews or Gerhard wrote a nice patch to
eliminate all queries from core for anonymous users, although I am not
sure how useful this is for users in general (the point is it can be
done).
The most significant problem we faced, outside of straight up bad
queries, was repetitive queries. The userpoints module was challenging
to work with and we did need to rewrite most of the data access there
to make it suitable for a production environment. There are other
modules we changed, but this one required the most effort. We had some
pages that were hitting the points tables over 100 times per page
load, even for anonymous users with memcache installed.
That said, for now, use MySQL if you can. : ) It is the swiss army
knife to PostGres' blazing scimitar, there are a lot more clustering
options available, it can perform most simple queries much faster than
PostGres (read ahead memory adds a lot of overhead to queries with
less than 3 joins compared to MySQL), the FOSS db admin tools
available for it are more sophisticated, and Drupal certainly likes it
more.
What's ironic about this discussion is the PostGres developer base is
growing at a faster rate than MySQL and adoption is increasing for a
number of other FOSS projects. At some point, I think demand for
PostGres implementations is going to grow to the point where the
systemic issues are going to have to go away for Drupal. These kinds
of problems are not solved quickly or easily, but hopefully it helps
knowing other people have faced the same challenges and been
successful at achieving performance goals.
M
More information about the development
mailing list