On Jun 30, 2008, at 10:05 AM, Jean-Michel Pouré wrote:
In the above issue, I could speed-up a simple query by 10 times. You may not notice this issue when using small Drupal sites (because the whole database executes in shared memory) or when using Drupal with caching. But ... on large installations like mine, you cannot cache everything without a huge amount of memory. And the result is SLOW.
I have implemented a large scale Drupal site that uses PostGres for the backend. We benchmarked it at around 4 million page views an hour with memcache installed and using a cache injection strategy. We had to perform extensive modifications to the database schema to get the core database performing well by adding indexes and primary keys to every table (which is extremely important when working in a clustered / replicated / federated environment, and which large scale sites also need to address). Other people in this thread have commented on the lack of testing around PostGres, and the original poster pointed out the lack of proper indexing on contributed modules. I think these comments get to the heart of the matter, that the problems with PostGres support in Drupal is systemic and arises from an lack of resources within the community. On the other hand, proper PostGres tuning is far more important than anything else you can do in terms of overall performance. There are a number of kernel level tuning steps specific to PostGres servers which should be taken in order to ensure maximum database throughput. PostGres is just different enough from MySQL to throw people for a serious loop if they do not know what they are getting into. We worked with a PostGres consulting firm called Command Prompt and they were truly excellent in helping us improve database performance, tweak the servers, and identify which tables were getting hit the hardest so we knew where to start with our enhancements to core. We did do an EXPLAIN ANALYZE on every query in our installation (by exposing each one with devel) and this included a large number of contributed modules. The most problematic queries were the ones coming from user_points and views, which should come as no surprise, and there were a number of minor modifications we needed to make to other modules to get the most out of them. At no point did we see anything near a 10x improvement in query performance across the board, although we did manage to get some queries running that were locking up our servers under load. This was a brute force tactic to improving system performance needed to get the project done, and a better approach to resolving PostGres issues within Drupal might be to use PostGres' query logger to get snapshots of database behavior under load and address the biggest outliers in terms of performance. While log files can quickly become huge, taking 60 second snapshots and sorting them for query performance time can be very instructive. Also, anyone dealing with issues of load needs to be aware of the actual number of queries running on each page, which is quite a lot even with most caching enabled. Either Jeremy Andrews or Gerhard wrote a nice patch to eliminate all queries from core for anonymous users, although I am not sure how useful this is for users in general (the point is it can be done). The most significant problem we faced, outside of straight up bad queries, was repetitive queries. The userpoints module was challenging to work with and we did need to rewrite most of the data access there to make it suitable for a production environment. There are other modules we changed, but this one required the most effort. We had some pages that were hitting the points tables over 100 times per page load, even for anonymous users with memcache installed. That said, for now, use MySQL if you can. : ) It is the swiss army knife to PostGres' blazing scimitar, there are a lot more clustering options available, it can perform most simple queries much faster than PostGres (read ahead memory adds a lot of overhead to queries with less than 3 joins compared to MySQL), the FOSS db admin tools available for it are more sophisticated, and Drupal certainly likes it more. What's ironic about this discussion is the PostGres developer base is growing at a faster rate than MySQL and adoption is increasing for a number of other FOSS projects. At some point, I think demand for PostGres implementations is going to grow to the point where the systemic issues are going to have to go away for Drupal. These kinds of problems are not solved quickly or easily, but hopefully it helps knowing other people have faced the same challenges and been successful at achieving performance goals. M