Drupal could be 10x times faster with correct indexing
Dear friends, Today my Drupal 6.2 server was flooded and I discovered that there was a real lack of indexing in Drupal. Here is the issue with solutions: http://drupal.org/node/276742 When using MySQL, you don't really know how SQL queries are treated, except execution times that may be monitored (?). On the converse, PostgreSQL offers full logging of queries and access to query plans, showing how the database executes data. In the above issue, I could speed-up a simple query by 10 times. You may not notice this issue when using small Drupal sites (because the whole database executes in shared memory) or when using Drupal with caching. But ... on large installations like mine, you cannot cache everything without a huge amount of memory. And the result is SLOW. Using PostgreSQL, normal queries should run in less than one millisecond, usually 0,2 millisecond. Long queries with difficult JOINS should run in less that 5 milliseconds. This way, the database server is always faster than anay kind of PHP cache. And you can use large sites on small sites. There is an extensive query debugging needed on Drupal: 1) Turn on extensive logging in PostgreSQL, including query execution times. 2) Whenever a query executes in more than 1 millisecond, run psql or pgAdmin (graphical client) to run EXPLAIN ANALYSE. 3) Find sequential scans and fix them. Most of times, JOINS and clauses and ORDER BY should be on indexes. 4) Add proper indexing 5) Run EXPLAIN ANALYSE again. Times usualy are divided by 10. 6) Add indexes everywhere. After this hard work, a normal Drupal 6.2 installation should be able to answer thousands of queries. Do not hesitate to answer my issue : http://drupal.org/node/276742 The issue offers a pratical example. Again, this kind of problem will only show on large sites (like Drupal.org), not a small installation. But I consider this as a bug, a very important issue which should be fixed ASAP. Kind regards, Jean-Michel Pouré
On Mon, 2008-06-30 at 16:05 +0200, Jean-Michel Pouré wrote:
When using MySQL, you don't really know how SQL queries are treated, except execution times that may be monitored (?).
What's wrong with MySQL's standard EXPLAIN command? :-) Not to knock PostgreSQL -- I use and like both databases. But EXPLAIN provides very similar information to PostgreSQL's query plan. Scott -- Syscrusher <syscrusher@4th.com>
On Jun 30, 2008, at 10:05 AM, Jean-Michel Pouré wrote:
In the above issue, I could speed-up a simple query by 10 times. You may not notice this issue when using small Drupal sites (because the whole database executes in shared memory) or when using Drupal with caching. But ... on large installations like mine, you cannot cache everything without a huge amount of memory. And the result is SLOW.
I have implemented a large scale Drupal site that uses PostGres for the backend. We benchmarked it at around 4 million page views an hour with memcache installed and using a cache injection strategy. We had to perform extensive modifications to the database schema to get the core database performing well by adding indexes and primary keys to every table (which is extremely important when working in a clustered / replicated / federated environment, and which large scale sites also need to address). Other people in this thread have commented on the lack of testing around PostGres, and the original poster pointed out the lack of proper indexing on contributed modules. I think these comments get to the heart of the matter, that the problems with PostGres support in Drupal is systemic and arises from an lack of resources within the community. On the other hand, proper PostGres tuning is far more important than anything else you can do in terms of overall performance. There are a number of kernel level tuning steps specific to PostGres servers which should be taken in order to ensure maximum database throughput. PostGres is just different enough from MySQL to throw people for a serious loop if they do not know what they are getting into. We worked with a PostGres consulting firm called Command Prompt and they were truly excellent in helping us improve database performance, tweak the servers, and identify which tables were getting hit the hardest so we knew where to start with our enhancements to core. We did do an EXPLAIN ANALYZE on every query in our installation (by exposing each one with devel) and this included a large number of contributed modules. The most problematic queries were the ones coming from user_points and views, which should come as no surprise, and there were a number of minor modifications we needed to make to other modules to get the most out of them. At no point did we see anything near a 10x improvement in query performance across the board, although we did manage to get some queries running that were locking up our servers under load. This was a brute force tactic to improving system performance needed to get the project done, and a better approach to resolving PostGres issues within Drupal might be to use PostGres' query logger to get snapshots of database behavior under load and address the biggest outliers in terms of performance. While log files can quickly become huge, taking 60 second snapshots and sorting them for query performance time can be very instructive. Also, anyone dealing with issues of load needs to be aware of the actual number of queries running on each page, which is quite a lot even with most caching enabled. Either Jeremy Andrews or Gerhard wrote a nice patch to eliminate all queries from core for anonymous users, although I am not sure how useful this is for users in general (the point is it can be done). The most significant problem we faced, outside of straight up bad queries, was repetitive queries. The userpoints module was challenging to work with and we did need to rewrite most of the data access there to make it suitable for a production environment. There are other modules we changed, but this one required the most effort. We had some pages that were hitting the points tables over 100 times per page load, even for anonymous users with memcache installed. That said, for now, use MySQL if you can. : ) It is the swiss army knife to PostGres' blazing scimitar, there are a lot more clustering options available, it can perform most simple queries much faster than PostGres (read ahead memory adds a lot of overhead to queries with less than 3 joins compared to MySQL), the FOSS db admin tools available for it are more sophisticated, and Drupal certainly likes it more. What's ironic about this discussion is the PostGres developer base is growing at a faster rate than MySQL and adoption is increasing for a number of other FOSS projects. At some point, I think demand for PostGres implementations is going to grow to the point where the systemic issues are going to have to go away for Drupal. These kinds of problems are not solved quickly or easily, but hopefully it helps knowing other people have faced the same challenges and been successful at achieving performance goals. M
We did do an EXPLAIN ANALYZE on every query in our installation (by exposing each one with devel) and this included a large number of contributed modules. The most problematic queries were the ones coming from user_points and views, which should come as no surprise,
A patch by a large site was commited. It adds a few indexes that increase the performance for the txn table. The most significant problem we faced, outside of straight up bad queries,
was repetitive queries. The userpoints module was challenging to work with and we did need to rewrite most of the data access there to make it suitable for a production environment. There are other modules we changed, but this one required the most effort. We had some pages that were hitting the points tables over 100 times per page load, even for anonymous users with memcache installed.
If this is a list of users with points for each listed beside their names, then yes, it has to query the table several times on the page. Just like user_load or node_load for different users/nodes. If you have implemented some sort of smart caching, then we are looking forward to your patches in the issue queue.
What's ironic about this discussion is the PostGres developer base is growing at a faster rate than MySQL and adoption is increasing for a number of other FOSS projects.
What I have seen is that Sun's purchase of MySQL, and MySQL's management revoking the open source aspect of some features, then reversing that decision, all this has not gone well with some. Moreover, the departure from MySQL/Sun of Jim Starkey, architect of the new transactional Falcon engine, and Oracle's ownership of InnoBase has left the question of the future of a transactional engine (Maria is not transactional, just a replacement for MyISAM). PostgreSQL has been transactional from the start though. -- Khalid M. Baheyeldin 2bits.com, Inc. http://2bits.com Drupal optimization, development, customization and consulting.
On Wed, 2008-07-09 at 08:36 -0400, Khalid Baheyeldin wrote:
(Maria is not transactional, just a replacement for MyISAM).
While it is true that the first preview releases (1.0 and 1.5) of Maria are not going to be transactional (they will be crash safe, but will not yet include ROLLBACK support) this is a temporary condition for Monty's new storage engine. He explains that among his goals for Maria in MySQL 6, it will be a fully transactional storage engine that can replace MyISAM. The 2.0 release of Maria is intended to be fully transactional (crash safe, supporting ROLLBACK, etc), while continuing to support the ability for completely disabling this transactional functionality if desired: http://monty-says.blogspot.com/2008/01/maria-engine-is-released.html Cheers, -Jeremy
Le mercredi 09 juillet 2008 à 08:36 -0400, Khalid Baheyeldin a écrit :
If this is a list of users with points for each listed beside their names, then yes, it has to query the table several times on the page. Just like user_load or node_load for different users/nodes.
If you have implemented some sort of smart caching, then we are looking forward to your patches in the issue queue.
This is probably the biggest problem. In logs, I can see 20 times the same query. Kind regards, Jean-Michel
Quoting Michael Haggerty <mhaggerty-lists@trellon.com>:
What's ironic about this discussion is the PostGres developer base is growing at a faster rate than MySQL and adoption is increasing for a number of other FOSS projects. At some point, I think demand for PostGres implementations is going to grow to the point where the systemic issues are going to have to go away for Drupal. These kinds of problems are not solved quickly or easily, but hopefully it helps knowing other people have faced the same challenges and been successful at achieving performance goals.
I've kept quiet about this but IMO the index issue plagues MySql implementations just as badly. Especially for the InnoDB engine. I've had to remove and add indexes on various tables to help improve the performance. I've just not had the time to put together those changes toward a patch. Earnie -- http://for-my-kids.com/ -- http://give-me-an-offer.com/
Quoting David Timothy Strauss <david@fourkitchens.com>:
----- "Earnie Boyd" <earnie@users.sourceforge.net> wrote:
Especially for the InnoDB engine.
How would index issues particularly affect the InnoDB engine?
Performance and the size of the index file. My reference is for Drupal 5 and I do not know what may have changed for Drupal 6 yet but there are a number of duplicated indexes on some of the tables. A column might be used as the first position in combination with another column to create an index and also be declared an index alone. This becomes a duplicate index and isn't needed. Some tables do not have unique or primary key identifiers a big performance hit. Earnie -- http://for-my-kids.com/ -- http://give-me-an-offer.com/
There were several patches in Drupal 6 which removed duplicate indexes (and/or added missing ones), so it's likely that pretty much all of these issues are resolved. Nat On Thu, Jul 10, 2008 at 1:18 PM, Earnie Boyd
Performance and the size of the index file. My reference is for Drupal 5 and I do not know what may have changed for Drupal 6 yet but there are a number of duplicated indexes on some of the tables. A column might be used as the first position in combination with another column to create an index and also be declared an index alone. This becomes a duplicate index and isn't needed. Some tables do not have unique or primary key identifiers a big performance hit.
participants (8)
-
David Timothy Strauss -
Earnie Boyd -
Jean-Michel Pouré -
Jeremy Andrews -
Khalid Baheyeldin -
Michael Haggerty -
Nathaniel Catchpole -
Syscrusher