[development] Drupal performance patches: call for action

Gerhard Killesreiter gerhard at killesreiter.de
Wed Aug 1 09:37:43 UTC 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dries Buytaert wrote:
> Hello world,
> 
> I provided a list of what I believe are the top-6 most important
> performance patches for Drupal 6 and drupal.org (which will run on
> Drupal 6).  If you know people that want to help, please point them to
> http://drupal.org/node/163216#comment-257631 or wherever the latest
> version of that list continues to live on.
> 

Thanks for compiling this list.

> The slowest queries on drupal.org are the ones for (1) the tracker
> page, (2) the forum blocks, (3) the forum topics' next/previous links
> and (4) the search module.  Since we disabled most of those features,
> drupal.org is again more usable.  Of course, we want to re-enable all
> of those features as soon as possible.

Personally, I think we could do without the next/prev links on forums. I
am a bit reluctant to tamper with bluebeach, but simply adding an empty
bluebeach_forum_topic_navigation to template.php should take care of this.

> To make that happen, we need help from developers and database
> experts. To provide some focus, I compiled a list of the 6 most
> important performance patches that you can help with. I truly hope to
> get some of these into Drupal 6, and to backport them to Drupal 5 so
> we can use them on drupal.org until we upgrade drupal.org to Drupal 6.
> 
> Important note: these patches are still allowed to change the APIs in
> Drupal 6, and are about the _only_ patches that can break our APIs
> before Drupal 6 beta 1 is released.  This policy will hopefully help
> us focus -- these issues are where all of our Drupal core action should
> be.
> 
> So here is the list:
> 
>    1. http://drupal.org/node/147160 - Database replication: database
> replication will help us distribute the load among multiple database
> servers. This will help us with (1), (2), (3), (4) and more. Without
> this patch, we can't even take advantage of the extra hardware that
> we're installing. Needless to say, this patch is critical. Some extra
> background information and thoughts are available at
> http://buytaert.net/scaling-with-mysql-replication.

This patch is installed at scratch.drupal.org, so please, everyone, go there
and try to find errors.

In addition to what is in the patch, I have modified several forum
queries, the search queries, and all pager queries to go to the slave
(currently, there is only one).
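
To illustrate the routing idea (a minimal sketch only, not code from the
patch): reads that can tolerate slightly stale data are flagged and sent to
the slave, everything else goes to the master. The function name
pick_server and the slave_ok flag are made up for this example.

```python
def pick_server(sql, slave_ok=False):
    """Return which server a query should go to (hypothetical helper).

    Only SELECT statements explicitly flagged as slave-safe are sent to
    the replica; writes and unflagged reads stay on the master.
    """
    is_read = sql.lstrip().upper().startswith("SELECT")
    return "slave" if (slave_ok and is_read) else "master"


# Pager and search queries tolerate a little replication lag.
print(pick_server("SELECT nid FROM node ORDER BY created DESC", slave_ok=True))
# Writes always go to the master, even if flagged by mistake.
print(pick_server("UPDATE node SET title = 'x' WHERE nid = 1", slave_ok=True))
```

Making the slave opt-in per query keeps reads that must see fresh data
(for example right after a form submission) on the master by default.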

>    2. http://drupal.org/node/148849 - Merge {node_comment_statistics}
> and {node_counter} into {node}: looking at the slow query log on
> Drupal.org we have reasons to believe that this patch could help us
> with (1) and (2) and (4). There are some reservations as well so we
> need people to help benchmark this patch so we can weigh the
> advantages and the disadvantages. After some good testing and
> benchmarking, we should be able to drive this patch home.

Yeah, I'd be one of the people who have reservations about this, especially
about the node_counter part. It will cause a lot of writes to the node
table, which is a stupid thing to do considering that the rest of the
table is more or less static (compared to the node counter table).

This follows from simple physical considerations, but I've recently
found a blog entry by somebody who understands much more about
php/mysql than I do, who actually agrees with me and explains it in a
more technical way (B-trees and sectors on hard disks and a lot of stuff
I don't really understand):

http://blog.koehntopp.de/archives/1775-Hardware-fuer-ein-MySQL.html

Important to me is his result:

"Wenn wir zum Beispiel eine Benutzertabelle haben, dann enthält diese
überwiegend statische Daten wie zum Beispiel den Benutzernamen, das
Paßwort, die Benutzeranschrift und andere Stammdaten. Sie enthält aber
vielleicht auch sehr dynamische Daten wie zum Beispiel das Datum des
letzten Logins. Datenbanktheoretisch gehört diese Information auch in
die User-Tabelle, aber wegen der physikalischen Implementierung wird es
sehr sinnvoll sein, eine künstliche 1:1 Relation einzuführen und das
Paar (userid, lastlogin) in eine Extratabelle abzuspalten."

Translation:

"If we for example have a user table, then this will contain mostly
static data such as the username, password, address and other master
data. It may also contain very dynamic data such as the time of the last
login. In terms of database theory, this information also belongs in
the user table, but because of the physical implementation it will
make a lot of sense to introduce an artificial 1:1 relation and to split
off the pair (userid, lastlogin) into an extra table."

I've already proposed that for the users table, where it didn't go
through (but did get a band-aid), but introducing bad design into
the node table as well will not make me happy.
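
The split described in the quote can be sketched like this. It is an
illustration only: it uses Python's sqlite3 module rather than MySQL, and
the table users_lastlogin and both helper functions are hypothetical names,
not Drupal's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Mostly static data: written rarely, read often.
    CREATE TABLE users (
        uid  INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Hot data split off into an artificial 1:1 relation.
    CREATE TABLE users_lastlogin (
        uid       INTEGER PRIMARY KEY REFERENCES users(uid),
        lastlogin INTEGER NOT NULL
    );
""")


def record_login(conn, uid, timestamp):
    # The frequent write touches only the small extra table.
    conn.execute(
        "INSERT OR REPLACE INTO users_lastlogin (uid, lastlogin) VALUES (?, ?)",
        (uid, timestamp),
    )


def load_user(conn, uid):
    # Join the pair back in only when the login time is actually needed.
    return conn.execute(
        "SELECT u.uid, u.name, l.lastlogin "
        "FROM users u LEFT JOIN users_lastlogin l ON l.uid = u.uid "
        "WHERE u.uid = ?",
        (uid,),
    ).fetchone()


conn.execute("INSERT INTO users (uid, name) VALUES (1, 'gerhard')")
record_login(conn, 1, 1185961063)
record_login(conn, 1, 1185961100)
print(load_user(conn, 1))
```

The same reasoning applies to keeping the counter in {node_counter} rather
than in {node}: the frequently updated column lives in its own narrow table.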

>    3. http://drupal.org/node/80951 - Block caching: being able to
> cache expensive blocks would help us with (2) as it eliminates
> expensive queries.

I am of course especially fond of this patch. ;)
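
For anyone unfamiliar with the idea, here is a rough sketch of what caching
an expensive block means. It is not how the patch is implemented; the
BlockCache class, its ttl_seconds parameter and the block key are
assumptions made up for this example.

```python
import time


class BlockCache:
    """Keep rendered block output for a fixed time (hypothetical helper)."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expiry time, rendered content)

    def get(self, key, build):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: skip the expensive queries
        content = build()  # expired or missing: rebuild the block
        self._store[key] = (now + self.ttl, content)
        return content


def build_forum_block():
    # Stand-in for the expensive forum queries.
    return "<ul><li>Active forum topics</li></ul>"


cache = BlockCache(ttl_seconds=300)
print(cache.get("forum_active", build_forum_block))
```

With a five-minute lifetime the expensive queries behind the block run at
most once per five minutes instead of once per page view.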

>    4. http://drupal.org/node/105639 - Tracker query rewrite: would
> help us with (1) because it rewrites an expensive query.

A lot of people have looked into it and it simply doesn't get much
better if we want to keep the functionality. I am afraid that only a
caching solution will significantly improve the performance.

>    5. http://drupal.org/node/146466 - Link handling in search module:
> this patch reduces the complexity of the search module and will help
> with (4). It helps performance and it makes for a better search.

I'd very much like to try a backport of this patch on drupal.org. Our
DBA from OSUOSL, Narayan Newton, has looked at the current search
queries and found that the index usage is very bad. This patch might
improve this.

>    6. http://drupal.org/node/106559 - Path lookups: we're
> brainstorming about how we could reduce the number of look ups
> required for URL aliasing. I think we need to look at de-normalizing
> the path alias table, but there are some other ideas that are being
> proposed, measured and discussed.

Here for example:
http://drupal.org/node/100301

I am running #106559 on drupal.org. It is difficult to say if it makes a
huge difference, but I think that having fewer of these pesky select *
from url_alias queries can't hurt.

> There are likely to be more patches, but I'd like to start with those.

Yes, I concur.

>  It's more than enough work already, and it is better not to spread
> ourselves too thin.  If you're convinced that there might be a more
> important performance patch, you can work on that too.
> 
> I recommend that you bookmark these issues in your browser's toolbar
> and that you ask your boss whether you can spend some time working on
> these patches. ;)

Right. And let me know if your boss says "no".

Cheers,
	Gerhard
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFGsFRnfg6TFvELooQRAkt/AJ95uExa0bplAhv6n9DONw3FPkEVcwCgvz3C
dAtzl1ToawIaRSEIAPiXDQM=
=HUpX
-----END PGP SIGNATURE-----

