Drupal performance patches: call for action
Hello world,
I provided a list of what I believe are the top-6 most important performance patches for Drupal 6 and drupal.org (which will run on Drupal 6). If you know people that want to help, please point them to http://drupal.org/node/163216#comment-257631 or wherever the latest version of that list continues to live on.
The slowest queries on drupal.org are the ones for (1) the tracker page, (2) the forum blocks, (3) the forum topics' next/previous links and (4) the search module. Since we disabled most of those features, drupal.org is again more usable. Of course, we want to re-enable all of those features as soon as possible.
To make that happen, we need help from developers and database experts. To provide some focus, I compiled a list of the 6 most important performance patches that you can help with. I truly hope to get some of these into Drupal 6, and to backport them to Drupal 5 so we can use them on drupal.org until we upgrade drupal.org to Drupal 6.
Important note: these patches are still allowed to change the APIs in Drupal 6, and are about the _only_ patches that can break our APIs before Drupal 6 beta 1 is released. This policy will hopefully help us focus -- these issues are where all of our Drupal core action should be.
So here is the list:
1. http://drupal.org/node/147160 - Database replication: database replication will help us distribute the load among multiple database servers. This will help us with (1), (2), (3), (4) and more. Without this patch, we can't even take advantage of the extra hardware that we're installing. Needless to say, this patch is critical. Some extra background information and thoughts are available at http://buytaert.net/scaling-with-mysql-replication.
2. http://drupal.org/node/148849 - Merge {node_comment_statistics} and {node_counter} into {node}: looking at the slow query log on Drupal.org we have reasons to believe that this patch could help us with (1) and (2) and (4). There are some reservations as well so we need people to help benchmark this patch so we can weigh the advantages and the disadvantages. After some good testing and benchmarking, we should be able to drive this patch home.
3. http://drupal.org/node/80951 - Block caching: being able to cache expensive blocks would help us with (2) as it eliminates expensive queries.
4. http://drupal.org/node/105639 - Tracker query rewrite: would help us with (1) because it rewrites an expensive query.
5. http://drupal.org/node/146466 - Link handling in search module: this patch reduces the complexity of the search module and will help with (4). It helps performance and it makes for a better search.
6. http://drupal.org/node/106559 - Path lookups: we're brainstorming about how we could reduce the number of look ups required for URL aliasing. I think we need to look at de-normalizing the path alias table, but there are some other ideas that are being proposed, measured and discussed.
There are likely to be more patches, but I'd like to start with those. It's more than enough work already, and it is better not to spread ourselves too thin. If you're convinced that there might be a more important performance patch, you can work on that too.
I recommend that you bookmark these issues in your browser's toolbar and that you ask your boss whether you can spend some time working on these patches. ;)
Thanks,
-- Dries Buytaert :: http://buytaert.net/
Dries Buytaert wrote:
Hello world,
I provided a list of what I believe are the top-6 most important performance patches for Drupal 6 and drupal.org (which will run on Drupal 6). If you know people that want to help, please point them to http://drupal.org/node/163216#comment-257631 or wherever the latest version of that list continues to live on.
Thanks for compiling this list.
The slowest queries on drupal.org are the ones for (1) the tracker page, (2) the forum blocks, (3) the forum topics' next/previous links and (4) the search module. Since we disabled most of those features, drupal.org is again more usable. Of course, we want to re-enable all of those features as soon as possible.
Personally, I think we could do without the next/prev links on forums. I am a bit reluctant to tamper with bluebeach, but simply adding an empty bluebeach_forum_topic_navigation to template.php should take care of this.
To make that happen, we need help from developers and database experts. To provide some focus, I compiled a list of the 6 most important performance patches that you can help with. I truly hope to get some of these into Drupal 6, and to backport them to Drupal 5 so we can use them on drupal.org until we upgrade drupal.org to Drupal 6.
Important note: these patches are still allowed to change the APIs in Drupal 6, and are about the _only_ patches that can break our APIs before Drupal 6 beta 1 is released. This policy will hopefully help us focus -- these issues are where all of our Drupal core action should be.
So here is the list:
1. http://drupal.org/node/147160 - Database replication: database replication will help us distribute the load among multiple database servers. This will help us with (1), (2), (3), (4) and more. Without this patch, we can't even take advantage of the extra hardware that we're installing. Needless to say, this patch is critical. Some extra background information and thoughts are available at http://buytaert.net/scaling-with-mysql-replication.
This patch is installed at scratch.drupal.org, so please, people, go there and try to find errors. In addition to what is in the patch, I have modified several forum queries to go to the slave (currently, there is only one); the search queries go to the slave as well, as do all pager queries.
2. http://drupal.org/node/148849 - Merge {node_comment_statistics} and {node_counter} into {node}: looking at the slow query log on Drupal.org we have reasons to believe that this patch could help us with (1) and (2) and (4). There are some reservations as well so we need people to help benchmark this patch so we can weigh the advantages and the disadvantages. After some good testing and benchmarking, we should be able to drive this patch home.
Yeah, I'd be one of the people who have reservations on this, especially on the node_counter part. This will cause a lot of writes to the node table, which is a stupid thing to do considering that the rest of the table is more or less static (compared to the node counter table).
This follows from simple physical considerations, but I've recently found a blog entry by somebody who understands much more about PHP/MySQL than I do, who actually agrees with me and explains it in a more technical way (B-trees and sectors on hard disks and a lot of stuff I don't really understand): http://blog.koehntopp.de/archives/1775-Hardware-fuer-ein-MySQL.html
Important to me is his result:
"Wenn wir zum Beispiel eine Benutzertabelle haben, dann enthält diese überwiegend statische Daten wie zum Beispiel den Benutzernamen, das Paßwort, die Benutzeranschrift und andere Stammdaten. Sie enthält aber vielleicht auch sehr dynamische Daten wie zum Beispiel das Datum des letzten Logins. Datenbanktheoretisch gehört diese Information auch in die User-Tabelle, aber wegen der physikalischen Implementierung wird es sehr sinnvoll sein, eine künstliche 1:1 Relation einzuführen und das Paar (userid, lastlogin) in eine Extratabelle abzuspalten."
Translation:
"For example, if we have a user table, it contains mostly static data such as the username, the password, the user's address and other master data. But it may also contain very dynamic data such as the date of the last login. In terms of database theory this information also belongs in the user table, but because of the physical implementation it makes a lot of sense to introduce an artificial 1:1 relation and to split the pair (userid, lastlogin) off into an extra table."
I've already proposed that for the users table, where it didn't go through (but did get a band-aid), but introducing bad design into the node table as well will not make me happy.
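The split described in the quoted passage can be sketched roughly as follows. This is only an illustration in Python against an in-memory SQLite database, not Drupal's schema: the table names users and user_activity and all helper names are made up for the example, and SQLite's storage engine behaves differently from MySQL's, so it shows the shape of the 1:1 split rather than its performance.

```python
import sqlite3
import time


def create_schema(conn):
    # Mostly static profile data stays in one table ...
    conn.execute(
        "CREATE TABLE users ("
        " uid INTEGER PRIMARY KEY, name TEXT NOT NULL, mail TEXT NOT NULL)"
    )
    # ... while the frequently written field gets an artificial 1:1 table.
    conn.execute(
        "CREATE TABLE user_activity ("
        " uid INTEGER PRIMARY KEY REFERENCES users(uid),"
        " lastlogin INTEGER NOT NULL DEFAULT 0)"
    )


def add_user(conn, uid, name, mail):
    conn.execute(
        "INSERT INTO users (uid, name, mail) VALUES (?, ?, ?)", (uid, name, mail)
    )
    conn.execute("INSERT INTO user_activity (uid) VALUES (?)", (uid,))


def record_login(conn, uid, when=None):
    # The hot write touches only the small table; rows in users are untouched.
    when = int(time.time()) if when is None else when
    conn.execute("UPDATE user_activity SET lastlogin = ? WHERE uid = ?", (when, uid))


def load_user(conn, uid):
    # Reading both halves together now costs a join (the locality trade-off).
    return conn.execute(
        "SELECT u.uid, u.name, u.mail, a.lastlogin"
        " FROM users u JOIN user_activity a ON a.uid = u.uid WHERE u.uid = ?",
        (uid,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
create_schema(conn)
add_user(conn, 1, "alice", "alice@example.com")
record_login(conn, 1, when=1185961020)
print(load_user(conn, 1))
```

The write path gets cheaper and the read path gains a join; which of the two matters more for a given table is exactly the open question in this thread.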
3. http://drupal.org/node/80951 - Block caching: being able to cache expensive blocks would help us with (2) as it eliminates expensive queries.
I am of course especially fond of this patch. ;)
4. http://drupal.org/node/105639 - Tracker query rewrite: would help us with (1) because it rewrites an expensive query.
A lot of people have looked into it and it simply doesn't get much better if we want to keep the functionality. I am afraid that only a caching solution will significantly improve the performance.
5. http://drupal.org/node/146466 - Link handling in search module: this patch reduces the complexity of the search module and will help with (4). It helps performance and it makes for a better search.
I'd very much like to try a backport of this patch on drupal.org. Our DBA from OSUOSL, Narayan Newton, has looked at the current search queries and found that the index usage is very bad. This patch might improve this.
6. http://drupal.org/node/106559 - Path lookups: we're brainstorming about how we could reduce the number of look ups required for URL aliasing. I think we need to look at de-normalizing the path alias table, but there are some other ideas that are being proposed, measured and discussed.
Here for example: http://drupal.org/node/100301
I am running #106559 on drupal.org. It is difficult to say if it makes a huge difference, but I think that having fewer of these pesky select * from url_alias queries can't hurt.
There are likely to be more patches, but I'd like to start with those.
Yes, I concur.
It's more than enough work already, and it is better not to spread ourselves too thin. If you're convinced that there might be a more important performance patch, you can work on that too.
I recommend that you bookmark these issues in your browser's toolbar and that you ask your boss whether you can spend some time working on these patches. ;)
Right. And let me know if your boss says "no".
Cheers,
Gerhard
On 01 Aug 2007, at 11:37, Gerhard Killesreiter wrote:
Yeah, I'd be one of the people who have reservations on this, especially on the node_counter part. This will cause a lot of writes to the node table, which is a stupid thing to do considering that the rest of the table is more or less static (compared to the node counter table).
Here is the concise summary:
1. Locality of information is important -- it avoids queries. If you use A and B together, you want to get that data in one query. This holds for both spatial and temporal locality.
2. You want to split cold and hot data -- it avoids having to load large amounts of data into memory.
Sometimes, both are at odds -- especially when MySQL's locking comes into play.
In case of the user access table it might make sense to split off the access field. It's a "cold" field; it's hardly ever read but it does get some writes. Because the field is almost never read, the fact that we lose spatial locality isn't much of a concern.
The node_counter field is both read and write heavy. By splitting it off, you lose the locality advantage and it's not clear what is more important: the fact that we can simplify thousands of read queries, or the fact that we can avoid some table locking. Only extensive benchmarks can tell.
-- Dries Buytaert :: http://www.buytaert.net/
Dries Buytaert wrote:
On 01 Aug 2007, at 11:37, Gerhard Killesreiter wrote:
Yeah, I'd be one of the people who have reservations on this, especially on the node_counter part. This will cause a lot of writes to the node table, which is a stupid thing to do considering that the rest of the table is more or less static (compared to the node counter table).
Here is the concise summary:
1. Locality of information is important -- it avoids queries. If you use A and B together, you want to get that data in one query. This holds for both spatial and temporal locality.
2. You want to split cold and hot data -- it avoids having to load large amounts of data into memory.
Sometimes, both are at odds -- especially when MySQL's locking comes into play.
I didn't even mention locking. :p Also, Köhntopp mentions specifically InnoDB, not MyISAM, so I believe his analysis doesn't only apply to MyISAM.
In case of the user access table it might make sense to split off the access field. It's a "cold" field; it's hardly ever read but it does get some writes.
Right. Also the "login" field.
Because the field is almost never read, the fact that we lose spatial locality isn't much of a concern.
The node_counter field is both read and write heavy. By splitting it
If you use the feature, yes. There is good reason to not use it.
off, you lose the locality advantage and it's not clear what is more important: the fact that we can simplify thousands of read queries, or the fact that we can avoid some table locking. Only extensive benchmarks can tell.
Yeah, that sums it up neatly. Problem is: Nobody is going to do benchmarks extensive enough to tell.
Cheers,
Gerhard
Quoting Gerhard Killesreiter <gerhard@killesreiter.de>:
Nobody is going to do benchmarks extensive enough to tell.
Sounds like a development module to me. ;D So what might that take?
A baseline set of data
Some cron jobs
Earnie
Earnie Boyd wrote:
Quoting Gerhard Killesreiter <gerhard@killesreiter.de>:
Nobody is going to do benchmarks extensive enough to tell.
Sounds like a development module to me. ;D So what might that take?
A baseline set of data
Some cron jobs
Ideally, you'd have two servers, one for Apache, one for MySQL. Then you'd create some dummy data and users using devel.module. Then you'd start wgets (or something else) to crawl the site. Use both anonymous wgets and logged-in ones. Record how long it takes each wget to finish crawling the site. Increase the number of concurrent wgets. Repeat. Do the same with and without the proposed patch.
Cheers,
Gerhard
Gerhard Killesreiter wrote:
Earnie Boyd wrote:
Quoting Gerhard Killesreiter <gerhard@killesreiter.de>:
Nobody is going to do benchmarks extensive enough to tell.
Sounds like a development module to me. ;D So what might that take?
A baseline set of data
Some cron jobs
Ideally, you'd have two servers, one for apache, one for mysql.
Then you'd create some dummy data and users using devel.module.
Then you'd start wgets (or something else) to crawl the site. Use both anonymous wgets and logged-in ones. Record how long it takes each wget to finish crawling the site. Increase the number of concurrent wgets. Repeat.
Do the same with and without the proposed patch.
And with PostgreSQL, MySQL (MyISAM) and MySQL (InnoDB) for good measure. :p
Cheers,
Gerhard
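The procedure described above (crawl the site with a growing number of concurrent clients and time each crawl) could be scripted along these lines. This is a rough sketch only: the function names are invented for the example, and fake_fetch is a stand-in that sleeps instead of issuing an HTTP request so that the code runs without a test site. A real run would swap in a function that actually requests each URL, once anonymously and once with a session cookie.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def crawl(fetch, urls):
    """One simulated client: fetch every URL once, return elapsed seconds."""
    start = time.perf_counter()
    for url in urls:
        fetch(url)
    return time.perf_counter() - start


def run_benchmark(fetch, urls, concurrency_levels):
    """Return {number of clients: [seconds per client]} for each level."""
    results = {}
    for clients in concurrency_levels:
        with ThreadPoolExecutor(max_workers=clients) as pool:
            futures = [pool.submit(crawl, fetch, urls) for _ in range(clients)]
            results[clients] = [future.result() for future in futures]
    return results


def fake_fetch(url):
    # Stand-in for a real HTTP request so the sketch runs offline.
    time.sleep(0.001)


urls = ["/node/%d" % n for n in range(1, 6)]
results = run_benchmark(fake_fetch, urls, concurrency_levels=[1, 2, 4])
for clients, timings in sorted(results.items()):
    print(clients, "clients: slowest crawl %.4fs" % max(timings))
```

The same script would be run against the patched and the unpatched site, and again for each database backend, so that the timings can be compared level by level.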
Gerhard Killesreiter wrote:
"For example, if we have a user table, it contains mostly static data such as the username, the password, the user's address and other master data. But it may also contain very dynamic data such as the date of the last login. In terms of database theory this information also belongs in the user table, but because of the physical implementation it makes a lot of sense to introduce an artificial 1:1 relation and to split the pair (userid, lastlogin) off into an extra table."
That works well if the information in the 1:1 table is loaded as supplementary information when you already know what rows you want in what order from one of the tables. It does not work well if your WHERE and/or ORDER BY conditions span the tables. It puts you in the temp table/filesort hell that is {node_comment_statistics}.
Also, in the case of {node}, we almost always require that status = 1. By doing that, we start using {node} for query criteria. It severely restricts optimization options for queries that join {node} and a 1:1 table.
Even if you use a table for supplemental data and never use it for filtering or ordering, you're doing an extra disk access to collect the supplemental information.
I'm also not sure I buy the locking argument, at least for Drupal. If we use {node} for static data and {crazy} for dynamic data, it only helps us with MyISAM lock contention for {node} reads. {crazy} stays flooded with locks, by design. But if {crazy} is usually joined with {node}, then queries have to wait on {crazy}'s locks anyway. Right now, Drupal is doing such joins. Every time you load a node, it loads the dynamic data, like the number of comments, with it. We would need to design Drupal to avoid reading dynamic tables until actually necessary.
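The last point, reading the dynamic table only when it is actually needed, might look something like the following. This is a hypothetical sketch in Python rather than Drupal's node_load(): the Node class, its fields and the load_stats callback are all invented for illustration, and the callback merely records that it was called instead of querying a database.

```python
class Node:
    """Static fields are loaded up front; dynamic statistics only on demand."""

    def __init__(self, nid, title, load_stats):
        self.nid = nid
        self.title = title
        self._load_stats = load_stats  # callable: nid -> comment count
        self._comment_count = None

    @property
    def comment_count(self):
        # The dynamic table is queried only the first time the value is needed.
        if self._comment_count is None:
            self._comment_count = self._load_stats(self.nid)
        return self._comment_count


stats_queries = []


def load_stats(nid):
    # Stand-in for a query against the dynamic table; records each call.
    stats_queries.append(nid)
    return 7


node = Node(1, "Call for action", load_stats)
title_only = node.title                    # no statistics query issued
queries_after_title = len(stats_queries)
first = node.comment_count                 # one statistics query
second = node.comment_count                # served from the object, no new query
```

Code that only needs the title never waits on the write-heavy table; code that needs the counter pays for one extra lookup.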
On Wednesday 01 August 2007, Dries Buytaert wrote:
Hello world,
I provided a list of what I believe are the top-6 most important performance patches for Drupal 6 and drupal.org (which will run on Drupal 6). If you know people that want to help, please point them to http://drupal.org/node/163216#comment-257631 or wherever the latest version of that list continues to live on.
The slowest queries on drupal.org are the ones for (1) the tracker page, (2) the forum blocks, (3) the forum topics' next/previous links and (4) the search module. Since we disabled most of those features, drupal.org is again more usable. Of course, we want to re-enable all of those features as soon as possible.
To make that happen, we need help from developers and database experts. To provide some focus, I compiled a list of the 6 most important performance patches that you can help with. I truly hope to get some of these into Drupal 6, and to backport them to Drupal 5 so we can use them on drupal.org until we upgrade drupal.org to Drupal 6.
Important note: these patches are still allowed to change the APIs in Drupal 6, and are about the _only_ patches that can break our APIs before Drupal 6 beta 1 is released. This policy will hopefully help us focus -- these issues are where all of our Drupal core action should be.
So here is the list:
1. http://drupal.org/node/147160 - Database replication: database replication will help us distribute the load among multiple database servers. This will help us with (1), (2), (3), (4) and more. Without this patch, we can't even take advantage of the extra hardware that we're installing. Needless to say, this patch is critical. Some extra background information and thoughts are available at http://buytaert.net/scaling-with-mysql-replication.
Dries, to clarify here: There are two different approaches currently listed in that issue: A) db_query_slave() syntax, B) array-with-parameter syntax. A is currently running on s.d.o, while B is closer to what I am hoping to use for Drupal 7[1]. In the issue you mentioned that you were concerned about the array handling in B. Do you mean the API or internal $args shuffling? I want to make sure I'm working on the correct problem.
I see 3 options on the replication API front:
1) Use A in Drupal 5.3 and Drupal 6, and switch to B as part of the overhaul in Drupal 7.
2) Clean up whatever the remaining objections are to B and use that now so there's less API change later.
3) Use A now and drop B completely, keeping A come Drupal 7. (Not good for D7's API, IMO.)
I want to make sure we're on the same page as to the approach so that there's no wasted effort. Which are we doing? :-)
[1] http://www.garfieldtech.com/blog/drupal-7-database-plans
-- Larry Garfield AIM: LOLG42 larry@garfieldtech.com ICQ: 6817012
"If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of every one, and the receiver cannot dispossess himself of it." -- Thomas Jefferson
On 01 Aug 2007, at 14:38, Larry Garfield wrote:
Dries, to clarify here: There are two different approaches currently listed in that issue: A) db_query_slave() syntax, B) array-with-parameter syntax. A is currently running on s.d.o, while B is closer to what I am hoping to use for Drupal 7[1]. In the issue you mentioned that you were concerned about the array handling in B. Do you mean the API or internal $args shuffling? I want to make sure I'm working on the correct problem.
I see 3 options on the replication API front:
1) Use A in Drupal 5.3 and Drupal 6, and switch to B as part of the overhaul in Drupal 7.
2) Clean up whatever the remaining objections are to B and use that now so there's less API change later.
3) Use A now and drop B completely, keeping A come Drupal 7. (Not good for D7's API, IMO.)
I want to make sure we're on the same page as to the approach so that there's no wasted effort. Which are we doing? :-)
I'm leaning towards A so let's go with that for now. It wouldn't hurt to discuss this some more though. -- Dries Buytaert :: http://www.buytaert.net/
On 8/1/07, Dries Buytaert <dries.buytaert@gmail.com> wrote:
1) Use A in Drupal 5.3 and Drupal 6, and switch to B as part of the overhaul in Drupal 7.
2) Clean up whatever the remaining objections are to B and use that now so there's less API change later.
3) Use A now and drop B completely, keeping A come Drupal 7. (Not good for D7's API, IMO.)
I want to make sure we're on the same page as to the approach so that there's no wasted effort. Which are we doing? :-)
I'm leaning towards A so let's go with that for now. It wouldn't hurt to discuss this some more though.
I'd prefer to use db_query_slave(). It makes it simple to direct safe queries to the slave with a minimum of changes. I'd rather wait and make all the D7 database changes at once rather than trying to anticipate what we'll want to do. It's good to have an eye on where you'd like to be so you don't paint yourself into a corner, but there's a lot that can happen between now and then.
andrew
On Aug 1, 2007, at 3:50 PM, andrew morton wrote:
I'd prefer to use db_query_slave(). It makes it simple to direct safe queries to the slave with a minimum of changes.
<politically_correct type="proud">
Am I the only one who recoils at the usage of the word "slave" for this? Do we really have to let that word get all the way into the Drupal core API itself? :( While I'm sure some of the people on this list would think it's funny, no one in their right mind would call this "db_query_bitch()", either, even though to some, it'd have basically the same meaning. *sigh*
</politically_correct>
If we're going to use a separate query function for queries that can be directed to the read-only DB(s), why not call it something like "db_query_read()"? Pardon my ignorance in split DB configurations and queries, but are there any read-only queries that aren't safe for this? Don't we just have to direct SELECT queries one way, and UPDATE or INSERT queries another?
<idea type="crazy">
Can't we just parse the query and see if it contains any UPDATE or INSERT clauses? ;) Why do we need to change the API at all?
</idea>
I suppose the problem is queries that SELECT to check a value and then update it, and you'd want to do both of those on the primary DB. That's why we either need transaction awareness, or we need the query author to specify what they need explicitly. Is that the issue?
Even still, an explicit "db_query_read()" for read-only queries you know are safe for secondary DB servers seems the most logical to me. We'd never write to the secondary servers, correct?
Cheers,
-Derek (dww-the-commie-pinko-who's-no-DBA)
Since d.o is having a fit right now, I'll point to the Google-cached version of the issue: http://72.14.253.104/search?q=cache:QZEkIeazljEJ:drupal.org/node/147160+http...
Long as it is, I'd suggest reading through it; David Strauss pretty much gave a lecture and took notes for us ;)
On 8/1/07, Derek Wright <drupal@dwwright.net> wrote:
On Aug 1, 2007, at 3:50 PM, andrew morton wrote:
I'd prefer to use db_query_slave(). It makes it simple to direct safe queries to the slave with a minimum of changes.
<politically_correct type="proud">
Am I the only one who recoils at the usage of the word "slave" for this? Do we really have to let that word get all the way into the Drupal core API itself? :( While I'm sure some of the people on this list would think it's funny, no one in their right mind would call this "db_query_bitch()", either, even though to some, it'd have basically the same meaning. *sigh*
</politically_correct>
In #60 Crell also expresses some reservations about the naming. Personally I don't think it's a big deal but maybe that's because I grew up setting master/slave jumpers on IDE drives.
<idea type="crazy">
Can't we just parse the query and see if it contains any UPDATE or INSERT clauses? ;) Why do we need to change the API at all?
</idea>
I suppose the problem is queries that SELECT to check a value and then update it, and you'd want to do both of those on the primary DB. That's why we either need transaction awareness, or we need the query author to specify what they need explicitly. Is that the issue?
Yeah, in #3 David Strauss discusses the problems with auto-detection. Basically, there's some lag between when you write something and when it's propagated to the slave. If it's important that the query reflect recent changes, you need to query the master. When you don't care, you can query a slave.
andrew
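The lag problem can be made concrete with a toy model. The sketch below is plain Python with no real database or replication involved: ReplicatedStore and its methods are invented for illustration, and replication is triggered by hand so that the stale window is visible.

```python
class ReplicatedStore:
    """Toy key-value 'database' with one master and one lagging slave."""

    def __init__(self):
        self.master = {}
        self.slave = {}
        self._pending = []  # writes not yet applied on the slave

    def write(self, key, value):
        # Writes always go to the master and are queued for the slave.
        self.master[key] = value
        self._pending.append((key, value))

    def replicate(self):
        # Simulates the slave catching up after some lag.
        for key, value in self._pending:
            self.slave[key] = value
        self._pending.clear()

    def read(self, key):
        # Default: read from the master, which is always current.
        return self.master.get(key)

    def read_slave(self, key):
        # The caller asserts that slightly stale data is acceptable here.
        return self.slave.get(key)


db = ReplicatedStore()
db.write("node:1:title", "Old title")
db.replicate()
db.write("node:1:title", "New title")

fresh = db.read("node:1:title")            # current value from the master
stale = db.read_slave("node:1:title")      # slave has not caught up yet
db.replicate()
caught_up = db.read_slave("node:1:title")  # slave now agrees with the master
```

Both reads are plain SELECT-style lookups, yet only one of them is safe right after a write, which is why inspecting the query text alone cannot decide where to send it.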
For functions like node_load, store_transaction_load (ecommerce module), etc., there are times when you load an object and need the precise data that was just submitted, and other times when you do not. If these types of functions use the slave queries, then I guess it has to be documented that accurate data just isn't available? Unless there were some parameter to pass in when you need to use the master.
--mark
That's the main issue, yes. The master/slave split is not 1:1 with select/[insert|update|delete]. You have to be able to manually control whether a select query is "slave safe", which means an API change of one sort or another. A separate function or a flag are the two options.
I'm a decidedly not politically correct person myself, but I'm not a big fan of "slave" either. It is, however, the accurate technical term and I haven't come up with a better name yet.
Dries: Sorry to be picky, but did you mean "A period" or "option 1" (A for now, B for Drupal 7)? :-) I really don't like the idea of duplicating every query function long-term (meaning every alternate target or additional query variant multiplies the number of functions we need), although I'm amenable to doing so for just one version. With an all-array structure the clumsy argument handling code disappears anyway.
-- Larry Garfield AIM: LOLG42 larry@garfieldtech.com ICQ: 6817012
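To make the "separate function or a flag" choice easier to picture, here is a rough sketch of the two API shapes. It is written in Python for illustration and is not the code from the patch: run_on and db_query_flagged are invented names, the flag form is only a guess at what an array-with-parameter style might look like, and no query is actually executed.

```python
def run_on(target, sql, args):
    # Stand-in for executing a query; returns what would be sent where.
    return (target, sql, tuple(args))


# Shape 1: a separate function per target.
def db_query(sql, *args):
    return run_on("master", sql, args)


def db_query_slave(sql, *args):
    return run_on("slave", sql, args)


# Shape 2: one function, with the target passed as a flag.
def db_query_flagged(sql, args=(), target="master"):
    if target not in ("master", "slave"):
        raise ValueError("unknown target: %r" % (target,))
    return run_on(target, sql, args)


a = db_query_slave("SELECT title FROM node WHERE nid = %d", 1)
b = db_query_flagged(
    "SELECT title FROM node WHERE nid = %d", args=(1,), target="slave"
)
```

With the first shape, every additional query variant needs a twin for each target; with the second, a new target is just another value for the flag.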
On 02 Aug 2007, at 01:08, Derek Wright wrote:
If we're going to use a separate query function for queries that can be directed to the read-only DB(s), why not call it something like "db_query_read()"? Pardon my ignorance in split DB configurations and queries, but are there any read-only queries that aren't safe for this? Don't we just have to direct SELECT queries one way, and UPDATE or INSERT queries another?
No, that's not a good solution -- we need more fine-grained control, as per the explanation at http://buytaert.net/scaling-with-mysql-replication. In short: read != slave.
<idea type="crazy"> Can't we just parse the query and see if it contains any UPDATE or INSERT clauses? ;) Why do we need to change the API at all? </idea>
Ditto -- not a good solution. See the explanation at http://buytaert.net/scaling-with-mysql-replication.
-- Dries Buytaert :: http://www.buytaert.net/
On 01 Aug 2007, at 11:42 AM, Dries Buytaert wrote:
6. http://drupal.org/node/106559 - Path lookups: we're brainstorming about how we could reduce the number of look ups required for URL aliasing. I think we need to look at de-normalizing the path alias table, but there are some other ideas that are being proposed, measured and discussed.
Isn't the memcache patch taking care of this?
adrian rossouw wrote:
On 01 Aug 2007, at 11:42 AM, Dries Buytaert wrote:
6. http://drupal.org/node/106559 - Path lookups: we're brainstorming about how we could reduce the number of look ups required for URL aliasing. I think we need to look at de-normalizing the path alias table, but there are some other ideas that are being proposed, measured and discussed.
Isn't the memcache patch taking care of this?
Not the patch that comes with the memcached module. That one only moves the normal cache items from the DB to memcached. Also, memcached isn't available to everybody.
Cheers,
Gerhard
Regarding the memcache module: I don't think it would help much with path aliases, but I think it should be in core, even if optional. With that module and aggressive caching, anonymous users can be served with no database queries at all, which is very good for high-traffic web sites. Also, with a good caching mechanism developers are more likely to think about proper caching; right now it looks like nobody cares. I believe drupal.org would be the first to benefit from that.
-- hex, out.
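The caching idea that runs through this thread (expensive blocks, pages for anonymous users) comes down to "build once, serve many times until it expires". A minimal sketch in Python follows; it is not the memcache module or the block caching patch. TTLCache, the cache key and build_forum_block are invented for the example, the cache lives in process memory, and the clock is injected so that expiry can be shown without waiting.

```python
import time


class TTLCache:
    """Tiny in-process cache; a stand-in for memcached or a cache table."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = {}  # key -> (expires_at, value)

    def get_or_build(self, key, build):
        now = self.clock()
        hit = self._items.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = build()  # the expensive part, e.g. the forum block queries
        self._items[key] = (now + self.ttl, value)
        return value


build_calls = []


def build_forum_block():
    # Stand-in for the expensive queries; records each rebuild.
    build_calls.append(1)
    return "<ul><li>Active forum topics</li></ul>"


fake_now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])

first = cache.get_or_build("block:forum:active", build_forum_block)   # builds
second = cache.get_or_build("block:forum:active", build_forum_block)  # cached
fake_now[0] = 61.0
third = cache.get_or_build("block:forum:active", build_forum_block)   # rebuilt
```

The trade-off is staleness: for up to ttl_seconds, visitors may see a block that no longer matches the database.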
Dries Buytaert wrote:
Hello world,
I provided a list of what I believe are the top-6 most important performance patches for Drupal 6 and drupal.org (which will run on Drupal 6). If you know people that want to help, please point them to http://drupal.org/node/163216#comment-257631 or wherever the latest version of that list continues to live on.
So here is the list:
Updated the patch spotlight page to point to these patches (and not the already committed feature freeze extension patches we are done with): http://drupal.org/patch/spotlight
Gabor
participants (11)
- adrian rossouw
- andrew morton
- David Strauss
- Derek Wright
- Dries Buytaert
- Earnie Boyd
- Gabor Hojtsy
- Gerhard Killesreiter
- hex
- Larry Garfield
- mark burdett