[drupal-devel] Re: Database analysis of CivicSpace/Drupal
How about that 68 fields involved in JOINs are not indexed? So every time there is a join operation the database has to do search through every record on that field. Of course, if I had just waited until people thought it was a problem we wouldn't know it was a problem ;-) Kieran On Apr 15, 2005, at 6:36 PM, Dan Robinson wrote:
what problem are we trying to solve here?
Dan
There are currently 107 database tables in the latest release of CivicSpace. I believe there are 55 in Drupal core. One of the most important things you can do in improving database performance is to make sure you have the right indicies for your database. Currently we do not use foriegn keys to enforce relationships between database tables. I did an analysis of the CivicSpace code base and found that there was 156 SQL statements with at least one JOIN. We can consider these as application level relationships between tables. Here is a visual representation of those JOINs: http://civicspacelabs.org/home/files/schema.png I also analysed the indicies of the current database schema. There are 182 indicies.
The question was do they match? The answer was no.
There are at least 68 fields that are joined which do not have indicies from my initial analysis. Here are the fields in my blog post: http://civicspacelabs.org/home/node/12245
The scientific method teaches us that results are useless, unless you have a repeatable method. I am going to explain in this book( http://civicspacelabs.org/home/node/12246 ), Drupal-Docs folks, how I conducted this analysis and it needs to be repeated and verified before being acted upon. I built several tools with help from Drumm, Ankur, Tom Eliaz, and Chris Johnson(cxj). It's possible those tools ran amok.
This is the first step in what I hope will be a continued analysis of the database layer in Drupal. The next step will be to extend devel.module to do batch updates of page executions times and mysql query times. I have several theories about how to do this properly and would appreciate your input. The final stage will be to create a documented database schema that allows for external application integration with common labels. If we want to integrate with non-drupal applications we will have to allow for semantic mapping of our database schema to a common database bus architecture(See Ralph Kimball http://www.rkimball.com/). But I'lll leave that for another discussion.
Cheers, Kieran
P.S. I am off to go mountain biking, hiking, and kayaking in Hawaii for a week so don't expect too many responses from me.
--------------------------------------------------------------------- To unsubscribe, e-mail: civicspace-dev-unsubscribe@civicspacelabs.org For additional commands, e-mail: civicspace-dev-help@civicspacelabs.org
Dan has a very good point. Make sure that this is a real problem before changing it. Mysql makes some weird selections on indexes when joins are involved. Creating an index just because you join on that field is not a good move. Make liberal use of explain before and after your index creates to make sure it even gets used. Also take into account expected table sizes. Craig On Apr 15, 2005, at 10:28 PM, Kieran Lal wrote:
How about that 68 fields involved in JOINs are not indexed?
So every time there is a join operation the database has to do search through every record on that field. Of course, if I had just waited until people thought it was a problem we wouldn't know it was a problem ;-)
Kieran On Apr 15, 2005, at 6:36 PM, Dan Robinson wrote:
what problem are we trying to solve here?
Dan
There are currently 107 database tables in the latest release of CivicSpace. I believe there are 55 in Drupal core. One of the most important things you can do in improving database performance is to make sure you have the right indicies for your database. Currently we do not use foriegn keys to enforce relationships between database tables. I did an analysis of the CivicSpace code base and found that there was 156 SQL statements with at least one JOIN. We can consider these as application level relationships between tables. Here is a visual representation of those JOINs: http://civicspacelabs.org/home/files/schema.png I also analysed the indicies of the current database schema. There are 182 indicies.
The question was do they match? The answer was no.
There are at least 68 fields that are joined which do not have indicies from my initial analysis. Here are the fields in my blog post: http://civicspacelabs.org/home/node/12245
The scientific method teaches us that results are useless, unless you have a repeatable method. I am going to explain in this book( http://civicspacelabs.org/home/node/12246 ), Drupal-Docs folks, how I conducted this analysis and it needs to be repeated and verified before being acted upon. I built several tools with help from Drumm, Ankur, Tom Eliaz, and Chris Johnson(cxj). It's possible those tools ran amok.
This is the first step in what I hope will be a continued analysis of the database layer in Drupal. The next step will be to extend devel.module to do batch updates of page executions times and mysql query times. I have several theories about how to do this properly and would appreciate your input. The final stage will be to create a documented database schema that allows for external application integration with common labels. If we want to integrate with non-drupal applications we will have to allow for semantic mapping of our database schema to a common database bus architecture(See Ralph Kimball http://www.rkimball.com/). But I'lll leave that for another discussion.
Cheers, Kieran
P.S. I am off to go mountain biking, hiking, and kayaking in Hawaii for a week so don't expect too many responses from me.
--------------------------------------------------------------------- To unsubscribe, e-mail: civicspace-dev-unsubscribe@civicspacelabs.org For additional commands, e-mail: civicspace-dev-help@civicspacelabs.org
Craig, in my first round of analysis I actually outputted record size along with indices. It's certainly important I agree, but each instance of Drupal will have different usage patterns and so we need to do some high level analysis of the database schema first. For example, on CivicSpaceLabs.org the biggest table is the search table. It has 500 000 records. Compare this to the node table with 11 000 records. However, that doesn't mean you shouldn't a index table that has only 5 000 records. If it has JOINs you should index until you can disapprove the need for it. Usually, that happens when you are doing a lot of updates, and the indexing overhead is too punishing. I did recommend that this method be repeated on multiple Drupal application instances. Once we have a good grasp on the database design, we should do real world performance analysis. Unfortunately, we don't control the hardware platform and it's unlikely we will enough performance analysis to check across multiple platforms to be sure we have the best general solution. Making changes to the Devel.module is good step towards this. Cheers, Kieran On Apr 15, 2005, at 7:25 PM, Craig Courtney wrote:
Dan has a very good point. Make sure that this is a real problem before changing it. Mysql makes some weird selections on indexes when joins are involved. Creating an index just because you join on that field is not a good move. Make liberal use of explain before and after your index creates to make sure it even gets used. Also take into account expected table sizes.
Craig
On Apr 15, 2005, at 10:28 PM, Kieran Lal wrote:
How about that 68 fields involved in JOINs are not indexed?
So every time there is a join operation the database has to do search through every record on that field. Of course, if I had just waited until people thought it was a problem we wouldn't know it was a problem ;-)
Kieran On Apr 15, 2005, at 6:36 PM, Dan Robinson wrote:
what problem are we trying to solve here?
Dan
There are currently 107 database tables in the latest release of CivicSpace. I believe there are 55 in Drupal core. One of the most important things you can do in improving database performance is to make sure you have the right indicies for your database. Currently we do not use foriegn keys to enforce relationships between database tables. I did an analysis of the CivicSpace code base and found that there was 156 SQL statements with at least one JOIN. We can consider these as application level relationships between tables. Here is a visual representation of those JOINs: http://civicspacelabs.org/home/files/schema.png I also analysed the indicies of the current database schema. There are 182 indicies.
The question was do they match? The answer was no.
There are at least 68 fields that are joined which do not have indicies from my initial analysis. Here are the fields in my blog post: http://civicspacelabs.org/home/node/12245
The scientific method teaches us that results are useless, unless you have a repeatable method. I am going to explain in this book( http://civicspacelabs.org/home/node/12246 ), Drupal-Docs folks, how I conducted this analysis and it needs to be repeated and verified before being acted upon. I built several tools with help from Drumm, Ankur, Tom Eliaz, and Chris Johnson(cxj). It's possible those tools ran amok.
This is the first step in what I hope will be a continued analysis of the database layer in Drupal. The next step will be to extend devel.module to do batch updates of page executions times and mysql query times. I have several theories about how to do this properly and would appreciate your input. The final stage will be to create a documented database schema that allows for external application integration with common labels. If we want to integrate with non-drupal applications we will have to allow for semantic mapping of our database schema to a common database bus architecture(See Ralph Kimball http://www.rkimball.com/). But I'lll leave that for another discussion.
Cheers, Kieran
P.S. I am off to go mountain biking, hiking, and kayaking in Hawaii for a week so don't expect too many responses from me.
-------------------------------------------------------------------- - To unsubscribe, e-mail: civicspace-dev-unsubscribe@civicspacelabs.org For additional commands, e-mail: civicspace-dev-help@civicspacelabs.org
I think Dan is looking for a different sort of answer. Seems to me you are trying to "make Drupal serve pages faster". In my experience, this optimization exercise has to start with some profiling of existing performance. In order to do that, you can use devel.module for database layer and xdebug for everything else. Observe a few problematic pages and figure out where their time is being spent. Consider testing for authenticated and anonymous users (enable caching). If you find a particular query which is slow, we are lucky if it is fixed by adding an index. Our last enhancement to forum and tracker required us to add a new node_comment_statistics table and refactor lots of code to use it. I'm all for making Drupal faster, I just think that lots of indices is not the right approach. I would accept a patch to devel.module which logged slow queries and pages to the watchdog. -moshe On Apr 15, 2005, at 10:28 PM, Kieran Lal wrote:
How about that 68 fields involved in JOINs are not indexed?
So every time there is a join operation the database has to do search through every record on that field. Of course, if I had just waited until people thought it was a problem we wouldn't know it was a problem ;-)
Kieran On Apr 15, 2005, at 6:36 PM, Dan Robinson wrote:
what problem are we trying to solve here?
Dan
There are currently 107 database tables in the latest release of CivicSpace. I believe there are 55 in Drupal core. One of the most important things you can do in improving database performance is to make sure you have the right indicies for your database. Currently we do not use foriegn keys to enforce relationships between database tables. I did an analysis of the CivicSpace code base and found that there was 156 SQL statements with at least one JOIN. We can consider these as application level relationships between tables. Here is a visual representation of those JOINs: http://civicspacelabs.org/home/files/schema.png I also analysed the indicies of the current database schema. There are 182 indicies.
The question was do they match? The answer was no.
There are at least 68 fields that are joined which do not have indicies from my initial analysis. Here are the fields in my blog post: http://civicspacelabs.org/home/node/12245
The scientific method teaches us that results are useless, unless you have a repeatable method. I am going to explain in this book( http://civicspacelabs.org/home/node/12246 ), Drupal-Docs folks, how I conducted this analysis and it needs to be repeated and verified before being acted upon. I built several tools with help from Drumm, Ankur, Tom Eliaz, and Chris Johnson(cxj). It's possible those tools ran amok.
This is the first step in what I hope will be a continued analysis of the database layer in Drupal. The next step will be to extend devel.module to do batch updates of page executions times and mysql query times. I have several theories about how to do this properly and would appreciate your input. The final stage will be to create a documented database schema that allows for external application integration with common labels. If we want to integrate with non-drupal applications we will have to allow for semantic mapping of our database schema to a common database bus architecture(See Ralph Kimball http://www.rkimball.com/). But I'lll leave that for another discussion.
Cheers, Kieran
P.S. I am off to go mountain biking, hiking, and kayaking in Hawaii for a week so don't expect too many responses from me.
--------------------------------------------------------------------- To unsubscribe, e-mail: civicspace-dev-unsubscribe@civicspacelabs.org For additional commands, e-mail: civicspace-dev-help@civicspacelabs.org
I have had a long conversation with Dan off line. My point wasn't to start adding indexes. My point was to ask whether the database schema was well designed for our application. I think matching indexes to joins is a good start. I hope I was pretty clear that I wanted this method to be repeated for Drupal core, and many other instances of Drupal with various modules installed. Starting with a good db schema design is just a start. If we could actually document why all the indices are there that would be a big step forward in helping people understand the db layer. With 182 indices I doubt anyone knows why each of those indices are there. I know that the Drupal community has been responsive to database performance issues in the past and I am confident that it will tackle slow page execution problems as they arise. I am looking for a more thorough analysis of actual runtime on which to make recommended changes. For example, I'd like to know what the slowest 5% page execution times are. I think having tools to know this is important for a good design of the database, regardless of whether it is a 'problem' or not. At this point, I am just asking questions and sharing what I am learning. I'll put together a table schema for devel.module and batch update code next week. Cheers, Kieran
participants (3)
-
Craig Courtney -
Kieran Lal -
Moshe Weitzman