[development] The Drupal Diet - Making bootstrap faster
larry at garfieldtech.com
Thu May 3 04:13:04 UTC 2007
This is more of an RFC than a DEP, so please forgive the looser format and
trademark verbosity. :-)
My current push for Drupal 6 is to make it faster for non-opcode non-cached
users. Drupal 5 was, according to Dries' benchmarks, a slight step backwards
from Drupal 4.7 in that regard, so let's reverse that trend with a vengeance.
Far and away the slowest part of the Drupal life-cycle is bootstrapping.
According to Rasmus' keynote at Drupalcon, we spend just over 50% of the
entire process just pulling code off disk and parsing it. A typical page
load, however, uses only a small fraction of that. Thus, the biggest target
for optimization is "load less code", but without violating the
corollary, "load fewer files".
As an example, I recently tried breaking up some core modules and loading page
callbacks and forms only when needed. Even with that primitive breakup, we
were able to get an 8-18% improvement in page load time and a 23% decrease in
memory usage. I hate sayings like "the numbers speak for themselves", but in
this case they do. On-demand loading of lesser-used module code has the
potential to be a huge win, and the extra code required to make it possible
is minimal. (The code linked in that issue only adds ~10 lines of code; the
rest of the patch is just moving code around.)
That of course raises the question: how to split up the code in a module? In
general, I see 5 logical divisions of code within a module:
1) Rare hooks. hook_install() and hook_update_N() are the classic cases here,
although I think hook_menu() in Drupal 6 may be moving in that direction.
These are hooks called only at very specific, rare times. The other 99% of
the time they're dead weight.
2) Common hooks. This is basically every hook that isn't one of the few rare
hooks. These may be called at any time, more or less often. (For the time
being I am going to lump hook_load(), hook_update(), etc. in here, even
though they're technically not hooks as we've discussed previously.)
3) Page handlers. These are functions whose primary purpose in life is to be
called from menu_execute_active_handler(). They serve no other serious
purpose.
4) Form builders. These are the form definition, validation, and submission
functions, as well as their sub-call helpers.
5) API functions. These are functions specifically exposed to other modules
to do stuff, for some definition of stuff.
6) (Nobody expects the Spanish Inquisition!) Utility functions. These
functions are mostly intended for internal use but can sometimes be useful to
other modules. The line between a utility function and an API function is
very blurry since PHP functions have no concept of namespace or visibility.
There aren't that many Rare Hooks, and most of them are already in .install
files so I will ignore those for now. Common Hooks by nature need to be
readily-available at any time. It is possible to dynamically load those,
too, but that's a more complex issue, and one that merlinofchaos and
chx have already started to address. I am therefore not going to deal
with those, either. API functions also have to be readily-available, and
utility functions probably should, too.
So now that we've said we need to load most types of code, what does that
leave us? That leaves us the two types of code that are used the least but
take up the most lines of code. merlin and webchick recently went through a
few core modules and cataloged what functions were of what type, and the
results are clear: We spend most of our code on page and form handling, and
yet only one page is ever handled per page load and, generally, only 1-2
forms! In terms of actual lines of code, all four modules in question
(system, user, comment, block) are majority pages and forms, in some cases by
over 2/3. That means page handlers and form handlers are both the safest code
to factor out into separate on-demand files and the biggest win from doing
so. It's nice how that works out.
So now we need a mechanism for on-demand loading of page handlers and form
handlers, subject to the following conditions:
1) It should be an optional optimization. We don't want to force all modules
to break up, because many, I'd say the majority, are small and simple enough
that it would be a case of over-optimization to require, say, every form to
be in a separate file or every module to have a .pages file. We also don't
want to make module authoring an overly-difficult process with a dozen magic
files. The degenerate case should be exactly how things work now.
2) It should be flexible. Different modules need to be optimized differently.
Putting all page handlers into a single .pages file for a module could still
mean loading 10x as much code as we really need. Module authors need to be
able to factor their own modules in the way that makes the most sense for
that module, which could mean one on-demand file or several.
3) It is impossible to determine the module that provides a function from the
function name alone. Sure, all functions (should) use $modulename_<something>
as their format, but many modules have an underscore in their name. Given a
function named "foo_bar_baz", is that the "bar_baz" function of the "foo"
module, or the "baz" function of the "foo_bar" module? We can't tell.
Therefore, unless we are going to simply exclude modules with such names from
this system (and I think that's a really bad idea) we will have to explicitly
specify the module or path for a given auxiliary file.
4) Modules may call page handlers and form handlers from other modules. Core
does this in places (node.module calls a page handler from system.module, for
instance) as do various contribs, so we can't assume that the calling module
is the providing module.
5) Page handlers are called from the menu system; therefore, the logical place
to decide if additional code is needed is the menu system. Since we can't
presume or deduce a module from the handler, that means it has to be
specified explicitly in hook_menu().
6) Form handlers are called from drupal_get_form(), or from drupal_execute().
Many are parameters to drupal_get_form() being used as a page handler, but
not all. drupal_execute() may be called from anywhere at any time, too, so
forms need to be either already loaded or loadable on-demand at any time.
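To make the naming problem in condition 3 concrete, here is a minimal sketch in plain PHP (no Drupal required). candidate_modules() is a made-up helper for illustration, not a Drupal function, and the module names are the hypothetical ones from the example above.

```php
<?php
// Hypothetical helper, not part of Drupal: list every enabled module whose
// name is a valid prefix of the given function name.
function candidate_modules($function, array $modules) {
  $matches = array();
  foreach ($modules as $module) {
    // A module could own a function if the name starts with "<module>_".
    if (strpos($function, $module . '_') === 0) {
      $matches[] = $module;
    }
  }
  return $matches;
}

// Both "foo" and "foo_bar" are plausible owners of foo_bar_baz().
$owners = candidate_modules('foo_bar_baz', array('foo', 'foo_bar', 'baz'));
print implode(', ', $owners) . "\n";
```

With both "foo" and "foo_bar" enabled, the lookup returns two candidates, which is why the file or module has to be named explicitly rather than guessed.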
I therefore propose (finally I get to this part!) to split off page handlers
and form handlers in similar ways.
== Page Handlers ==
Only one page handler is called per page load, so we only need to worry about
a page handler becoming available in menu_execute_active_handler(). Modules
provide information to the menu system via hook_menu(), so each menu item can
optionally specify information on what file to load in order to make the
handler available. That could be done in one of two ways: pass a full path (e.g.,
drupal_get_path('module', 'foo') . '/foo.pages.inc' ) or specify a file name
and module name separately. For simple flexibility I favor the former. It's
simple and effective and works for cross-module calls. It's also what's
already implemented in the patch I mentioned earlier.
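As a rough sketch of the full-path variant, assuming a new 'file' key on the menu item (the key name is my assumption here, as are foo_admin_page() and sketch_execute_handler(), which only stands in for the relevant part of menu_execute_active_handler()). It runs as plain PHP by writing a stand-in foo.pages.inc to the temp directory:

```php
<?php
// Stand-in for a module's on-demand file. In a real module this would be
// something like foo.pages.inc shipped in the module directory.
$pages_file = sys_get_temp_dir() . '/foo.pages.inc';
file_put_contents($pages_file, "<?php\nfunction foo_admin_page() {\n  return 'foo admin page';\n}\n");

// Hypothetical menu item as hook_menu() might return it. 'file' is the
// assumed new key and holds a full path.
$item = array(
  'path' => 'admin/foo',
  'callback' => 'foo_admin_page',
  'file' => $pages_file,
);

// Simplified stand-in for the relevant part of menu_execute_active_handler().
function sketch_execute_handler(array $item) {
  // Load the extra file only if the menu item asks for one. Items without
  // a 'file' key behave exactly as they do today.
  if (!empty($item['file'])) {
    include_once $item['file'];
  }
  return call_user_func($item['callback']);
}

$output = sketch_execute_handler($item);
print $output . "\n";
```

Because the path is stored whole, the same check works when one module's menu item points at a handler that lives in another module's file.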
== Form Handlers ==
Forms are nearly always accessed via drupal_get_form() or drupal_execute().
We can therefore do the same sort of centralized improvement for the form
system in those functions as we can for page handlers using
menu_execute_active_handler(). That is, add a key to hook_forms() to specify
a file in which the form lives. Here we can safely presume that the module
implementing hook_forms() is also the home of the form functions in question,
so we need specify only a file and not a module or path. If a module author
wishes to split off one or more forms to another file, hook_forms() becomes a
requirement just as it does for specifying an alternate callback function.
drupal_get_form() and drupal_execute() then simply check for the existence of
that key and include_once() the file if necessary. The total code involved
should, like the page handler, be quite limited.
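A similar sketch for forms, again in plain PHP. The 'file' key in hook_forms(), the foo module, and sketch_get_form() are all assumptions for illustration; the real lookup would live inside drupal_get_form() and drupal_execute(), and $module_dir stands in for drupal_get_path('module', 'foo'). For simplicity the sketch is told which module to ask rather than collecting hook_forms() from every module.

```php
<?php
// Stand-in module directory with one form split off into its own file.
$module_dir = sys_get_temp_dir() . '/foo_module_sketch';
if (!is_dir($module_dir)) {
  mkdir($module_dir);
}
file_put_contents($module_dir . '/foo.forms.inc', "<?php\nfunction foo_settings_form() {\n  return array('name' => array('#type' => 'textfield'));\n}\n");

// Hypothetical hook_forms() implementation. 'file' is the assumed new key,
// given relative to the implementing module's directory.
function foo_forms() {
  return array(
    'foo_settings_form' => array(
      'file' => 'foo.forms.inc',
    ),
  );
}

// Simplified stand-in for the lookup shared by drupal_get_form() and
// drupal_execute().
function sketch_get_form($form_id, $module, $module_dir) {
  $forms = call_user_func($module . '_forms');
  // Include the file only when the form declares one.
  if (isset($forms[$form_id]['file'])) {
    include_once $module_dir . '/' . $forms[$form_id]['file'];
  }
  // Fall back to the form ID as the builder function, as happens today.
  $callback = isset($forms[$form_id]['callback']) ? $forms[$form_id]['callback'] : $form_id;
  return call_user_func($callback);
}

$form = sketch_get_form('foo_settings_form', 'foo', $module_dir);
print_r($form);
```

A form with no 'file' entry skips the include entirely, so modules that keep everything in the .module file are unaffected.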
Note: I will likely want to wait on implementing the forms part until the FAPI
3 patch lands, because I really don't want to tangle with both eaton and chx
on that. :-)
The nice thing about this approach, too, is that it doesn't have to be
implemented all at once. Because the degenerate case still works, the
initial implementation can work on only one or two core modules as a
demonstration. The rest of core can be optimized module-at-a-time. That
makes the patch easier to review as well as easier to maintain with the rest
of core still being actively developed. Given the benchmarks that merlin
found with the initial attempt, I'd say that whatever the total performance
gain is, it should be substantial.
As for the potential problem of module authors "over-factoring" and hiding
useful utility functions in a page handler file when they shouldn't, I believe
that is best solved by best-practice guidelines. As a worst-case, a module
author can manually include_once() a file out of another module's directory
at no worse a cost than the extra parse time. It's still a net-win overall
since even if one module gets sloppy-loaded the rest of the system is still
well-factored, so there's still a net reduction in the amount of code loaded.
Sooo... Now that the three of you who made it all the way through this email
have gotten here, thoughts on this approach? Any caveats I'm missing? Any
use cases I don't know about? Does this have a snowball's chance in hell of
<dons flame-retardant suit>
Larry Garfield AIM: LOLG42
larry at garfieldtech.com ICQ: 6817012
"If nature has made any one thing less susceptible than all others of
exclusive property, it is the action of the thinking power called an idea,
which an individual may exclusively possess as long as he keeps it to
himself; but the moment it is divulged, it forces itself into the possession
of every one, and the receiver cannot dispossess himself of it." -- Thomas
Jefferson