This is more of an RFC than a DEP, so please forgive the looser format and trademark verbosity. :-)

My current push for Drupal 6 is to make it faster when no opcode cache or page cache is in play. Drupal 5 was, according to Dries' benchmarks, a slight step backwards from Drupal 4.7 in that regard, so let's reverse that trend with a vengeance.

Far and away the slowest part of the Drupal life-cycle is bootstrapping. According to Rasmus' keynote at Drupalcon, we spend just over 50% of the entire process just pulling code off disk and parsing it. A typical page load, however, uses only a small fraction of that code. Thus, the biggest target for optimization is "load less code", but without violating the corollary, "load fewer files".

As an example, I recently tried breaking up some core modules and loading page callbacks and forms only when needed[1]. Even with that primitive breakup, we were able to get an 8-18% improvement in page load time and a 23% decrease in memory usage. I hate sayings like "the numbers speak for themselves", but in this case they do. On-demand loading of lesser-used module code has the potential to be a huge win, and the extra code required to make it possible is minimal. (The code linked in that issue only adds ~10 lines of code; the rest of the patch is just moving code around.)

That of course raises the question: how do we split up the code in a module? In general, I see 5 logical divisions of code within a module:

1) Rare hooks. hook_install() and hook_update_N() are the classic cases here, although I think hook_menu() in Drupal 6 may be moving in that direction. These are hooks called only at very specific, rare times. The other 99% of the time they're dead weight.

2) Common hooks. This is basically every hook that isn't one of the few rare hooks. These may be called at any time, more or less often. (For the time being I am going to lump the node hooks hook_load(), hook_update(), etc. in here, even though they're technically not hooks as we've discussed previously.)
3) Page handlers. These are functions whose primary purpose in life is to be called from menu_execute_active_handler(). They serve no other serious function.

4) Form builders. These are the form definition, validation, and submission functions, as well as their sub-call helpers.

5) API functions. These are functions specifically exposed to other modules to do stuff, for some definition of stuff.

6) (Nobody expects the Spanish Inquisition!) Utility functions. These functions are mostly intended for internal use but can sometimes be useful to other modules. The line between a utility function and an API function is very blurry, since PHP functions have no concept of namespace or visibility.

There aren't that many Rare Hooks, and most of them are already in .install files, so I will ignore those for now. Common Hooks by nature need to be readily available at any time. It is possible to dynamically load those, too, but that's a more complex issue, and one that merlinofchaos[2] and chx[3] have already started to address. I am therefore not going to deal with those, either. API functions also have to be readily available, and utility functions probably should be, too.

So now that we've said we need to load most types of code, what does that leave us? That leaves us the two types of code that are used the least but take up the most lines of code. merlin and webchick recently went through a few core modules and cataloged which functions were of which type[4], and the results are clear: We spend most of our code on page and form handling, and yet only one page is ever handled per page load and, generally, only 1-2 forms! In terms of actual lines of code, all four modules in question (system, user, comment, block) are majority pages and forms, in some cases by over 2/3.

That means page handlers and form handlers are not only the safest code to factor out into separate on-demand files but also the biggest win from doing so. It's nice how that works out.
So now we need a mechanism for on-demand loading of page handlers and form handlers, subject to the following conditions:

1) It should be an optional optimization. We don't want to force all modules to break up, because many, I'd say the majority, are small and simple enough that it would be a case of over-optimization to require, say, every form to be in a separate file or every module to have a .pages file. We also don't want to make module authoring an overly difficult process with a dozen magic files. The degenerate case should be exactly how things work now.

2) It should be flexible. Different modules need to be optimized differently. Putting all page handlers into a single .pages file for a module could still mean loading 10x as much code as we really need. Module authors need to be able to factor their own modules in the way that makes the most sense for that module, which could mean one on-demand file or several.

3) It is impossible to determine the module that provides a function from the function name alone. Sure, all functions (should) use $modulename_<something> as their format, but many modules have an underscore in their name. Given a function named "foo_bar_baz", is that the "bar_baz" function of the "foo" module, or the "baz" function of the "foo_bar" module? We can't tell. Therefore, unless we are going to simply exclude modules with such names from this system (and I think that's a really bad idea), we will have to explicitly specify the module or path for a given auxiliary file.

4) Modules may call page handlers and form handlers from other modules. Core does this in places (node.module calls a page handler from system.module, for instance), as do various contribs, so we can't assume that the calling module is the providing module.

5) Page handlers are called from the menu system; therefore, the logical place to decide if additional code is needed is the menu system. Since we can't presume or deduce a module from the handler, the file has to be specified explicitly in hook_menu().

6) Form handlers are called from drupal_get_form() or from drupal_execute(). Many are parameters to drupal_get_form() being used as a page handler, but not all. drupal_execute() may be called from anywhere at any time, too, so forms need to be either already loaded or loadable on demand at any time.

I therefore propose (finally I get to this part!) to split off page handlers and form handlers in similar ways.

== Page Handlers ==

Only one page handler is called per page load, so we only need to worry about a page handler becoming available in menu_execute_active_handler(). Modules provide information to the menu system via hook_menu(), so each menu item can optionally specify what file to load in order to make the handler available. That could be done in one of two ways: pass a full path (e.g., drupal_get_path('module', 'foo') . '/foo.pages.inc') or specify a file name and module name separately. For simple flexibility I favor the former. It's simple and effective and works for cross-module calls. It's also what's already implemented in the patch I mentioned earlier[1].

== Form Handlers ==

Forms are nearly always accessed via drupal_get_form() or drupal_execute(). We can therefore do the same sort of centralized improvement for the form system in those functions as we can for page handlers using menu_execute_active_handler(). That is, add a key to hook_forms() to specify a file in which the form lives. Here we can safely presume that the module implementing hook_forms() is also the home of the form functions in question, so we need to specify only a file and not a module or path. If a module author wishes to split off one or more forms to another file, hook_forms() becomes a requirement, just as it does for specifying an alternate callback function.
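
To make the hook_forms() idea a bit more concrete, here is a minimal sketch of what it might look like. The 'file' key name, the "foo" module, and the example_load_form() helper are all assumptions for illustration; none of them exists in Drupal today. $module_path stands in for drupal_get_path('module', $module) so the sketch runs on its own.

```php
<?php
// Sketch only: the 'file' key and example_load_form() are assumptions made
// for illustration; neither exists in Drupal today.

/**
 * Implementation of hook_forms() for a hypothetical "foo" module.
 */
function foo_forms() {
  $forms = array();
  $forms['foo_settings_form'] = array(
    // Proposed key: a file, relative to the module's directory, that holds
    // the form builder and its validate/submit handlers.
    'file' => 'foo.forms.inc',
  );
  return $forms;
}

/**
 * Makes sure the code for a form is loaded (hypothetical helper).
 *
 * Returns TRUE if the form builder function is available afterwards.
 */
function example_load_form($module, $module_path, $form_id) {
  $forms = call_user_func($module . '_forms');
  if (isset($forms[$form_id]['file'])) {
    $file = $module_path . '/' . $forms[$form_id]['file'];
    // include_once makes repeated calls for the same form harmless.
    if (is_file($file)) {
      include_once $file;
    }
  }
  // Degenerate case: no 'file' key, so the builder must already be loaded
  // from the .module file, exactly as things work now.
  return function_exists($form_id);
}
```

A module that registers no 'file' key pays nothing extra: the helper falls straight through to the function_exists() check.
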
drupal_get_form() and drupal_execute() then simply check for the existence of that key and include_once the file if necessary. The total code involved should, like the page handler, be quite limited.

Note: I will likely want to wait on implementing the forms part until the FAPI 3 patch lands, because I really don't want to tangle with both eaton and chx on that. :-)

The nice thing about this approach, too, is that it doesn't have to be implemented all at once. Because the degenerate case still works, the initial implementation can cover only one or two core modules as a demonstration. The rest of core can be optimized one module at a time. That makes the patch easier to review as well as easier to maintain while the rest of core is still being actively developed. Given the benchmarks that merlin found with the initial attempt, I'd say that whatever the total performance gain is, it should be substantial.

As for the potential problem of module authors "over-factoring" and hiding useful utility functions in a page handler file when they shouldn't, I believe that really is solved by best-practice guidelines. As a worst case, a module author can manually include_once a file out of another module's directory at no worse a cost than the extra parse time. It's still a net win overall: even if one module gets sloppy-loaded, the rest of the system is still well-factored, so there's still a net reduction in the amount of code involved.

Sooo... Now that the three of you who made it all the way through this email have gotten here, thoughts on this approach? Any caveats I'm missing? Any use cases I don't know about? Does this have a snowball's chance in hell of being accepted?
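
For anyone who wants something concrete to poke at, the page-handler half could look roughly like the sketch below. It uses Drupal 5-style menu item keys; the 'file' key name and both example_* functions are assumptions for illustration, not the code from the patch[1] and not existing Drupal API. $module_path stands in for drupal_get_path('module', 'foo') so the sketch runs on its own.

```php
<?php
// Sketch only: the 'file' key and both example_* functions are assumptions
// made for illustration, not existing Drupal API.

/**
 * Builds a menu item the way a hypothetical foo_menu() might.
 */
function example_menu_item($module_path) {
  return array(
    'path' => 'admin/settings/foo',
    'callback' => 'foo_admin_page',
    // Proposed key: full path to the file that defines the callback.
    'file' => $module_path . '/foo.pages.inc',
  );
}

/**
 * Stand-in for the extra check menu_execute_active_handler() would gain.
 */
function example_execute_handler($item) {
  // Load the auxiliary file only if the menu item names one.
  if (isset($item['file']) && is_file($item['file'])) {
    include_once $item['file'];
  }
  // Without a usable callback there is nothing to dispatch.
  if (!function_exists($item['callback'])) {
    return NULL;
  }
  return call_user_func($item['callback']);
}
```

Because the item carries a full path, a menu item in one module can point at a handler file in another module's directory without any extra lookup.
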

<dons flame-retardant suit>

[1] http://drupal.org/node/140218
[2] http://drupal.org/node/116165
[3] http://drupal.org/node/140218#comment-236614
[4] http://drupal.org/node/116165#comment-229856

-- 
Larry Garfield            AIM: LOLG42
larry@garfieldtech.com    ICQ: 6817012

"If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of every one, and the receiver cannot dispossess himself of it."
-- Thomas Jefferson