[drupal-devel] [feature] Speed up Drupal and improve scalability: On demand module loading

Jose A Reyero drupal-devel at drupal.org
Mon Aug 8 11:56:33 UTC 2005


Issue status update for 
http://drupal.org/node/27901
Post a follow up: 
http://drupal.org/project/comments/add/27901

 Project:      Drupal
 Version:      cvs
 Component:    module system
 Category:     feature requests
 Priority:     normal
 Assigned to:  Anonymous
 Reported by:  Jose A Reyero
 Updated by:   Jose A Reyero
 Status:       patch (code needs review)

Well, we have these two options:
- modules declaring the hooks they implement
- modules declaring the new hooks they introduce; this would affect only
a few modules, like mailhandler, which create their own hooks


Maybe the second could be easier: with e.g. system.module declaring all
core hooks, the list can then be built dynamically.


May sound funny, but how about a new 'hook_define_hooks' hook?


This list only needs to be built after enabling/disabling modules,
similar to how the bootstrap hooks are managed, which saves the need for
that cron call. The bootstrap hook thing was a big step ahead, so why
not do it for all hooks?
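
A minimal sketch of how such a list could be built from declared hooks.
It is written in Python purely for illustration (Drupal itself is PHP),
and every name in it (DECLARED_HOOKS, MODULE_FUNCTIONS,
build_hook_registry) is hypothetical rather than taken from the patch.
It assumes each module declares the hooks it introduces and that the
registry is rebuilt only when modules are enabled or disabled.

```python
# Illustrative sketch only; none of these names exist in Drupal core.

# Assumption: what each module would return from a 'hook_define_hooks'
# style declaration (the hooks it introduces).
DECLARED_HOOKS = {
    "system": ["menu", "cron", "help"],   # core hooks, declared in one place
    "mailhandler": ["mailhandler"],       # a module introducing its own hook
}

# Assumption: the function names defined by each enabled module.
MODULE_FUNCTIONS = {
    "system": ["system_menu", "system_cron", "system_help"],
    "mailhandler": ["mailhandler_menu", "mailhandler_cron"],
    "comment": ["comment_menu", "comment_mailhandler"],
}


def build_hook_registry(declared_hooks, module_functions):
    """Map every declared hook to the modules that implement it.

    Meant to run only after enabling/disabling modules, not per request.
    """
    known = sorted({h for hooks in declared_hooks.values() for h in hooks})
    registry = {hook: [] for hook in known}
    for module in sorted(module_functions):
        for hook in known:
            # A module implements a hook if it defines <module>_<hook>.
            if f"{module}_{hook}" in module_functions[module]:
                registry[hook].append(module)
    return registry


registry = build_hook_registry(DECLARED_HOOKS, MODULE_FUNCTIONS)
```

Here the hook introduced by mailhandler is found in comment without
comment having to declare anything itself.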


This patch is only a proof of concept, showing that this is not that
complex.




Jose A Reyero



Previous comments:
------------------------------------------------------------------------

Sun, 31 Jul 2005 15:43:13 +0000 : Jose A Reyero

Attachment: http://drupal.org/files/issues/on_demand_module_loading.patch (7.8 KB)

As Drupal grows bigger, too much code that is not really needed gets
parsed on each request.


Also, installing more and more modules presents serious scalability
issues, as all the enabled modules are always included for each
non-cached page.


This is a first attempt to keep track of *all* module hooks, and only
load modules when they're really needed. I guess it may need some
polishing but the idea is simple enough. 
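
A rough sketch of the mechanism, in Python for illustration only (the
patch itself is PHP and is not reproduced here). The class LazyModules
and the sample modules and hooks are hypothetical; loading a module is
simulated by calling a function that returns its hook implementations.

```python
# Illustrative sketch only; module "loading" is simulated.

class LazyModules:
    def __init__(self, registry, sources):
        self.registry = registry  # hook name -> names of implementing modules
        self.sources = sources    # module name -> callable that "loads" it
        self.loaded = {}          # module name -> its hook functions

    def load(self, module):
        """Load a module the first time it is needed, then reuse it."""
        if module not in self.loaded:
            self.loaded[module] = self.sources[module]()
        return self.loaded[module]

    def invoke_all(self, hook, *args):
        """Call a hook in every module that implements it, loading on demand."""
        results = []
        for module in self.registry.get(hook, []):
            results.append(self.load(module)[hook](*args))
        return results


# Hypothetical modules: each "file" yields its hook implementations.
sources = {
    "node": lambda: {"menu": lambda: "node menu", "cron": lambda: "node cron"},
    "aggregator": lambda: {"cron": lambda: "aggregator cron"},
    "search": lambda: {"menu": lambda: "search menu"},
}
registry = {"menu": ["node", "search"], "cron": ["node", "aggregator"]}

modules = LazyModules(registry, sources)
menu_items = modules.invoke_all("menu")
```

After the menu hook has run, only node and search have been loaded;
aggregator stays untouched until something invokes cron.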


Currently, as hook_menu is implemented by most of the modules, and it
is called most of the time, the potential performance improvement
introduced by this mechanism may be small.


But the thing is, once we have some system for on demand module
loading, hooks can be reworked in the future to get a real performance
boost. E.g. hook_menu could easily be split in two, 'hook_menu' and
'hook_menu_dynamic', thus really reducing the number of modules
actually loaded.


As I said, this is a first step. If there's some interest in this kind
of features, I have some other things in the works aimed at performance
and scalability, like:


- (Simple) Rework of menu system to take real advantage of on demand
module loading
- Expand menu items so they can name the file that contains the
callback
- Extend module loading for 'loading on path', and maybe 'loading split
modules'
- Some API loader, kind of api_invoke....




------------------------------------------------------------------------

Sun, 31 Jul 2005 21:00:41 +0000 : chx

This will never be really efficient. If you have a block from
aggregator, then aggregator needs to be loaded, parsed and stored even
though most of its functionality is never used. Quite a lot of modules
play a small part in most pages.


Alas, my split mode development is halted a bit, but I'll revive it. My
problem is that drupal_eval needs on the fly tokenizing and wrapping...




------------------------------------------------------------------------

Mon, 01 Aug 2005 00:10:30 +0000 : eldarin

Interesting issue and approach; I'm also looking into reducing
processing load and speeding things up.
I have completed a first test version of a template system where I can
keep the generated content - without the theme template "XHTML
framework". I am now looking into how to cache partially parsed pages -
with parts like e.g. user specific information always being parsed
again.


It's kind of complicated to decide when to mark cached contents dirty -
perhaps some simple rules could do - but I've yet to discover which.


In my opinion the biggest gain can be found by reducing whatever
processing is needed to deliver XHTML to users - but not all of it is
serving up the pages - a lot is also logging, AAA etc.


I think a combination of successful caching and reducing module loading
can be the best overall solution.


For security I would also like to try to differentiate db users - I
don't like the idea of a central "db-root" being used for all access.
An encrypted password like in /etc/passwd, perhaps with client-side
encryption in the browser, could serve - IDEA, 3DES or Rijndael. That
would effectively get rid of most of the hacking/defacing of web-sites
- i.e. keep the db intact from hackers.




------------------------------------------------------------------------

Mon, 01 Aug 2005 00:25:19 +0000 : chx

OK, to make this clear: I have been working since March (with pauses)
on something called split mode. This splits drupal into a gazillion
files -- one function, one include file -- and loads them on demand.
The speedup and the memory saving are enormous (40-50%). What code I
have has been in my sandbox since May.


I think it'd be great to have this in 4.7 as a possibility. I hoped
that the install system would come along, but as we do not have it, I
am a bit reluctant -- the problem is that if you update a module then
it won't work until you resplit, which can take several seconds.


However, I think most sites do not update their modules too often so
this is still an avenue worth pursuing.


Also, as most functions (most == there are problems with references)
are wrapped into a c() call, it'd be possible (later?) to introduce a
mechanism which could override any function. Sometimes this comes up...




------------------------------------------------------------------------

Mon, 01 Aug 2005 12:22:15 +0000 : Jose A Reyero

eldarin,


yes, as I've said, this is only one of the many things we can do to
speed things up, so the "final solution" could be a combination of on
demand loading, splitting modules, improving the cache... However, all
these can be approached as different features and patches.


chx,


I've also tried your 'split' thing, but once I had all those small
files, I couldn't apply the patch, so I don't really know what to do
with it... Anyway, why don't we start another thread for that, as it is
quite a different approach?




------------------------------------------------------------------------

Mon, 08 Aug 2005 03:03:25 +0000 : moshe weitzman

this approach is worth exploring ... you have an array which defines
'all hooks used by modules'. perhaps that is possible in core, but non
core modules define hooks too (e.g. syndication, mailhandler, ...).
they need a way to register with the module system.




------------------------------------------------------------------------

Mon, 08 Aug 2005 03:28:02 +0000 : lgarfiel

Correct me if I'm wrong, but as I understand it the disk-hit involved in
loading a file is a bigger performance drain than parsing said file,
unless said file is very complicated.  Unless you have a RAM disk or
PHP accelerator (which does RAM caching), wouldn't the trade-off of
then hitting the disk 50 times instead of 10 be net negative?




------------------------------------------------------------------------

Mon, 08 Aug 2005 09:24:27 +0000 : Jose A Reyero

moshe,


yes, you're right, I really hadn't thought of that... But I had
foreseen some other similar issues with modules implementing hooks in
included files...


But I think this could be handled with some module_info function in
which each module returns information about which hooks it implements.
This 'module_info' hook has been also mentioned in some other thread
about returning version information, and could be used too for
dependencies between modules, etc...


A different approach would be for modules to provide information about
which new hooks they introduce, so the module system could search for
those new hooks in all the modules.


I think I'll go for the first one -which may be only for non core
hooks-, maybe also using that function for information about on which
paths the module has to be loaded, thus solving the problem with
dynamic menu items too.


lgarfiel,


I'm not sure about that performance data, but if you're right, that's
one more reason to rework the module loading. I think this approach
actually means fewer disk hits than the current system.
Besides that, parsing more PHP has an important impact on memory use,
and that eventually affects performance too.




------------------------------------------------------------------------

Mon, 08 Aug 2005 11:33:26 +0000 : moshe weitzman

I'm not too fond of module developers having to declare all the hooks
they are using. The system should be smart enough to handle that. How
about we implement a system_cron() function which loads all modules and
tracks the hooks that they employ? That way, only it has to load
everything, and the regular user requests can use lazy loading as
you've suggested.
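
One way such a scan could work, sketched in Python for illustration
only (Drupal is PHP, and scan_hooks and the sample data are
hypothetical). It assumes the <module>_<hook> naming convention and
simply records every function suffix as a candidate hook, so helper
functions get recorded too; that is harmless as long as lookups are
only ever made for real hook names.

```python
# Illustrative sketch only; not taken from any patch in this issue.

def scan_hooks(module_functions):
    """Return {suffix: [modules]} for every function named <module>_<suffix>.

    Intended for a periodic job that loads everything once, so that
    normal requests can consult the resulting map instead.
    """
    candidates = {}
    for module in sorted(module_functions):
        prefix = module + "_"
        for name in module_functions[module]:
            if name.startswith(prefix) and len(name) > len(prefix):
                candidates.setdefault(name[len(prefix):], []).append(module)
    return candidates


# Hypothetical function lists, as if gathered after loading all modules.
functions = {
    "aggregator": ["aggregator_cron", "aggregator_block", "aggregator_parse_feed"],
    "node": ["node_menu", "node_cron", "_node_names"],
    "search": ["search_menu", "search_cron"],
}
hook_map = scan_hooks(functions)
```

A request that invokes the menu hook would then load only node and
search, without either module declaring anything.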






