[drupal-devel] performance improvements - avoiding big unserialize()
When I spoke with Rasmus last night, he discouraged us from using unserialize() on large arrays when possible. It eats memory and speed. We do this at least twice on every page view: the menu cache and the variables cache (i.e. $conf). Rasmus' suggestion was to store these arrays without serialization in a shared memory system like APC. I have a long term plan to add *optional* shared memory storage for these arrays and maybe others as they arise. Perhaps someone else wants to move this forward quickly. Comments welcome.
On 10/21/05 9:10 AM, Moshe Weitzman wrote:
When I spoke with Rasmus last night, he discouraged us from using unserialize() on large arrays when possible. It eats memory and speed. We do this at least twice on every page view: the menu cache and the variables cache (i.e. $conf).
Rasmus' suggestion was to store these arrays without serialization in a shared memory system like APC. I have a long term plan to add *optional* shared memory storage for these arrays and maybe others as they arise. Perhaps someone else wants to move this forward quickly. Comments welcome.
well, my experiences with APC have been pretty mixed - seems to work really well for a while. However, I'm *all* for promoting some further collaboration with Rasmus... and yeah APC would probably be much better for this stuff (or memcache which has a PECL extension http://pecl.php.net/package/memcache as well). the big issue, though, is to have something (maybe just status quo?) for folks that don't have a shared memory option (i.e. typical hosting environments) - the good news is they're the ones that don't usually run into performance issues. all that to say +1 :) -- James Walker :: http://walkah.net/
Hi, why don't we use the var_export() function [1] and save the settings to a regular file? And for versions prior to PHP 4.2.0 we could write a workaround function - at least it seems pretty simple to implement. Regards, Konstantin Käfer [1]: http://php.net/manual/en/function.var-export.php
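For readers less familiar with PHP, here is a rough Python analogue of this idea (Python is used only for illustration; nothing in the thread uses it). pprint.pformat() stands in for var_export(), and executing the generated file stands in for include. The function names export_to_module, load_module_data and round_trip are hypothetical.

```python
import importlib.util
import os
import pprint
import tempfile


def export_to_module(data: dict, path: str) -> None:
    """Write `data` as Python source (the role var_export() plays in PHP)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("DATA = " + pprint.pformat(data) + "\n")


def load_module_data(path: str) -> dict:
    """Execute the generated file and return DATA (the role include plays in PHP)."""
    spec = importlib.util.spec_from_file_location("cached_settings", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs whatever code the file contains
    return module.DATA


def round_trip(data: dict) -> dict:
    """Export to a fresh temporary file and load it back (illustration only)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "settings_cache.py")
        export_to_module(data, path)
        return load_module_data(path)
```

The comment on exec_module() is the crux of the security objections raised later in the thread: whatever is in the file gets executed.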
Konstantin Käfer wrote:
Hi,
why don't we use the var_export() function [1] and save the settings to a regular file? And for versions prior to PHP 4.2.0 we could write a workaround function - at least it seems pretty simple to implement.
Regards, Konstantin Käfer
I'd like to see some benchmarks on this approach vs. current serialization technique.
Konstantin Käfer wrote:
Hi,
why don't we use the var_export() function [1] and save the settings to a regular file? And for versions prior to PHP 4.2.0 we could write a workaround function - at least it seems pretty simple to implement.
Regards, Konstantin Käfer
Looks like a much better solution than using APC (for portability reasons) or serialization (for performance reasons). -- Bertrand Mansion http://www.mamasam.com - creative internet solutions http://golgote.freeflux.net - my blog
Hello, I just ran a short, small benchmark with a multidimensional array containing 65 KB of data stored in 2048 different values. It took PHP on my machine about 0.01 seconds (average value after 256 rounds) to build the source for the array and write it to a file. And loading the file is extremely fast anyway, as we can simply include it. I saw that PHP_Compat from PEAR [1] includes an emulation of var_export() in its library. Regards, Konstantin [1]: http://pear.php.net/package/PHP_Compat
Potential problems:
1. This file will have to be writeable by the web server.
2. This file will be writeable by the web server.

On 10/21/05, Konstantin Käfer <kkaefer@gmail.com> wrote:
Hello,
I just ran a short, small benchmark with a multidimensional array containing 65 KB of data stored in 2048 different values. It took PHP on my machine about 0.01 seconds (average value after 256 rounds) to build the source for the array and write it to a file. And loading the file is extremely fast anyway, as we can simply include it.
I saw that PHP_Compat from PEAR [1] includes an emulation of var_export() in its library.
Regards, Konstantin
-- Best regards, Herman Webley
Using the file system to store read/write information like this would be a severe security risk in most hosting situations. As Herman pointed out, the user ID that the PHP engine runs under (which may differ from the web server's when PHP runs as CGI) will need read/write access to the file data store. On many shared hosts this means your information file will be readable and writable by every other site hosted on the same box.
Craig

On Oct 21, 2005, at 2:33 PM, Herman Webley wrote:
1. This file will have to be writeable by the web server. 2. This file will be writeable by the web server
The problem, however, is multi-server setups like drupal.org. How does one maintain consistent caches across different servers (unless you use a distributed cache like memcached or ... a database cache like we do now)?

if (multi-server setup) {
  1. use memcached or
  2. use a shared database cache or
  3. invent a cache-coherency protocol?
} else if (APC) {
  4. use APC
}

-- Dries Buytaert :: http://www.buytaert.net/
Maybe this is implicitly meant by Dries, but just to spell it out, the pseudo code should be modified to read:

if (multi-server setup) {
  1. use memcached or
  2. use a shared database cache or
  3. invent a cache-coherency protocol?
} else if (APC) {
  4. use APC
} else if (on shared hosting) {
  5. use existing cache or no caching at all
}

On 10/22/05, Dries Buytaert <dries@buytaert.net> wrote:
The problem, however, is multi-server setups like drupal.org. How does one maintain consistent caches across different servers (unless you use a distributed cache like memcached or ... a database cache like we do now)?
if (multi-server setup) {
  1. use memcached or
  2. use a shared database cache or
  3. invent a cache-coherency protocol?
} else if (APC) {
  4. use APC
}
-- Dries Buytaert :: http://www.buytaert.net/
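The pseudocode above, with the extra shared-hosting branch, can be sketched as a small Python selection function. This is a hedged illustration only: the HostInfo fields are hypothetical and would have to come from configuration (Gerhard later suggests a variable in settings.php), since the thread has no reliable way to detect a multi-server setup. Option 3, a custom cache-coherency protocol, is left out.

```python
from dataclasses import dataclass


@dataclass
class HostInfo:
    # Hypothetical flags; in practice an administrator would have to set them.
    multi_server: bool
    memcached_available: bool
    apc_available: bool


def choose_cache_backend(host: HostInfo) -> str:
    if host.multi_server:
        # Options 1 and 2: a cache every web server can see.
        # APC is local to one server, so it is not considered here.
        return "memcached" if host.memcached_available else "database"
    if host.apc_available:
        # Option 4: a single server with APC.
        return "apc"
    # Option 5: shared hosting (or any other case) keeps the existing database cache.
    return "database"
```

Checking the multi-server case first matters: a server-local cache such as APC would let each server hold a different copy of the same data.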
Dries Buytaert wrote:
The problem, however, is multi-server setups like drupal.org. How does one maintain consistent caches across different servers (unless you use a distributed cache like memcached or ... a database cache like we do now)?
Indeed. Anyone have a suggestion for how a site can discover if it is a member of a multi-site host?
if (multi-server setup) {
  1. use memcached or
  2. use a shared database cache or
  3. invent a cache-coherency protocol?
} else if (APC) {
  4. use APC
}
i can't think of a better approach than this one. unfortunately we have to introduce a bit of complexity to properly address the problem.
On Tue, 25 Oct 2005, Moshe Weitzman wrote:
Dries Buytaert wrote:
The problem, however, is multi-server setups like drupal.org. How does one maintain consistent caches across different servers (unless you use a distributed cache like memcached or ... a database cache like we do now)?
Indeed. Anyone have a suggestion for how a site can discover if it is a member of a multi-site host?
If we cannot find a better idea, we might set a global variable in settings.php.
if (multi-server setup) {
  1. use memcached or
  2. use a shared database cache or
  3. invent a cache-coherency protocol?
} else if (APC) {
  4. use APC
}
i can't think of a better approach than this one. unfortunately we have to introduce a bit of complexity to properly address the problem.
Ideally this would be handled by the cache API. Cheers, Gerhard
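Putting the choice behind the cache API could look roughly like this Python sketch: callers only see get() and set(), and the backend is swapped underneath. InProcessCache keeps live objects with no serialization, standing in for the APC idea (though a plain dict is private to one process, unlike APC's shared memory), while DatabaseCache pickles values into a table much as the current database cache serializes them. Class and table names are hypothetical.

```python
import pickle
import sqlite3
from abc import ABC, abstractmethod
from typing import Any, Optional


class CacheBackend(ABC):
    """What callers see: the storage strategy is hidden behind get() and set()."""

    @abstractmethod
    def get(self, cid: str) -> Optional[Any]:
        """Return the cached value, or None on a miss."""

    @abstractmethod
    def set(self, cid: str, value: Any) -> None:
        """Store value under cid."""


class InProcessCache(CacheBackend):
    """Keeps live objects in a dict, so reads involve no deserialization."""

    def __init__(self) -> None:
        self._store: dict = {}

    def get(self, cid: str) -> Optional[Any]:
        return self._store.get(cid)

    def set(self, cid: str, value: Any) -> None:
        self._store[cid] = value


class DatabaseCache(CacheBackend):
    """Pickles values into a table, as the existing cache serializes into the database."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS cache (cid TEXT PRIMARY KEY, data BLOB)")

    def get(self, cid: str) -> Optional[Any]:
        row = self._conn.execute("SELECT data FROM cache WHERE cid = ?", (cid,)).fetchone()
        return None if row is None else pickle.loads(row[0])

    def set(self, cid: str, value: Any) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO cache (cid, data) VALUES (?, ?)",
            (cid, pickle.dumps(value)),
        )
```

Code that calls get("variables") does not change when the backend does, which is what makes per-site configuration of the backend practical.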
why don't we use the var_export() function [1] and save the settings to a regular file? And for versions prior to PHP 4.2.0 we could write a workaround function - at least it seems pretty simple to implement.
What about security? Do you think it is correct to include code from a file that was created world-writable? Or to eval() code when the source is the database? Goba
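One partial answer to this concern, sketched in Python rather than PHP: store the exported data as a literal and parse it with ast.literal_eval(), which accepts only literal structures (strings, numbers, tuples, lists, dicts, sets, booleans, None) and rejects function calls. This blocks code execution, but it does not stop someone who can write the file from changing the data itself, so the file-permission problem raised earlier in the thread remains. Parsing the text on every load also brings back a deserialization step, giving up the speed benefit of include. The helper names are hypothetical.

```python
import ast
import pprint


def export_data(data) -> str:
    """Render builtin containers and scalars as a Python literal (var_export analogue)."""
    return pprint.pformat(data)


def load_data(source: str):
    """Parse the literal without executing it; raises ValueError for calls or names."""
    return ast.literal_eval(source)
```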
At 3:10 PM +0200 21/10/05, Moshe Weitzman wrote:
When I spoke with Rasmus last night, he discouraged us from using unserialize() on large arrays when possible. It eats memory and speed. We do this at least twice on every page view: the menu cache and the variables cache (i.e. $conf).
Rasmus' suggestion was to store these arrays without serialization in a shared memory system like APC.
I have set up and run some tests comparing various ways of retrieving an array from storage. The storage mechanisms I have tested are:
- Storing serialized data in a file
- Storing var_export output in a file
- Storing the array in APC

I haven't tested storing the serialized array in a database because that would just be testing the database performance, whereas I'm primarily interested in testing unserialize.

With each test I verified how much memory was available to the script by allocating memory to exhaustion as the last step. This confirmed the summary results below.

As I understand it, Rasmus said at DrupalCon that unserialize allocates and does not release a certain amount of memory. I don't see any evidence of this. I do however see a massive memory leak in apc_fetch! It seems to lose about 7 kB of memory each time it's called. I wouldn't like to roll out a system which uses APC in its current state, and I have disabled it on my server.

It is interesting to note that the only way to benefit from using APC is to store the array in a file and include it. Not worth the security risk, IMHO.

My conclusion is that using unserialize is quite OK and there is no need or benefit to changing the way arrays are stored.

The test script can be downloaded from: http://mel01.juggernaut.com.au/arraytest.php.gz

Here are the results of my tests.

Reading serialized data from a file with fread and then unserializing it:
  time: 0.5022668838501 seconds elapsed for 100 iterations.
  memory: 120808 bytes consumed.
Reading serialized data from a file with implode('', file()) and then unserializing it:
  time: 0.52363586425781 seconds elapsed for 100 iterations.
  memory: 118408 bytes consumed.
Including var_export output from a file, APC disabled:
  time: 1.392637014389 seconds elapsed for 100 iterations.
  memory: 122096 bytes consumed.
Including var_export output from a file, APC enabled:
  time: 0.33631801605225 seconds elapsed for 100 iterations.
  memory: 121640 bytes consumed.
Fetching data from APC:
  time: 0.48792600631714 seconds elapsed for 100 iterations.
  memory: 836016 bytes consumed.

Test system is:
  Apache/2.0.53 (Fedora)
  PHP Version 4.3.11
  APC Version 3.0.8

...Richard.
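A rough Python version of this kind of measurement, for anyone wanting to repeat the comparison in another setting. It times repeated loads with time.perf_counter() and, instead of the allocate-to-exhaustion approach used above, reports peak traced memory from tracemalloc in a separate run, because tracing slows the code down and would skew the timing. pickle stands in for serialize()/unserialize(); build_settings() and its data shape are hypothetical.

```python
import pickle
import time
import tracemalloc


def build_settings(n: int) -> dict:
    # Hypothetical test data: n small nested entries.
    return {f"key{i}": {"title": f"item {i}", "weight": i, "children": [i, i + 1]}
            for i in range(n)}


def time_loader(loader, iterations: int = 100) -> float:
    """Seconds elapsed for `iterations` calls to `loader`."""
    start = time.perf_counter()
    for _ in range(iterations):
        loader()
    return time.perf_counter() - start


def peak_memory(loader) -> int:
    """Peak bytes allocated (as traced by tracemalloc) during one call to `loader`."""
    tracemalloc.start()
    try:
        loader()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    blob = pickle.dumps(build_settings(2048))

    def load():
        return pickle.loads(blob)

    print(f"time: {time_loader(load):.4f} seconds elapsed for 100 iterations.")
    print(f"memory: {peak_memory(load)} bytes peak.")
```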
thanks for providing these benchmarks, richard.
It is interesting to note that the only way to benefit from using APC is to store the array in a file and include it. Not worth the security risk, IMHO.
i don't understand this. you get no benefit by just storing the array directly? for example, apc_store('conf', $conf);
My conclusion is that using unserialize is quite OK and there is no need or benefit to changing the way arrays are stored.
maybe so. the fact remains that we spend significant time during every request unserializing these large arrays, and if we want to speed up drupal, we have to concentrate in this area. you can verify this using xdebug profiler or zend studio. the profiler does not lie.
Moshe Weitzman wrote:
thanks for providing these benchmarks, richard.
Yep, very interesting!
It is interesting to note that the only way to benefit from using APC is to store the array in a file and include it. Not worth the security risk, IMHO.
i don't understand this. you get no benefit by just storing the array directly? for example, apc_store('conf', $conf);
My conclusion is that using unserialize is quite OK and there is no need or benefit to changing the way arrays are stored.
maybe so. the fact remains that we spend significant time during every request unserializing these large arrays, and if we want to speed up drupal, we have to concentrate in this area.
There are a lot of areas we can look into.
you can verify this using xdebug profiler or zend studio. the profiler does not lie.
I would not be so sure about the latter... I got quite different results (on another topic) using both xdebug and the PEAR profiler. I think that xdebug got it right, but I can't be certain. Cheers, Gerhard
At 12:21 PM -0500 30/11/05, Moshe Weitzman wrote:
i don't understand this. you get no benefit by just storing the array directly? for example, apc_store('conf', $conf);
That's correct: my tests showed no performance gain from using apc_store and apc_fetch. Looking through the APC source, it is easy to see why this is the case. APC manipulates the array before storing it, creating an overhead similar to serialize and unserialize. I wonder if the APC authors wouldn't have saved themselves a lot of headaches by just using the PHP serialize and unserialize functions?

While I haven't tested memcached, it is unlikely to be any better because it DOES call the PHP serialize and unserialize functions!

I added a couple more tests to my script, and it seems unserialize is actually a very efficient way of restoring an array. It is twice as fast as building the array in a for loop, which isn't surprising, since a for loop is interpreted and unserialize is optimized C code.

Building the array in a for loop:
  time: 0.94807291030884 seconds elapsed for 100 iterations.
  memory: 115920 bytes consumed.
Unserialize from an existing global variable (no I/O):
  time: 0.45300388336182 seconds elapsed for 100 iterations.
  memory: 76880 bytes consumed.

If you doubt my test results, please check out the script. It's not particularly nice code, but I think the results are valid. http://mel01.juggernaut.com.au/arraytest.php.gz

Perhaps someone would like to run this test script through a profiler?
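The loop-versus-unserialize comparison can be reproduced in Python with timeit, where pickle.loads() (implemented in C in CPython) plays the role of unserialize() and an ordinary for loop builds the same dictionary. No result is claimed here; the ratio depends on the data and the interpreter. N and the data shape are hypothetical.

```python
import pickle
import timeit

N = 2048  # hypothetical array size


def build_in_loop() -> dict:
    # Interpreted loop: each entry is created and inserted by bytecode.
    d = {}
    for i in range(N):
        d[f"key{i}"] = {"weight": i, "title": f"item {i}"}
    return d


SERIALIZED = pickle.dumps(build_in_loop())


def restore_from_pickle() -> dict:
    # Deserialization from an existing in-memory value (no I/O), done by C code.
    return pickle.loads(SERIALIZED)


if __name__ == "__main__":
    loop_t = timeit.timeit(build_in_loop, number=100)
    load_t = timeit.timeit(restore_from_pickle, number=100)
    print(f"loop: {loop_t:.3f}s  pickle.loads: {load_t:.3f}s  (100 iterations each)")
```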
maybe so. the fact remains that we spend significant time during every request unserializing these large arrays, and if we want to speed up drupal, we have to concentrate in this area.
Not necessarily. There are lots of areas in Drupal in which performance improvement is possible.

The best way to improve performance is to process less data. Smaller data means:
- more filesystem buffer hits
- more disk cache hits
- more database cache hits
- less memory moved to unserialize the data
- less memory moved to process the data

In this case processing less data could involve:
- optimizing the size of the data stored in the arrays
- storing less data in the arrays
- splitting the arrays so only required portions are retrieved

And of course trying to serve more pages from the page cache.
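The last suggestion, splitting the arrays so that only the required portions are retrieved, could look like this Python sketch: each top-level entry gets its own cache row, so a request unpickles only the entry it needs instead of the whole array. The "prefix:key" cache-id scheme and the table layout are hypothetical. The trade-off is more queries when a page needs many parts.

```python
import pickle
import sqlite3
from typing import Any, Optional


def store_split(conn: sqlite3.Connection, prefix: str, big: dict) -> None:
    """Store each top-level entry of `big` as its own row instead of one large blob."""
    conn.execute("CREATE TABLE IF NOT EXISTS cache (cid TEXT PRIMARY KEY, data BLOB)")
    conn.executemany(
        "INSERT OR REPLACE INTO cache (cid, data) VALUES (?, ?)",
        [(f"{prefix}:{key}", pickle.dumps(value)) for key, value in big.items()],
    )


def load_part(conn: sqlite3.Connection, prefix: str, key: str) -> Optional[Any]:
    """Fetch and unpickle only the entry the current request needs (None on a miss)."""
    row = conn.execute(
        "SELECT data FROM cache WHERE cid = ?", (f"{prefix}:{key}",)
    ).fetchone()
    return None if row is None else pickle.loads(row[0])
```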
I should add that APC will be included as part of PHP6:
Well, if it's going to be included in core it might receive some more love and attention. I certainly wouldn't leave APC in its current state running on my production server. ...Richard.
Rasmus' suggestion was to store these arrays without serialization in a shared memory system like APC.
I should add that APC will be included as part of PHP6: http://www.php.net/~derick/meeting-notes.html#id60
Rasmus' suggestion was to store these arrays without serialization in a shared memory system like APC.
I should add that APC will be included as part of PHP6: http://www.php.net/~derick/meeting-notes.html#id60
It seems so at this moment, yes. But it also seems that APC is not going to be turned on by default. Goba
participants (12)
Bertrand Mansion
Craig Courtney
Dries Buytaert
Gabor Hojtsy
Gerhard Killesreiter
Gerhard Killesreiter
Herman Webley
James Walker
Khalid B
Konstantin Käfer
Moshe Weitzman
Richard Archer