[drupal-devel] [bug] apache and drupal both compressing pages

Arto drupal-devel at drupal.org
Fri Aug 5 13:46:12 UTC 2005


Issue status update for 
http://drupal.org/node/13224
Post a follow up: 
http://drupal.org/project/comments/add/13224

 Project:      Drupal
 Version:      cvs
 Component:    base system
 Category:     bug reports
 Priority:     critical
 Assigned to:  killes at www.drop.org
 Reported by:  bertboerland at www.drop.org
 Updated by:   Arto
 Status:       patch (code needs review)

If the caching mechanism is to be patched, that'd be a good time to also
tackle the problem that it's currently unusable on PostgreSQL:
http://drupal.org/node/26369




Arto



Previous comments:
------------------------------------------------------------------------

Sat, 20 Nov 2004 16:22:55 +0000 : bertboerland at www.drop.org

hi


This weekend I went ahead and upgraded Drupal from 4.4 to 4.5 on a
machine; I didn't have the time/priority (in Dutch these words are almost
the same :-) sooner.


The core upgrade went fine, followed as usual by lots of unpacking and
configuring for all the modules I want to experiment with. After some
time I tried to access the site, in a just-to-be-sure moment, from a
Windows machine with an IE browser without being logged in. I found
some strange behavior. Sometimes IE wanted to download the file,
sometimes it showed the file (being a binary) and sometimes I got a
valid page. I never got this from Firefox on either Windows or Linux
while being logged in. Later I could reproduce this in Firefox without
being logged in.


After some time / thinking I think I found out that both apache (my
webserver) and drupal were compressing (gzipping) the page resulting in
a browser that was able to uncompress the page once. But if it was a
cached page and drupal was zipping this and Apache as well, the browser
was just displaying a gzipped page.


I have a rather old setup with an ancient version of apache but does
this sound possible? Since I am hosting (some) other content outside
the drupal dir, I would like to keep apache gzipping pages so is there
an option to stop drupal from compressing pages?




------------------------------------------------------------------------

Sat, 20 Nov 2004 16:32:16 +0000 : killes at www.drop.org

Which method does your Apache use to gzip the pages? Which version is it?
I find it strange that we get this report only several weeks after the
release. I suspect that your particular setup is broken.




------------------------------------------------------------------------

Sat, 20 Nov 2004 16:45:23 +0000 : bertboerland at www.drop.org

The reason I waited so long was that I wanted others to find this out
:-) Mind you, I *think* it is the fact that they are both compressing
the cached pages. You might want to try it out for yourself by going to
http://willy.boerland.com/myblog and in case there is a cached page
you will end up downloading a binary that, once renamed to .gz and
unpacked, will give you the HTML file.


drupal 4.5
apache: 1.3.26 (with *all* security patches)
php: 4.2.3
Content-Encoding: gzip


And yes, since I guess 100+ sites are already at 4.5 I think it is
related to my setup. However, do I understand correctly that there is no
way of having Drupal stop sending out compressed pages? There is no
configuration option?




------------------------------------------------------------------------

Sat, 20 Nov 2004 17:06:05 +0000 : killes at www.drop.org

Yes, I get funny garbage, too.
The apache version you run is the one Debian provides. It is still used
very often, but still nobody but you seems to experience the problem, so
I still think that somehow your setup is broken. Apache should recognize
that the page is already gzipped. Do you use mod_gzip or libz?




------------------------------------------------------------------------

Sat, 20 Nov 2004 17:57:14 +0000 : bertboerland at www.drop.org

Since this one is more or less critical for me but not for the RotW,
re-setting priority to "normal". I am using zlib btw. Did you try to
save the file and unzip it? And am I correct that there is no option to
disable cached pages from being zipped?




------------------------------------------------------------------------

Sat, 20 Nov 2004 18:30:23 +0000 : bertboerland at www.drop.org

I think I "solved" this for me by changing the bootstrap.inc


setting this to postponed, if no-one else experiences this problem
within 3 weeks I will close this issue.




------------------------------------------------------------------------

Sat, 20 Nov 2004 18:31:19 +0000 : killes at www.drop.org

I tried to download it now, but you must have changed something, the
page renders fine.
You are right, there is no option to disable gzipped cache. I'd really
like to discover the reason for this problem. Normally, zlib should
check for gzipped content and not gzip it again. Drupal sends the
appropriate header.




------------------------------------------------------------------------

Sat, 20 Nov 2004 18:32:25 +0000 : killes at www.drop.org

Locking could have prevented this. ;)




------------------------------------------------------------------------

Sun, 21 Nov 2004 12:36:23 +0000 : axel at drupal.ru

> I found some strange behavior. Sometimes IE wanted to download the
> file, sometimes it showed the file (being a binary) and sometimes I got
> a valid page.


Sometimes I get the same thing in Mozilla Firefox with different sites
(not sure, but not only with Drupal) when I work through my local
proxies (I use a chain of two proxies on my Debian box: one for banner
removal (privoxy) and a second for offline page caching (wwwoffle)). I
didn't dig into the details of this behaviour, but simply switching off
the proxies and reloading the page in the browser solves the problem;
then I switch the proxies on again and the page loads OK.


Maybe you also access the site through a proxy? Then you need to check
that proxy's settings, I think.




------------------------------------------------------------------------

Sun, 21 Nov 2004 21:13:26 +0000 : bertboerland at www.drop.org

Axel,


no, it was not related to proxies or even the browser. It was related to
cached pages being zipped by Drupal and sent to Apache, which was zipping
the page as well, resulting in a doubly zipped page for the client, which
unzipped the page only once and hence displayed a binary.
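
A minimal sketch of the effect described above, assuming nothing about
the actual Drupal, PHP or Apache code: a page is gzip-compressed twice,
and the client undoes only one layer because only one Content-Encoding:
gzip header was announced. The page content and variable names are
illustrative only.

```python
import gzip

# Illustrative page body; not taken from any real site.
html = b"<html><body>Hello from the page cache</body></html>"

# First layer: the application compresses the page before caching it.
once = gzip.compress(html)
# Second layer: another component compresses the already-compressed bytes.
twice = gzip.compress(once)

# The browser sees a single "Content-Encoding: gzip" and decodes one layer.
seen_by_browser = gzip.decompress(twice)

# Every gzip stream starts with the two magic bytes 0x1f 0x8b.
GZIP_MAGIC = b"\x1f\x8b"
still_compressed = seen_by_browser.startswith(GZIP_MAGIC)
print("browser is left with gzip data:", still_compressed)
```

What the browser ends up with is still a gzip stream, which matches the
"download a binary, rename it to .gz, unpack it" symptom reported
earlier in this thread.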




------------------------------------------------------------------------

Tue, 21 Dec 2004 17:10:12 +0000 : bertboerland at www.drop.org

It seems like I was not the only one [1]. There is not enough
information about the others' setups (/please post here!/) to find a
generic cause, but it is certain that some setups will cause zipped pages
to be zipped again.
[1] http://drupal.org/node/14569




------------------------------------------------------------------------

Thu, 03 Feb 2005 15:03:17 +0000 : leafish_dylan

Has anybody found a better way to fix this problem? Disabling the
storage of compressed cache pages in bootstrap.inc works, but it's
ugly.


Is an uncompressed cache likely to take up a lot of space in the
database? If not, perhaps a global "gzip/zlib compression" option for
Drupal would be simpler.




------------------------------------------------------------------------

Sat, 05 Feb 2005 14:40:16 +0000 : Anonymous

As author of the gzipped cache patch I'd really appreciate if the people
who experience this problem could supply detailed(!) info on their setup
and the headers sent both by the browser and the server. I'd like to fix
the problem (if there is one on Drupal's side) or at least document
correct server settings if the problem is there.




------------------------------------------------------------------------

Sat, 05 Feb 2005 16:54:42 +0000 : bertboerland at www.drop.org

Here some info, please ask if you need more:


apache 1.3.26
mmcache (version unknown)
drupal 4.5.1


Note: I didn't post about (or think about!) the mmcache sitting between
Drupal and Apache.


The configuration of mmcache is
extension="mmcache.so"
mmcache.shm_size="16"
mmcache.cache_dir="/tmp/mmcache"
mmcache.enable="1"
mmcache.optimizer="1"
mmcache.check_mtime="1"
mmcache.debug="0"
mmcache.filter=""
mmcache.shm_max="0"
mmcache.shm_ttl="0"
mmcache.shm_prune_period="0"
mmcache.shm_only="0"
mmcache.compress="1"


headers:
http://willy.boerland.com/myblog/


GET /myblog/ HTTP/1.1
Host: willy.boerland.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5)
Gecko/20041107 Firefox/1.0
Accept:
text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: PHPSESSID=307f07c169e3bbcbe92d150026c97583


HTTP/1.x 200 OK
Date: Sat, 05 Feb 2005 16:45:49 GMT
Server: Apache-AdvancedExtranetServer/1.3.26
X-Powered-By: PHP/4.2.3
Content-Encoding: gzip
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
----------------------------------------------------------




------------------------------------------------------------------------

Mon, 21 Feb 2005 10:53:02 +0000 : coma

I have exactly the same problem. My setup is currently Apache 2.0.52,
PHP 4.3.10 and Drupal 4.5.2.


The relevant info is:


from apache:
<Location />
 # Insert filter
 SetOutputFilter DEFLATE
 # Netscape 4.x has some problems...
 BrowserMatch ^Mozilla/4 gzip-only-text/html
 # Netscape 4.06-4.08 have some more problems
 BrowserMatch ^Mozilla/4\.0[678] no-gzip
 # MSIE masquerades as Netscape, but it is fine
 BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
 # Don't compress images
 SetEnvIfNoCase Request_URI \
        \.(?:gif|jpe?g|png)$ no-gzip dont-vary
 # Make sure proxies don't deliver the wrong content
 Header append Vary User-Agent env=!dont-vary
</Location>


In php.ini:
zlib.output_compression = On


And I can assure you that my problem was that last element: changing
zlib.output_compression to Off solved the problem, and the output of
all PHP scripts (not only Drupal) is still compressed by either Drupal or
Apache.


You can test these things with wget:


wget --header "Accept-Encoding: gzip,deflate" http://localhost/


The file wget drops should be gzipped HTML. If you run zcat on it and get
plain HTML, all is good; if you get another gzipped file, it has been
double compressed.
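
The same check can be scripted. The following is a rough sketch, not a
tool from this thread: the helper name count_gzip_layers and the
max_layers safety limit are made up for illustration. It peels off gzip
layers from the bytes of a saved response and counts them.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def count_gzip_layers(data: bytes, max_layers: int = 5) -> int:
    """Count how many times `data` has been gzip-compressed.

    0 means plain content, 1 is the expected result for a response
    fetched with "Accept-Encoding: gzip", 2 or more means the body
    was compressed more than once.
    """
    layers = 0
    while layers < max_layers and data.startswith(GZIP_MAGIC):
        try:
            data = gzip.decompress(data)
        except (OSError, EOFError, zlib.error):
            # Looked like gzip but could not be decoded; stop counting.
            break
        layers += 1
    return layers


page = b"<html><body>test</body></html>"
print(count_gzip_layers(gzip.compress(gzip.compress(page))))
```

To use it on the file that wget saved, read the file in binary mode and
pass its contents to the function.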




------------------------------------------------------------------------

Sat, 02 Apr 2005 19:38:42 +0000 : moshe weitzman

I too am seeing strange behavior here. Specifically, PHP is simply
dying after printing out the $cache->data in drupal_page_header().
Truly, I don't think Drupal should be storing gzipped cache, nor do I
think it should perform gzip at all. These are better handled at the PHP
or Apache layers.




------------------------------------------------------------------------

Sat, 23 Apr 2005 14:06:14 +0000 : mjr

Apparently I am observing a new variant of that problem in one of the
subsites of my 4.6 test installation: after enabling caching, lynx,
w3m and the Gecko-based browsers display garbage. lynx tells me it
wants to load a file named index.html.gz.


wget http://beate/drupal46/  on that site gives a readable html page.


w3m -dump_head tells me:


Received cookie: PHPSESSID=a6916d57d08727ab8ed945ddac4ecf2c
HTTP/1.1 200 OK
Date: Sat, 23 Apr 2005 13:56:19 GMT
Server: Apache/2.0.53 (Debian GNU/Linux) PHP/4.3.10-10 mod_ssl/2.0.53
OpenSSL/0.9.7e mod_perl/1.999.20 Perl/v5.8.4
X-Powered-By: PHP/4.3.10-10
Set-Cookie: PHPSESSID=a6916d57d08727ab8ed945ddac4ecf2c; expires=Mon, 16
May 2005 17:29:39 GMT; path=/
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Last-Modified: Sat, 23 Apr 2005 13:56:19 GMT
Cache-Control: no-store, no-cache, must-revalidate
Cache-Control: post-check=0, pre-check=0
Pragma: no-cache
Content-Encoding: gzip
Vary: Accept-Encoding
Connection: close
Content-Type: text/html; charset=utf-8


Apparently, the site sends uncompressed pages but tells the browser they
are compressed.
BTW, links2 and dillo do display the page but will not allow me to log
in and disable caching.
Any clues?
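
One way to confirm a mismatch like the one suspected above is to compare
the Content-Encoding header with the first bytes of the body. This is
only a sketch under simple assumptions: the function name diagnose is
hypothetical, and it only looks at the gzip magic bytes rather than
decoding the body.

```python
from typing import Optional

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def diagnose(content_encoding: Optional[str], body: bytes) -> str:
    """Report whether the announced encoding matches what the body looks like."""
    announced = "gzip" in (content_encoding or "").lower()
    looks_gzipped = body.startswith(GZIP_MAGIC)
    if announced and not looks_gzipped:
        return "header says gzip, body is plain"
    if not announced and looks_gzipped:
        return "body is gzip, header does not say so"
    return "header and body agree"


print(diagnose("gzip", b"<html>...</html>"))
```

The body must be the raw bytes as sent by the server, before any HTTP
client library has decoded them.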


Thanks


Michael




------------------------------------------------------------------------

Sat, 23 Apr 2005 14:08:42 +0000 : mjr

Perhaps I should mention that
zlib.output_compression = Off
in my php.ini




------------------------------------------------------------------------

Sat, 23 Apr 2005 18:43:30 +0000 : bertboerland at www.drop.org

/Apparently i am observing a new variant of that problem /


Please open a new bug report, since your problem is likely unrelated to
this one. You might want to link to this issue from it, but don't use
this issue for anything other than what is described.




------------------------------------------------------------------------

Sat, 04 Jun 2005 23:53:22 +0000 : mjr

No, it is not unrelated. After some investigation, I found out that the
garbage is being stored in Drupal's cache. This has been reported in
this thread.
My present workaround is to completely disable Drupal's caching (and to
manually delete the compressed pages from the database).


What's the use of letting Drupal compress pages and store them in its
cache table? Just saving a few cycles of CPU time? At the cost of no
longer letting the web server and its client decide whether to transfer
compressed or uncompressed data: they have the protocol to do that,
Drupal does not.


Michael




------------------------------------------------------------------------

Mon, 06 Jun 2005 16:48:30 +0000 : killes at www.drop.org

We save cpu cycles and storage space. Apache only needs to check the
Encoding header which will be easy. I do not know why zlib output
compression doesn't check those headers for some (or all?) people.




------------------------------------------------------------------------

Fri, 10 Jun 2005 23:32:40 +0000 : bluec

I have the same problem with Drupal 4.6.1: if the site delivers a cached
page it is gzipped twice and the result is garbage in my browser window.
I tried two different browsers, Firefox and Opera, and it's the same
problem with both of them.


I recorded the HTTP headers:


- Cache turned on
- zlib-compression turned on
- Result: Garbage


http://kim.sshtun.v63.org:22003/index.html
GET /index.html HTTP/1.1
Host: kim.sshtun.v63.org:22003
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8)
Gecko/20050511 Firefox/1.0.4
Accept:
text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: de-de,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://kim.sshtun.v63.org:22003/fachschaft/
Cookie: PHPSESSID=6ddbf3d5a32d-CHANGED-BY-CHRIS
HTTP/1.x 200 OK
Date: Fri, 10 Jun 2005 23:16:08 GMT
Server: Apache/2.0.54 (Gentoo/Linux) mod_ssl/2.0.54 OpenSSL/0.9.7e
PHP/5.0.4
X-Powered-By: PHP/5.0.4
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Last-Modified: Fri, 10 Jun 2005 23:11:59 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0,
pre-check=0
Pragma: no-cache
Set-Cookie: PHPSESSID=6ddbf3d5a32d20-CHANGED-BY-CHRIS; expires=Mon, 04
Jul 2005 02:49:28 GMT; path=/
Etag: "03cff5fcc105e72dfc2671d3ffdb285c"
Content-Encoding: gzip
Content-Length: 5024
Keep-Alive: timeout=15, max=95
Connection: Keep-Alive
Content-Type: text/html; charset=utf-8


Now I turned off compression by setting the following in my .htaccess:


  php_flag zlib.output_compression Off


and the result is OK:


http://kim.sshtun.v63.org:22003/index.html
GET /index.html HTTP/1.1
Host: kim.sshtun.v63.org:22003
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8)
Gecko/20050511 Firefox/1.0.4
Accept:
text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: de-de,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://kim.sshtun.v63.org:22003/index.html
Cookie: PHPSESSID=6ddbf3d5a32d20-CHANGED-BY-CHRIS
HTTP/1.x 200 OK
Date: Fri, 10 Jun 2005 23:17:08 GMT
Server: Apache/2.0.54 (Gentoo/Linux) mod_ssl/2.0.54 OpenSSL/0.9.7e
PHP/5.0.4
X-Powered-By: PHP/5.0.4
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Last-Modified: Fri, 10 Jun 2005 23:11:59 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0,
pre-check=0
Pragma: no-cache
Set-Cookie: PHPSESSID=6ddbf3d5a32d208-CHANGED-BY-CHRIS; expires=Mon, 04
Jul 2005 02:50:28 GMT; path=/
Etag: "03cff5fcc105e72dfc2671d3ffdb285c"
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 5054
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=utf-8


Turning off the cache is a solution as well. As soon as both the cache
and zlib are operating, the result is garbage.


If you need any other information just let me know. I can turn on
compression at any time and test other solutions.


Chris




------------------------------------------------------------------------

Mon, 27 Jun 2005 18:10:12 +0000 : aries

The same. Interesting, only one page doing this, all others are not.


--
Aries
http://aries.mindworks.hu




------------------------------------------------------------------------

Thu, 07 Jul 2005 14:46:34 +0000 : fago

quite strange.
i never had problems with this, then i upgraded from 4.6.1 to drupal
4.6.2... now i must turn off the cache, or i've the troubles mentioned
here... (i haven't touched the apache or php config in the meanwhile)




------------------------------------------------------------------------

Thu, 14 Jul 2005 09:47:32 +0000 : bertboerland at www.drop.org

Yet another one [2]; closed that one, raised importance here.
[2] http://drupal.org/node/25326




------------------------------------------------------------------------

Sat, 16 Jul 2005 19:52:26 +0000 : bertboerland at www.drop.org

killes suggested that it wasn't Apache that compressed the page a second
time, but Drupal and PHP. After some testing, I found out he was
correct!


To solve this:
1) either comment out the gzip lines in bootstrap.inc
2) or change "zlib.output_compression = On" to Off in /etc/php.ini


A (real) fix is needed. Drupal should check whether the page was zipped
in the first place....
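
The kind of check asked for above could look roughly like the sketch
below. It is written in Python purely for illustration and is not the
code in bootstrap.inc or in any patch attached to this issue; the
function name serve_cached_page and the output_compression_on parameter
(standing in for PHP's zlib.output_compression setting) are assumptions.

```python
import gzip
from typing import Dict, Tuple


def serve_cached_page(
    cached_gzip: bytes,
    accept_encoding: str,
    output_compression_on: bool,
) -> Tuple[Dict[str, str], bytes]:
    """Return (extra headers, body) for a page stored gzipped in the cache."""
    # Naive check: ignores q-values such as "gzip;q=0".
    client_accepts_gzip = "gzip" in accept_encoding.lower()

    if client_accepts_gzip and not output_compression_on:
        # Nobody downstream will compress again: send the stored bytes as-is.
        return {"Content-Encoding": "gzip"}, cached_gzip

    # Either the client cannot handle gzip, or another layer will compress
    # the output itself. Hand over plain content so that only one layer
    # of compression is ever applied.
    return {}, gzip.decompress(cached_gzip)


cached = gzip.compress(b"<html>cached page</html>")
headers, body = serve_cached_page(cached, "gzip,deflate", output_compression_on=True)
print(headers, body)
```

The design choice in this sketch is that the application steps back
whenever another layer is known to compress, at the price of one
decompression per request in that configuration.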




------------------------------------------------------------------------

Fri, 29 Jul 2005 12:17:46 +0000 : killes at www.drop.org

Attachment: http://drupal.org/files/issues/gzip.patch (2.45 KB)

Here's a patch that needs some testing.




------------------------------------------------------------------------

Fri, 29 Jul 2005 13:18:51 +0000 : Dries

Why do we set $do_cache to TRUE in the nested if's?  $do_cache is
already set to TRUE higher up so we're just overwriting TRUE with TRUE?


Also, can we rename $do_cache to $cache?  That is more Drupal-ish.


Does that solve all apache-Drupal-caching problems or only some?  I
vaguely remember we agreed to add a setting, because that was perceived
the only True Solution.




------------------------------------------------------------------------

Fri, 29 Jul 2005 18:22:04 +0000 : killes at www.drop.org

Attachment: http://drupal.org/files/issues/gzip_0.patch (2.46 KB)

I've updated the patch.


The patch is intended to cure all the problems we had. It needs
testing, though.
I don't recall agreeing that a setting would be needed.


Just for the record: Steven measured the ratio of serving gzipped
cached pages to re-unzipped pages on drupal.org to be about 5.1:1. That
is, of roughly 6 page requests that we serve from the cache, only one
needs the cache to be unzipped. This is probably due to crawlers that
only speak HTTP 1.0.
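
As a quick check of the arithmetic, the 5.1:1 ratio quoted above can be
turned into a share of cached requests. Only the ratio comes from this
thread; the variable names are illustrative.

```python
# Ratio of cached pages served still gzipped to cached pages that had to
# be unzipped first, as quoted above for drupal.org.
ratio = 5.1

# Out of (ratio + 1) cached requests, one needs unzipping.
share_needing_unzip = 1 / (ratio + 1)
share_served_gzipped = ratio / (ratio + 1)

print(f"needs unzipping: {share_needing_unzip:.1%}")
print(f"served gzipped:  {share_served_gzipped:.1%}")
```

That is roughly one request in six, consistent with the figure given in
the comment.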




------------------------------------------------------------------------

Sat, 30 Jul 2005 14:33:42 +0000 : bertboerland at www.drop.org

People who experienced this problem, please try killes' patch and
report here whether it was successful or not. I will patch as well.




------------------------------------------------------------------------

Mon, 01 Aug 2005 14:55:47 +0000 : moshe weitzman

In my opinion, this is a case of misplaced optimization. Very little CPU
is required to gzip a document. Why else would Apache and PHP provide an
option to do this *for every page*? PHP knows that all its pages will be
dynamic yet it still offers this option.


My recommendation is to remove this feature entirely, and take the win in
simplicity/code reduction. Furthermore, we won't have any more issues
like this one, which has lingered unfixed for 9 months.


Admins who want gzip will elect to do so in their php.ini or .htaccess.






