[drupal-devel] [bug] Prevent browser page caching of dynamic content
mathias
drupal-devel at drupal.org
Fri Mar 11 04:52:41 UTC 2005
Issue status update for http://drupal.org/node/5900
Project: Drupal
Version: cvs
-Component: user system
+Component: other
Category: bug reports
Priority: critical
Assigned to: mathias
Reported by: christianlong at christianlong.com
Updated by: mathias
Status: patch
Attachment: http://drupal.org/files/issues/control-browser-caching_1.patch (698 bytes)
I figured it out.
Let's assume for a moment that we're using Drupal caching and the cache
table is empty. A page is requested and the browser and Drupal both
cache it. On the next request Drupal pulls the page from its cache and
all of a sudden the browser sees a different set of headers sent for
the same page (such as an ETag and gzip header), so another HTTP 200
response is sent. Finally, on the third request for the same page,
everything lines up and an HTTP 304 is emitted, saving valuable
bandwidth. This is true for subsequent requests until the Drupal cache
is cleared.
The caching problem is a result of some browsers getting confused
between the first and second requests for the same page. Sending 304s
for a page that has multiple cached copies confuses the heck out of
some browsers. In the case of Safari, it tries to resolve this by
displaying the most recent copy it has (even if the server told it not
to cache that copy). Other browsers just show their last known
cacheable copy which is still wrong.
The solution is to explicitly tell the browser when to cache a page
that will be 304'd later on. In Drupal talk, this means the browser
shouldn't cache a request that isn't also in the Drupal cache table.
In the above example this would be the first request.
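A minimal sketch of that rule, written in Python purely for illustration. The function name and the exact header values are assumptions (the no-cache string is borrowed from the header quoted in the October 16, 2004 comment below); the attached patch may well do this differently.

```python
# Hypothetical model of the rule described above. The names are
# illustrative only and are not Drupal's actual functions.
def browser_caching_headers(in_drupal_cache, etag=None, last_modified=None):
    """Return the caching-related response headers for one page request."""
    if not in_drupal_cache:
        # The page is not in the cache table yet (the first request):
        # tell the browser not to keep a copy it could reuse later.
        return {"Cache-Control": "no-store, no-cache, must-revalidate"}

    # The page is served from the cache table: send validators so the
    # browser can make conditional requests and receive 304s.
    headers = {}
    if etag is not None:
        headers["ETag"] = etag
    if last_modified is not None:
        headers["Last-Modified"] = last_modified
    return headers
```

With this shape, only responses that Drupal itself can later answer with a 304 are ever offered to the browser as cacheable.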
mathias
Previous comments:
------------------------------------------------------------------------
February 16, 2004 - 14:44 : christianlong at christianlong.com
Attachment: http://drupal.org/files/issues/f_040214_01.txt (6.33 KB)
This bug has been discussed before, but not resolved.
http://drupal.org/node/view/1740
http://drupal.org/node/view/5686
One suggestion was to change the browser settings. That means I have
to tell all my users to change their browser cache settings. Not a
great way to run a site.
Instead, the login should just work with default browser settings.
Currently, login does not work, as described below.
I log in to my drupal site, feingold.christianlong.com, and the
logged-in home page appears, with my user name. OK.
The address bar shows
http://feingold.christianlong.com/node?PHPSESSID=867c4e2d57d290a3f59e1385249b7545
In the site menu, the "Home" link links to
http://feingold.christianlong.com/
Here's the problem: when I click on the "Home" link, the browser takes
me to the un-logged-in version of http://feingold.christianlong.com/,
which was cached when I was not logged in.
When I hit refresh, I do get the logged-in version of the page.
So, to restate:
1. Start at home page of site, not logged in.
2. Log in - this works, and brings up a logged-in version of the page.
3. Click "Home" - this is where the problem is. I get the cached version
of the home page (from when I was not logged in).
4. Refresh browser - now I see the correct, logged-in version of the home
page.
It looks like I am logging in OK, but that Drupal is not telling my
browser that there is a new version of the home page that it needs to
check for.
Maybe the original (non-logged-in) version of the page is not marked
for no-cache, and so when I click on the "Home" link, I get the cached
(non-logged-in) version.
Browsers: MSIE 6 (happens a lot) and Firebird (happens sometimes).
Also happens with christianlong.com
Attached, find an annotated record of the HTTP header traffic.
Thanks
Christian
------------------------------------------------------------------------
February 16, 2004 - 14:48 : christianlong at christianlong.com
Forum discussion is here
http://drupal.org/node/view/5669
------------------------------------------------------------------------
March 4, 2004 - 11:49 : moshe weitzman
I am seeing this same behavior at http://www.nshp.org [1]. To see for
yourself, login using the box on the home page while using IE (Firefox
doesn't have a problem).
user: mwtest8
pass: testpass
Here is the php config of the server [2].
Note that IE is set to "Check for newer pages: automatically" in
Options -> Temporary Internet Files -> Settings
Thoughts on how to resolve this are welcome.
[1] http://www.nshp.org/phpinfo.php
[2] http://www.nshp.org/phpinfo.php
------------------------------------------------------------------------
June 15, 2004 - 20:38 : duztin
I had this problem. Fix the date on your PC; mine was a month ahead and
my cookies were expiring too soon.
------------------------------------------------------------------------
July 27, 2004 - 14:18 : killes at www.drop.org
Setting to cvs. Is the fix proposed by duztin a real fix? If not, what
else could we do?
------------------------------------------------------------------------
October 16, 2004 - 15:45 : mathias
Attachment: http://drupal.org/files/issues/browser_caching.patch (644 bytes)
I'm moving this to critical because in some cases this bug causes the
hidden $edit['destination'] value of the login form to be set to
'logout', and to the user it appears they can't login since they are
immediately logged out again. Very frustrating.
The other side effect of this browser caching bug is, as stated above,
that the authenticated user will receive stale pages, and if you have the
login block enabled it will look as though they were inexplicably
logged out.
It all depends on your browser and its settings, but to attempt to
reproduce:
1. Login to your site.
2. Next, Click the homepage link. If you are served a stale copy of the
page, you've hit the bug. This seems to happen more with IE and Safari
than Firefox.
A potential patch is to have Drupal issue the following header:
header('Cache-Control: no-store, no-cache, must-revalidate');
It works, but I don't know the implications this has on other Drupal
components such as RSS conditional GETs and gzip page serving.
------------------------------------------------------------------------
October 21, 2004 - 12:30 : mathias
Attachment: http://drupal.org/files/issues/browser_caching_0.patch (593 bytes)
Thanks to Ethereal [3] and LiveHTTPHeaders [4], I was able to trace
this problem to the Cache-Control header being sent by Mac OS X server.
On this OS, mod_expires [5] is enabled by default for Apache which sets
the Cache-Control time to 60 seconds for dynamically rendered pages.
The implications of this action are that once you login, re-visiting
any page on the site will result in stale, locally-cached versions if
you viewed that page within 60 seconds of logging in. Since you weren't
logged in on those pages, the system will appear to have logged you
out. It will keep doing this unless you wait 60 seconds to log in. Thus,
users perceive this as a failure to successfully log in.
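The 60-second window can be pictured with a small Python sketch. It is a simplified model of browser freshness checking, not any browser's real code, and the function name is made up for illustration.

```python
# Simplified freshness model: a stored copy is reused without contacting
# the server for as long as its age is below the max-age it was given.
def served_from_browser_cache(seconds_since_fetch, max_age=60):
    """True if the browser may reuse its stored copy without asking the server."""
    return seconds_since_fetch < max_age
```

A page fetched anonymously 30 seconds before logging in would still count as fresh under this model, which matches the stale, logged-out view described above.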
This new patch doesn't tweak bootstrap.inc. Instead it uses .htaccess
to test if mod_expires is enabled and resets the caching time to 1
second for dynamically rendered content. The benefit of this approach
is that it doesn't interfere with other types of caching that may be in
place for images, pdf files, etc.
# Overload mod_expires variables.
<IfModule mod_expires.c>
# Reduce the time dynamically generated HTML pages are cache-able.
ExpiresDefault A1
</IfModule>
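As a rough illustration of what the A1 setting above amounts to: mod_expires, as far as its documentation describes it, sets an Expires header and a Cache-Control max-age relative to the access time. The Python sketch below mimics that calculation; the function name is hypothetical and the real module handles more cases.

```python
from email.utils import formatdate


def mod_expires_headers(access_time, seconds):
    """Approximate the headers produced by an 'A<seconds>' expiry rule.

    access_time is a Unix timestamp; 'A' means the expiry is counted
    from the moment the client accessed the resource.
    """
    expires_at = access_time + seconds
    return {
        "Expires": formatdate(expires_at, usegmt=True),
        "Cache-Control": "max-age=%d" % seconds,
    }
```

Calling mod_expires_headers(access_time, 1) corresponds to A1, and mod_expires_headers(access_time, 60) to the 60-second value reported for Mac OS X server.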
Down the road it may also be wise to consider sending our own caching
headers to maintain control of our caching environment. I reviewed the
following pieces of software, all of which intentionally disable
caching by the client browser or proxy caches. I just grabbed the ones
I thought were most popular.
Plone (Cache-Control parameter is configurable)
Wordpress (except for RSS feeds)
eZpublish
Mambo
phpBB
And if you're so inclined, here are the relevant code snippets [6] for
each piece of software.
[3] http://www.ethereal.com/
[4] http://livehttpheaders.mozdev.org/
[5] http://httpd.apache.org/docs/mod/mod_expires.html
[6] http://asitis.org/tmp/cms_cache_review.txt
------------------------------------------------------------------------
October 21, 2004 - 12:39 : moshe weitzman
I've seen this bug on drupal.org, so this impacts more than OSX server
... Looks like a nice clean patch to me.
------------------------------------------------------------------------
October 26, 2004 - 10:21 : jvandyk
+1 from me. This reduces confusion among end users while retaining
caching ability for other mime types.
------------------------------------------------------------------------
October 27, 2004 - 02:05 : wazdog
Does this end up having any effect on the RSS conditional GETs? If not,
then +1 from me.
(The WordPress code does check for RSS, so I'm thinking this may set
them to expire too?)
------------------------------------------------------------------------
October 27, 2004 - 19:58 : jvandyk
Attachment: http://drupal.org/files/issues/htaccess_0.patch (792 bytes)
Regarding the concern about RSS caching: this patch is identical to the
previous one except instead of targeting the default we target the
text/html MIME type.
<IfModule mod_expires.c>
# Reduce the time dynamically generated HTML pages are cache-able.
ExpiresByType text/html A1
</IfModule>
This means that RSS feeds, which are MIME type text/xml, are not
affected.
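The difference between the two directives can be sketched in Python as a lookup in which a per-type rule wins over the server default. This is only a model of the selection logic under that assumption; the function name and the rules dictionary are made up for illustration.

```python
def expiry_seconds(content_type, by_type, default=None):
    """Pick the expiry for a response: a per-type rule wins over the default.

    content_type may carry parameters such as '; charset=utf-8', which
    are ignored when matching the MIME type.
    """
    mime = content_type.split(";")[0].strip().lower()
    return by_type.get(mime, default)


# Only text/html is overridden; other types keep the server default
# (60 seconds here, the Mac OS X server value mentioned in this thread).
rules = {"text/html": 1}
html_ttl = expiry_seconds("text/html; charset=utf-8", rules, default=60)
feed_ttl = expiry_seconds("text/xml", rules, default=60)
```

Here html_ttl comes out as 1 and feed_ttl stays at the default, so feeds are left alone.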
------------------------------------------------------------------------
February 10, 2005 - 19:47 : moshe weitzman
i think this is a small, worthwhile patch. this bug plagues drupal.org
as well.
------------------------------------------------------------------------
February 11, 2005 - 07:45 : Junyor
@jvandyk: Is it just me or is there some junk before the meat of that
patch?
Better cache header handling would be a welcome addition here[1], but
I'm not sure if the suggested patch will solve the problem efficiently.
Isn't this change saying that all pages must always be redownloaded if
mod_expires is enabled? Won't that cause unnecessary bandwidth and
performance overhead?
[1] I'm constantly seeing stale pages throughout all authenticated
pages with Opera.
------------------------------------------------------------------------
February 11, 2005 - 08:42 : moshe weitzman
"Isn't this change saying that all pages must always be redownloaded if
mod_expires is enabled?"
Correct. And that is desired behavior. A dynamic application like ours
has no choice but to redownload every page (excluding RSS feeds).
------------------------------------------------------------------------
February 19, 2005 - 12:52 : killes at www.drop.org
This patch only affects users of mod_expires and harms nobody else. +1
------------------------------------------------------------------------
February 20, 2005 - 22:01 : Steven
Applied to CVS. Do we still need our own Cache Control mechanism? I'm
inclined to let the server handle this.
------------------------------------------------------------------------
March 9, 2005 - 00:27 : mathias
Attachment: http://drupal.org/files/issues/control-browser-caching.patch (713 bytes)
Browsers are sometimes pulling files from their cache and serving stale
pages when instead they should be asking the server for a new copy.
This is because Drupal doesn't issue its own Cache-Control headers like
most other CMS's [7]. Instead those details are currently left up to
each server, which users on shared hosting aren't authorized to
configure.
This bug rears its ugly head when site admins use Drupal's caching
mechanism. If a user clicks on the homepage after logging in, they're
very likely to see a stale unauthenticated view. Or if they log out
and return to the homepage, it'll appear that they're still logged in.
But perhaps the most confusing of these caching issues is when you
click the login button and are once again presented with the same login
form. These cached views are occurring because the browser still thinks
those pages are valid in its cache. In other words, it's sending
If-Modified-Since headers and receiving a valid HTTP 304 response.
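For readers unfamiliar with conditional requests, here is a simplified Python sketch of the server-side decision. It assumes exact string comparison of the validators, which is a simplification, and the function name is hypothetical rather than taken from Drupal.

```python
def conditional_get_status(request_headers, etag, last_modified):
    """Return 304 if the browser's validators match the stored page, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")

    # No validators sent: this is an ordinary request, send the full page.
    if if_none_match is None and if_modified_since is None:
        return 200
    # Any validator that was sent must match the stored copy exactly.
    if if_none_match is not None and if_none_match != etag:
        return 200
    if if_modified_since is not None and if_modified_since != last_modified:
        return 200
    return 304
```

Nothing in this check looks at whether the user has logged in or out since the copy was stored, which is consistent with the stale views described above.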
Now this doesn't happen 100% of the time in all cases. It very much
depends on the browser (usually Firefox or Safari but never Konqueror)
and how the server is configured (Mac OS X server can be quite
troublesome for example while a default FreeBSD install of apache
usually sends the proper headers).
The most elegant solution I've come up with is to issue our own
Cache-Control headers after a request has been cached by Drupal's
caching mechanism. The header workflow then becomes:
// User requests a page that isn't in the Drupal cache table
HTTP 200
(server issues its own caching headers)
// User requests the same page (now in the cache table)
HTTP 200
(Drupal issues its own caching headers explicitly stating not to cache
the page)
// User requests the same page (now in the cache table)
HTTP 304
(server issues its own caching headers)
If I'm understanding how things work, this lets the browser use the
cached page only as long as we're sending 304 responses. When the user
logs in, the browser cache is invalidated since a different set of
headers are emitted, causing the stale copies to expire.
The end result is that browsers are still allowed to do caching of
requests (including XML feeds) but only hold on to those copies until
Drupal says otherwise.
[7] http://asitis.org/tmp/cms_cache_review.txt
------------------------------------------------------------------------
March 9, 2005 - 00:37 : chx
+1
------------------------------------------------------------------------
March 9, 2005 - 02:25 : Bèr Kessels
+1 from me. Nice clean patch. Works in FF and konq, cannot test in IE,
which has very aggressive caching, AFAIK.
------------------------------------------------------------------------
March 9, 2005 - 12:43 : mathias
Setting this to active at the moment since Safari users are still
experiencing caching issues, albeit less frequently.
------------------------------------------------------------------------
March 9, 2005 - 22:12 : mathias
Attachment: http://drupal.org/files/issues/control-browser-caching_0.patch (1.14 KB)
I think this is the elegant catch-all case.
The most problematic page with stale caching is the frontpage. Not
variable_get('site_frontpage'), but when $_GET['q'] is NULL. The
solution that works is to make sure this request is never cached by the
browser (Drupal can still cache it of course).
So in summary, this patch should resolve all browser caching problems
while still gracefully emitting HTTP 304 headers in all cases but the /
request.
------------------------------------------------------------------------
March 10, 2005 - 00:21 : Dries
-1 for the $_GET['q'] addition. While it might be the most problematic
page, it is also the most popular page. The $_GET['q'] scenario merely
hides the fact that the headers/caching are not working like they
should.
After a POST operation (e.g. log in), Drupal should never send a cached
page. This is checked for in page_get_cache() and page_set_cache().
Maybe those checks are bogus.
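The kind of check being referred to can be sketched as follows. This is a Python illustration of the idea only, not a copy of the Drupal functions; it assumes that anonymous visitors have user id 0 and that only GET requests from anonymous visitors are eligible for the page cache.

```python
def may_serve_cached_page(request_method, user_id):
    """Allow a cached page only for anonymous GET requests.

    Assumptions: user_id 0 means an anonymous visitor, and any other
    method (such as the POST sent by the login form) bypasses the cache.
    """
    return request_method == "GET" and user_id == 0
```

If a guard like this holds on the server, a stale page seen right after logging in would have to come from the browser's own cache rather than from Drupal's.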