[documentation] Drupal guide to caching

Steve Dondley sdondley at gmail.com
Mon Mar 31 03:31:58 UTC 2008


Here's a revised version that incorporates the feedback given to date
and some of my own improvements. I left out the last four kinds of
caching because I have not changed those yet.


This guide is an introduction to the various caching mechanisms Drupal
uses to speed up a site. It's primarily designed for developers, but a
lot of it should be understandable by Drupal site administrators who
are familiar with the fundamentals of how Drupal delivers content.

The Basics
Caches are used to improve the performance of your Drupal site by
taking a snapshot of frequently accessed, relatively static and/or
expensive-to-process data and copying it into a location and format so
that it's much faster to retrieve the next time it's needed.
Periodically, cached data must be deleted and updated with the most
recent version of the data. Otherwise, the cache risks going "stale"
and the Drupal website will output old data even though newer data
exists.
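
To make the idea of a stale cache concrete, here is a minimal sketch
in Python rather than Drupal's own PHP. The class and function names
(SnapshotCache, load_title) are made up for illustration and are not
part of Drupal. It shows a cached value being served after the
underlying data has changed, and the cache being cleared to pick up
the new value.

```python
import time


class SnapshotCache:
    """Tiny in-memory cache: keeps a value until it expires or is cleared."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._entries = {}  # key -> (value, time the value was stored)

    def get(self, key, rebuild, now=None):
        now = time.time() if now is None else now
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self.ttl_seconds:
            return entry[0]  # cache hit: skip the expensive work
        value = rebuild()  # cache miss or expired entry: recompute
        self._entries[key] = (value, now)
        return value

    def clear(self, key):
        # Explicit invalidation, the cure for stale data.
        self._entries.pop(key, None)


source = {"title": "Old title"}  # stands in for the "real" data


def load_title():
    return source["title"]


cache = SnapshotCache(ttl_seconds=60)
first = cache.get("title", load_title, now=0)
source["title"] = "New title"  # the real data changes...
stale = cache.get("title", load_title, now=10)  # ...but the cache still has the old copy
cache.clear("title")
fresh = cache.get("title", load_title, now=11)  # rebuilt from the real data
```

Here "stale" is exactly the situation described above: newer data
exists, but the snapshot is still being served.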

The problem of stale data can be particularly troublesome for new
developers, who can get confused as to why changes they make to
Drupal's code don't seem to have any effect. For example, a developer
might make a change to the $items array in the hook_menu function. But
because the $items array is cached, the developer will be left
scratching their head as to why the change isn't reflected in the
website's output on the next page load. As most experienced
programmers can attest, this kind of confusion leads to wild goose
chases hunting for nonexistent problems. Hopefully, by reading this
document, you'll have a more pleasant Drupal development experience.

What gets cached, where it gets cached, and how
There are two different kinds of caching that take place in Drupal:
file-based and database-based caching.

File-based caching
The best example of file-based caching is the consolidation of all the
stylesheet and javascript files your site delivers into just a few
files. This is particularly useful for Drupal sites where, depending
on which modules are enabled, it's not unusual to have a dozen or more
files to deliver the javascript and style sheet data associated with
each html page. Having so many files to download increases page load
times because the browser has to make several round trips to the
server to download them all. By reducing the number of files to
download, the file caching feature will cut page loading time
significantly.

The style sheet and javascript cache files are stored in the
respective "css" and "js" directories within the file system directory
("files" by default) as set on the file system settings page. Style
sheet files are compressed by removing whitespace and linebreaks to
further increase download speed. Javascript files are not compressed
because this can introduce bugs into the code.
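
As a rough illustration of the consolidation and compression described
above, here is a small Python sketch. It is not Drupal's actual
algorithm. The function names are hypothetical, and the style sheets
are passed in as strings so the example stays self-contained.

```python
import re


def compress_css(css):
    """Shrink a style sheet by removing whitespace and line breaks."""
    css = re.sub(r"\s+", " ", css)  # collapse runs of whitespace, including newlines
    css = re.sub(r"\s*([{};])\s*", r"\1", css)  # no spaces needed around { } ;
    return css.strip()


def aggregate(stylesheets):
    """Join many style sheets into one string, so one file can replace many."""
    return "".join(compress_css(sheet) for sheet in stylesheets)


sheets = [
    "body {\n  margin: 0;\n}\n",
    "h1 {\n  color: red;\n}\n",
]
combined = aggregate(sheets)
```

With the two sheets above, the browser would fetch a single short
file instead of two.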

How to turn on stylesheet and javascript file caching
======================================================
First, be sure your "Download method" is set to "Public" (set at
admin/settings/file-system). You will not be able to use stylesheet
and javascript caching when your "Download method" is set to
"Private." This is because with the private download method, Drupal
will deny access to the stylesheet and javascript files in your files
directory. *I am not sure about the accuracy of the last sentence*

Next, in the navigation menu, select "Administer -> Site Configuration
-> Performance" (admin/settings/performance), scroll down to the
"Bandwidth optimizations" fieldset, check off "Enabled" for the kinds
of files you wish to cache and click "Save configuration".

Note that turning on stylesheet and javascript caching can interfere
with theme development and should only be enabled in a production
environment. If you need to make a change to a cached stylesheet or
javascript file, disable the file caching feature and reenable it
after you make your change. Alternatively, you can delete the cached
files and they will be regenerated for you on the next page load.

Finally, be aware that stylesheet and javascript caching can cause
problems on sites that are run in a load-balanced environment (across
two or more servers). This is because the cached files may be stored
on one server and not the other.

Database-based caching
Drupal can also store cached data in special tables in your database.
Drupal core sets up seven tables for this purpose. Although these
tables share the same table structure and it would be possible to
combine them into a single table, they are split up because smaller
tables improve cache performance. If needed, other modules can add
additional cache tables to the database. The seven tables set up by
core are:

cache
An "all purpose" table that can be shared by modules for storing a few
rows of cached data. Drupal core uses this table to store the
following data:

* Variable data. These are variables that are set with the
variable_set() function and retrieved with the variable_get()
function. This cache saves Drupal from making individual queries for
each variable_get() call. When this cache goes stale, it is refreshed
by data from the variable table. A call to the variable_set() function
will trigger a cache refresh.

* Theme registry data. This registry is a listing of all the theme
functions and templates that can be overridden by theme developers.
The existence of the theme registry makes it easy to override them by
allowing a user to simply drop a tpl.php file in the theme's
directory. When this cache goes stale, it's regenerated by calling
hook_theme() and doing a file system scan for template files to find
additional theme functions.

* Schema data. This data contains information about the structure of
tables in the database. Caching the schema data saves Drupal from
having to reload infrequently changing database schema information
from .install files every time it's needed. When this cache goes
stale, it's regenerated by calling hook_schema() which collects schema
data from each module's .install file. A visit to the
admin/build/modules page, enabling or disabling a module, or updating
a module through the update script will trigger a cache refresh.

cache_block
A table for storing content generated by your blocks. This saves
Drupal from having to repeatedly query the database for unchanged data
related to blocks. When this cache goes stale, it is refreshed by data
from the boxes table where block content is stored and the blocks
table which stores block configuration data. Any update to a block's
content or configuration will trigger a cache refresh for that block.
The entire cache is refreshed when a node, comment, user, or taxonomy
term is added or updated. Module developers have the option of gaining
more control over when a particular block's cache is refreshed using
cache granularity settings for their blocks. Refer to the constants
defined at the top of the block.module for further details.
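
One way to picture cache granularity is as a recipe for the cache key
of each block. The Python sketch below is an illustration only. The
flag names are modeled on the constants in block.module, but the
numeric values, the block_cache_id() helper and the key layout are
assumptions made for this example, so check block.module for the
real definitions.

```python
# Flag names modeled on block.module; values here are assumed for illustration.
BLOCK_NO_CACHE = -1
BLOCK_CACHE_PER_ROLE = 0x1
BLOCK_CACHE_PER_USER = 0x2
BLOCK_CACHE_PER_PAGE = 0x4
BLOCK_CACHE_GLOBAL = 0x8


def block_cache_id(module, delta, mode, user_id, role_ids, path):
    """Build a cache key that varies only by what the block's mode asks for."""
    if mode == BLOCK_NO_CACHE:
        return None  # this block is never cached
    parts = [module, str(delta)]
    if mode & BLOCK_CACHE_PER_USER:
        parts.append("u.%d" % user_id)
    elif mode & BLOCK_CACHE_PER_ROLE:
        parts.append("r." + ",".join(str(r) for r in sorted(role_ids)))
    if mode & BLOCK_CACHE_PER_PAGE:
        parts.append(path)
    return ":".join(parts)


global_key = block_cache_id("user", 0, BLOCK_CACHE_GLOBAL, 5, [2], "node/1")
role_page_key = block_cache_id(
    "user", 0, BLOCK_CACHE_PER_ROLE | BLOCK_CACHE_PER_PAGE, 5, [3, 2], "node/1"
)
```

A block cached globally gets one key for everybody, while a block
cached per role and per page gets a separate copy for each
combination.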

The block cache can be turned on and off at "Administer -> Site
Configuration -> Performance" (admin/settings/performance).

Note that block caching is inactive when modules defining content
access restrictions are enabled. For example, if organic groups,
content access, taxonomy access modules or other modules that restrict
access to certain kinds of content are turned on, block caching will
be turned off.

cache_filter
A table for storing filtered pieces of content. This saves Drupal from
having to run the same expensive regular expression operations on
unchanged content that gets run through the input filters. When this
cache goes stale, it is refreshed by the content data (e.g. content in
the node_revisions table, blocks table, etc.). Cron jobs, updates to
nodes, and updates to filter formats will trigger cache refreshes.

Question: What is the thinking behind having the cache_filter refreshed on cron jobs?

On Sat, Mar 29, 2008 at 9:33 PM, Steve Dondley <sdondley at gmail.com> wrote:
> I'm writing a drupal guide to caching. If you know something about
>  caching and are so inclined, please add your two cents about what I
>  have so far:
>
>  THE DRUPAL 6 GUIDE TO CACHING
>
>  This guide is an introduction to the various caching mechanisms Drupal
>  uses to speed the site. It's primarily designed for developers but a
>  lot of it should be understandable by Drupal site administrators who
>  are familiar with the basics of how Drupal delivers content.
>
>  The Basics
>  Caches are used to improve the performance of your Drupal site. Rather
>  than extracting the same data over and over again every time a page is
>  loaded, caching stores frequently accessed and relatively static data
>  in a convenient place and format.
>
>  Caching has a drawback in that it can lead to "stale" data. This means
>  that the website outputs old data or content from the cache even
>  though newer stuff exists somewhere else. This problem can be
>  particularly troublesome for developers who can get confused as to why
>  changes they expect to see happen aren't. Hopefully, by reading this
>  document, you'll have a more pleasant and less confusing Drupal
>  experience.
>
>  What gets cached, where it gets cached, and how
>  There are two different ways Drupal stores cached data:
>
>  1) Using files
>  Drupal can consolidate all the css files your site delivers on each
>  page load and place them into a fewer number of files. The resulting
>  files are also compressed. This is important for Drupal sites where
>  it's not unusual to have a dozen or more stylesheets associated with
>  each page, depending on how many modules are enabled. Having so many
>  stylesheets will increase page load times because the browser has to
>  make several round trips to the server to download all the stylesheet
>  files. By using the css caching feature, you can consolidate these
>  files into fewer larger files and decrease page load time
>  significantly.
>
>  Just like style sheets, javascript files can also be consolidated.
>  However, these files are not compressed.
>
>  How to turn on stylesheet and javascript file caching
>  ======================================================
>  First, be sure your "Download method" is set to "Public" (set
>  at admin/settings/file-system). You will not be able to use stylesheet
>  and javascript caching when your "Download method" is set to
>  "Private."
>
>  Next, in the navigation menu, select "Administer -> Site Configuration
>  -> Performance" (admin/settings/performance) and scroll down to the
>  "Bandwidth optimizations" fieldset and check off "Enabled" for both
>  CSS files and Javascript files.
>
>  Note that turning on stylesheet and javascript caching can interfere
>  with theme development and should only be enabled in a production
>  environment.
>
>  2) In your database
>  The main location where Drupal stores cached data is in special tables
>  in your database. Drupal core sets up seven tables for caching data.
>  Other modules will add additional cache tables to the database as
>  needed. The seven tables set up by core are:
>
>  cache
>  An "all purpose" table that can be shared by various modules. This
>  table is designed for modules that need to store only a few rows of
>  data. Drupal core uses this table to store the following data:
>
>  * Variable data. These are variables that are set with the
>  variable_set() function and retrieved with the variable_get()
>  function. When this cache goes stale, it is refreshed by data from
>  the variable table. A call to the variable_set function will trigger a
>  cache refresh.
>
>  * Theme registry data. This registry is a listing of all the themes
>  that can be overridden by theme developers. The existence of the theme
>  registry makes it easy to override themes by allowing a user to simply
>  drop a tpl.php file in the theme's directory.  When this cache goes
>  stale, it's regenerated from the function definitions contained in
>  module and theme files.
>
>  * Schema data. This data contains information about the table
>  structure of the database. When this cache goes stale, it's
>  regenerated from ???. A visit to the admin/build/modules page will
>  trigger a cache refresh.
>
>  cache_block
>  A table for storing content generated by your blocks. This saves
>  Drupal from having to repeatedly query the database for unchanged
>  block content. When this cache goes stale, it is refreshed by data
>  from the boxes table where block content is stored. Any update to a
>  block's content will trigger a cache refresh for that block. The
>  entire cache is refreshed when a node, comment, user, or taxonomy term
>  is added or updated. Module developers have the option of gaining more
>  control over when a particular block's cache is refreshed using cache
>  granularity settings for their blocks. Refer to the constants defined
>  at the top of the block.module for further details.
>
>  The block cache can be turned on and off at "Administer -> Site
>  Configuration -> Performance" (admin/settings/performance).
>
>  Note that block caching is inactive when modules defining content
>  access restrictions are enabled. For example, if organic groups,
>  content access, taxonomy access modules or other modules that restrict
>  access to certain kinds of content are turned on, block caching will
>  be turned off.
>
>  cache_filter
>  A table for storing filtered pieces of content. This saves Drupal from
>  having to run the same expensive regular expression operations on
>  unchanged content that gets run through the input filters. When this
>  cache goes stale, it is refreshed by the data in the node_revisions
>  table which contains node content. Cron jobs, updates to nodes, and
>  updates to filter formats will trigger cache refreshes.
>
>  Question: What is the thinking behind having the cache_filter refreshed on cron jobs?
>
>  AUTHOR'S NOTE: The items below have to be researched more to determine
>  what triggers them to refresh and are unfinished.
>
>  cache_form
>  A table for storing forms generated by the forms api. This saves
>  Drupal from having to rebuild unchanged forms. When this cache goes
>  stale, it is refreshed by output from the form module.
>
>  cache_menu
>  A table for storing the menu items and menu item hierarchies. This
>  saves Drupal from having to regenerate the data structures needed to
>  define the menu items and their hierarchies on each page load. When the
>  data goes stale, the cache is refreshed from the data contained in the
>  menu table.
>
>  cache_page
>  A table for storing pages for anonymous users. This saves Drupal from
>  making dozens or even hundreds of expensive queries needed to generate
>  a page. When the cache for a particular page goes stale, it gets
>  refreshed by the html output for that page. This cache can be turned
>  on and off under "Administer -> Site Configuration -> Performance"
>  (admin/settings/performance).
>
>  cache_update
>  A table used to store information about installed modules and themes.
>  This saves Drupal from having to perform two very expensive operations
>  for listing the installed modules and themes and the status of these
>  modules and releases compared to what's available for download on
>  drupal.org. When this cache goes stale, it is refreshed from the data
>  in the system table.
>
>  --
>  Prometheus Labor Communications, Inc.
>  http://prometheuslabor.com
>  413-572-1300
>
>  Communicate or Die: American Labor Unions and the Internet
>  http://communicateordie.com
>




