[documentation] Drupal guide to caching

Larry Garfield larry at garfieldtech.com
Sun Mar 30 03:14:28 UTC 2008


On Saturday 29 March 2008, Steve Dondley wrote:
> I'm writing a drupal guide to caching. If you know something about
> caching and are so inclined, please add your two cents about what I
> have so far:

Awesome!

> THE DRUPAL 6 GUIDE TO CACHING
>
> This guide is an introduction to the various caching mechanisms Drupal
> uses to speed up the site. It's primarily designed for developers, but a
> lot of it should be understandable by Drupal site administrators who
> are familiar with the basics of how Drupal delivers content.
>
> The Basics
> Caches are used to improve the performance of your Drupal site. Rather
> than extracting the same data over and over again every time a page is
> loaded, caching stores frequently accessed and relatively static data
> in a convenient place and format.

It's not just frequently accessed data but expensive-to-process data.  That's 
why block caching was added in Drupal 6, for instance.

> Caching has a drawback in that it can lead to "stale" data. This means
> that the website outputs old data or content from the cache even
> though newer stuff exists somewhere else. This problem can be
> particularly troublesome for developers who can get confused as to why
> changes they expect to see happen aren't. Hopefully, by reading this
> document, you'll have a more pleasant and less confusing Drupal
> experience.

"Stuff" doesn't sound right in context here.  

> What gets cached, where it gets cached, and how
> There are two different ways Drupal stores cached data:

List the titles of both here to provide a roadmap to the text that follows, 
especially as the text that follows is long.

> 1) Using files
> Drupal can consolidate all the css files your site delivers on each
> page load into a smaller number of files. The resulting
> files are also compressed. This is important for Drupal sites where
> it's not unusual to have a dozen or more stylesheets associated with
> each page, depending on how many modules are enabled. Having so many
> stylesheets will increase page load times because the browser has to
> make several round trips to the server to download all the stylesheet
> files. By using the css caching feature, you can consolidate these
> files into fewer, larger files and decrease page load time
> significantly.
>
> Just like style sheets, javascript files can also be consolidated.
> However, these files are not compressed.

You may want to briefly mention here why they're not compressed.  Eg, "are not 
compressed as it is much more difficult to do without inadvertently 
introducing bugs."
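To make the aggregation step concrete, here is a rough Python model of it. It assumes a simplified process: concatenate the stylesheets in load order, strip comments and extra whitespace, and name the bundle after a hash of the file list. The function names and the minification rules are hypothetical and are not Drupal's actual PHP code.

```python
import hashlib
import re


def compress_css(css: str) -> str:
    """Hypothetical, deliberately simple minifier (not Drupal's)."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)  # drop /* ... */ comments
    css = re.sub(r"\s+", " ", css)                         # collapse runs of whitespace
    css = re.sub(r"\s*([{};,])\s*", r"\1", css)            # trim spaces around punctuation
    return css.strip()


def aggregate_css(files: dict[str, str]) -> tuple[str, str]:
    """Concatenate stylesheets in load order; name the bundle by a hash of the file list.

    `files` maps a stylesheet path to its contents (hypothetical input shape).
    """
    names = "\n".join(files)  # dicts keep insertion order, i.e. load order
    bundle_name = hashlib.md5(names.encode("utf-8")).hexdigest() + ".css"
    bundle = "".join(compress_css(body) for body in files.values())
    return bundle_name, bundle
```

In this model the bundle name depends only on which files are included, not on their contents, so editing a stylesheet in place yields the same name. That is one plausible reason aggregation gets in the way during theme development, as the draft notes further down.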

> How to turn on stylesheet and javascript file caching
> ======================================================
> First, be sure your "Download method" is set to "Public" (set
> at admin/settings/file-system). You will not be able to use stylesheet
> and javascript caching when your "Download method" is set to
> "Private."
>
> Next, in the navigation menu, select "Administer -> Site Configuration
> -> Performance" (admin/settings/performance) and scroll down to the
> "Bandwidth optimizations" fieldset and check off "Enabled" for both
> CSS files and Javascript files.

And click "Save configuration"?

> Note that turning on stylesheet and javascript caching can interfere
> with theme development and should only be enabled in a production
> environment.
>
> 2) In your database
> The main location where Drupal stores cached data is in special tables
> in your database. Drupal core sets up seven tables for caching data.
> Other modules will add additional cache tables to the database as
> needed. The seven tables set up by core are:

Other modules "may add", not "will add".  Most contrib modules do not add 
additional cache tables.  It's probably also a good idea to mention why they 
are split out into separate tables, given that their schema is identical.  
(Hint: It's for performance, so certain big caches like the page or filter 
cache can be searched faster with a smaller data set.)
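The per-bin split and the expiry rules can also be sketched in Python. This is a hypothetical toy model and not Drupal's API. In PHP, the real calls are cache_get(), cache_set() and cache_clear_all(). The constant values below (CACHE_PERMANENT = 0, CACHE_TEMPORARY = -1) are what I believe Drupal 6 uses. In this model, a general clear of a bin removes temporary and past-expiry entries and keeps permanent ones.

```python
CACHE_PERMANENT = 0   # kept by a general clear of the bin
CACHE_TEMPORARY = -1  # dropped by the next general clear of the bin


class CacheStore:
    """Hypothetical toy model of Drupal 6's cache tables: one dict per bin."""

    def __init__(self, bin_names):
        self.bins = {name: {} for name in bin_names}

    def set(self, cid, data, bin_name="cache", expire=CACHE_PERMANENT):
        self.bins[bin_name][cid] = {"data": data, "expire": expire}

    def get(self, cid, bin_name="cache"):
        # Only the named bin is searched, mirroring one table per bin.
        # The expiry is not checked here; stale entries linger until a clear.
        entry = self.bins[bin_name].get(cid)
        return entry["data"] if entry is not None else None

    def clear(self, bin_name, cid=None, now=0):
        table = self.bins[bin_name]
        if cid is not None:
            table.pop(cid, None)  # targeted clear of a single entry
            return
        # General clear: drop temporary and past-expiry entries, keep permanent ones.
        stale = [k for k, e in table.items()
                 if e["expire"] != CACHE_PERMANENT and e["expire"] < now]
        for key in stale:
            del table[key]
```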

> cache
> An "all purpose" table that can be shared by various modules. This
> table is designed for modules that need to store only a few rows of
> data. Drupal core uses this table to store the following data:
>
> * Variable data. These are variables that are set with the
> variable_set() function and retrieved with the variable_get()
> function. When this cache goes stale, it is refreshed by data from
> the variable table. A call to the variable_set() function will trigger a
> cache refresh.
>
> * Theme registry data. This registry is a listing of all the theme
> hooks (themeable functions and templates) that theme developers can
> override. The existence of the theme registry makes it easy to override
> them by allowing a developer to simply drop a .tpl.php file in the
> theme's directory. When this cache goes stale, it's regenerated from
> the function definitions contained in module and theme files.

Actually it's generated first from hook_theme(), I believe, then from a file 
system scan for template files.  That should probably be clarified for the 
sake of developers.

> * Schema data. This data contains information about the table
> structure of the database. When this cache goes stale, it's
> regenerated from ???. A visit to the admin/build/modules page will
> trigger a cache refresh.

It's regenerated from hook_schema(), I believe.
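Going back to the variable cache bullet above, the read/invalidate cycle looks roughly like this. It is a hypothetical Python model. The method names mirror the PHP functions variable_get() and variable_set(). The assumption, which I believe matches Drupal 6, is that all variables live in one 'variables' entry of the cache table. variable_set() clears that entry so the next request rebuilds it from the variable table.

```python
class VariableStore:
    """Hypothetical model of variable_get()/variable_set() over a 'variables' cache entry.

    `table` stands in for the {variable} table and `cache` for the {cache} table.
    """

    def __init__(self, table):
        self.table = table     # source of truth
        self.cache = {}        # cid -> cached data
        self.table_reads = 0   # how many times the slow path ran

    def _load(self):
        if "variables" not in self.cache:
            self.table_reads += 1
            self.cache["variables"] = dict(self.table)  # rebuild from the table
        return self.cache["variables"]

    def variable_get(self, name, default=None):
        return self._load().get(name, default)

    def variable_set(self, name, value):
        self.table[name] = value
        self.cache.pop("variables", None)  # invalidate; the next read rebuilds
```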

> cache_block
> A table for storing content generated by your blocks. This saves
> Drupal from having to repeatedly query the database for unchanged
> block content. When this cache goes stale, it is refreshed by data
> from the boxes table where block content is stored. Any update to a
> block's content will trigger a cache refresh for that block. The
> entire cache is refreshed when a node, comment, user, or taxonomy term
> is added or updated. Module developers have the option of gaining more
> control over when a particular block's cache is refreshed using cache
> granularity settings for their blocks. Refer to the constants defined
> at the top of block.module for further details.

boxes is only for admin-created blocks, isn't it?  The blocks table stores 
info on what blocks show up where and in what theme.
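For the granularity settings, here is a Python sketch of how a block's cache ID could be assembled from them. The constant values and the ID layout (module:delta:theme, then role or user, then page URL) are my reading of Drupal 6's block.module and should be checked against it. I believe the real code also adds the language when locale is enabled. It also skips caching for user 1 and for requests other than GET and HEAD. This sketch leaves those checks out.

```python
# Hypothetical mirror of the block.module granularity constants.
BLOCK_NO_CACHE = -1
BLOCK_CACHE_PER_ROLE = 0x0001
BLOCK_CACHE_PER_USER = 0x0002
BLOCK_CACHE_PER_PAGE = 0x0004
BLOCK_CACHE_GLOBAL = 0x0008


def block_cache_id(module, delta, granularity, theme, role_ids, uid, url):
    """Build a cache ID from the granularity flags, or None if the block is uncacheable."""
    if granularity == BLOCK_NO_CACHE:
        return None
    parts = [module, str(delta), theme]
    if granularity & BLOCK_CACHE_PER_ROLE:
        parts.append("r." + ",".join(str(r) for r in role_ids))  # shared by same-role users
    elif granularity & BLOCK_CACHE_PER_USER:
        parts.append(f"u.{uid}")  # one entry per user
    if granularity & BLOCK_CACHE_PER_PAGE:
        parts.append(url)  # one entry per page
    return ":".join(parts)
```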

> The block cache can be turned on and off at "Administer -> Site
> Configuration -> Performance" (admin/settings/performance).
>
> Note that block caching is inactive when modules defining content
> access restrictions are enabled. For example, if organic groups,
> content access, taxonomy access modules or other modules that restrict
> access to certain kinds of content are turned on, block caching will
> be turned off.

The reason behind that should be explained.
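On the reason: as far as I can tell, Drupal 6 switches block caching off whenever any module implements hook_node_grants(). Node access grants can differ from user to user, while cached block output is shared per role or globally. The hypothetical Python sketch below shows the leak that would otherwise happen. Two users with the same role share one cache entry, so the second user sees a node that only the first was granted.

```python
def render_recent_content(user, nodes, can_view):
    """Hypothetical 'recent content' block listing only the nodes this user may see."""
    return [node for node in nodes if can_view(user, node)]


def cached_recent_content(cache, user, nodes, can_view):
    """The same block behind a per-role cache: users with identical roles share one entry."""
    cid = "r." + ",".join(str(r) for r in sorted(user["roles"]))
    if cid not in cache:
        cache[cid] = render_recent_content(user, nodes, can_view)
    return cache[cid]  # may have been rendered for a different user
```

With per-user grants, the cached list is only correct for whoever happened to fill the entry first. Turning the cache off avoids that entirely.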

> cache_filter
> A table for storing filtered pieces of content. This saves Drupal from
> having to run the same expensive regular expression operations on
> unchanged content that gets run through the input filters. When this
> cache goes stale, it is refreshed by the data in the node_revisions
> table which contains node content. Cron jobs, updates to nodes, and
> updates to filter formats will trigger cache refreshes.

It's not always in node_revisions, as you can have CCK fields that are 
filtered text.  I've also run filters over text pulled in from a 3rd party 
data feed.  I'm not sure how picky you want to get here. :-)

> Question: What is the thinking behind having the cache_filter refreshed
> on cron jobs?
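On the cron question: my understanding, which is worth confirming in filter.module and system.module, is that Drupal 6's check_markup() keys each entry as '<format>:<md5 of the text>' with an expiry one day out. system_cron() then clears expired entries from cache_filter, so cron is simply the garbage collector for day-old filtered text. Because the key is derived from the text itself, this also covers filtered text that never touched node_revisions. A hypothetical Python model of that cycle:

```python
import hashlib

ONE_DAY = 60 * 60 * 24


def check_markup(text, fmt, filters, cache, now):
    """Hypothetical model: filter `text`, caching the result for one day.

    The cache ID depends only on the format and the text, not on where the text came from.
    """
    cid = f"{fmt}:{hashlib.md5(text.encode('utf-8')).hexdigest()}"
    if cid in cache:
        return cache[cid]["data"]  # skip the expensive filters
    for apply_filter in filters:
        text = apply_filter(text)
    cache[cid] = {"data": text, "expire": now + ONE_DAY}
    return text


def cron_purge(cache, now):
    """Drop entries whose expiry has passed, as a cron-time clear would."""
    for cid in [c for c, e in cache.items() if e["expire"] < now]:
        del cache[cid]
```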
>
> AUTHOR'S NOTE: The items below have to be researched more to determine
> what triggers them to refresh and are unfinished.
>
> cache_form
> A table for storing forms generated by the forms api. This saves
> Drupal from having to rebuild unchanged forms. When this cache goes
> stale, it is refreshed by output from the forms API.
>
> cache_menu
> A table for storing the menu items and menu item hierarchies. This
> saves Drupal from having to regenerate the data structures needed to
> define the menu items and their hierarchies on each page load. When the
> data goes stale, the cache is refreshed from the data contained in the
> menu table.
>
> cache_page
> A table for storing pages for anonymous users. This saves Drupal from
> making dozens or even hundreds of expensive queries needed to generate
> a page. When the cache for a particular page goes stale, it gets
> refreshed by the html output for that page. This cache can be turned
> on and off under "Administer -> Site Configuration -> Performance"
> (admin/settings/performance).
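Here is a hypothetical Python sketch of the page cache decision. It assumes, as I believe Drupal 6 does, that pages are keyed by their full absolute URL and cached only for anonymous GET requests. The real code also declines to cache when status messages are pending, which this sketch omits.

```python
def serve(request, cache, render):
    """Hypothetical model: serve a page, using the page cache only for anonymous GETs.

    `request` is a dict with 'method', 'url' (absolute) and 'uid' (0 = anonymous).
    Returns (html, "HIT" or "MISS").
    """
    cacheable = request["uid"] == 0 and request["method"] == "GET"
    if cacheable and request["url"] in cache:
        return cache[request["url"]], "HIT"  # no page building at all
    html = render(request)  # the expensive path: queries, theming, etc.
    if cacheable:
        cache[request["url"]] = html
    return html, "MISS"
```

Keying on the whole URL means each distinct query string gets its own entry, and logged-in users always take the expensive path.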
>
> cache_update
> A table used to store information about installed modules and themes.
> This saves Drupal from having to perform two very expensive operations
> for listing the installed modules and themes and the status of these
> modules and releases compared to what's available for download on
> drupal.org. When this cache goes stale, it is refreshed from the data
> in the system table.

I can't speak with authority on most of these, save for cache_page which I am 
fairly certain is accurate as described.  

Thanks!

-- 
Larry Garfield			AIM: LOLG42
larry at garfieldtech.com		ICQ: 6817012

"If nature has made any one thing less susceptible than all others of 
exclusive property, it is the action of the thinking power called an idea, 
which an individual may exclusively possess as long as he keeps it to 
himself; but the moment it is divulged, it forces itself into the possession 
of every one, and the receiver cannot dispossess himself of it."  -- Thomas 
Jefferson

