[drupal-devel] [feature] Fast file-based locking

Dries drupal-devel at drupal.org
Sat Apr 30 10:17:45 UTC 2005

Issue status update for http://drupal.org/node/21556

 Project:      Drupal
 Version:      cvs
 Component:    base system
 Category:     feature requests
 Priority:     normal
 Assigned to:  Jeremy at kerneltrap.org
 Reported by:  Jeremy at kerneltrap.org
 Updated by:   Dries
 Status:       patch

Note that we already use database locking for the sequences table:

includes/database.mysql.inc:  db_query('LOCK TABLES {sequences}
includes/database.mysql.inc:  db_query('UNLOCK TABLES');


Previous comments:

April 29, 2005 - 06:05 : Jeremy at kerneltrap.org

Attachment: http://drupal.org/files/issues/lock_6.patch (13.74 KB)

There are several race conditions in Drupal's core code.  When these
races happens, most of the time they simply show up as an error in the
watchdog log.  However, in some instances the errors can happen early
enough in page generation to show up on the page, preventing the page
from being displayed properly.  The busier a site gets, the more likely
this is to happen.

The attached patch implements file-based locking to prevent race
conditions.  It utilizes PHP's flock() mechanism to provide locking. 
This will not work on NFS (and possibly other networking filesystems)
or VFAT filesystems.  Comments are very welcome.

An alternative form of locking for NFS users could be implemented,
using the creation of directories.  However, this increases the
complexity so I am not interested in even looking into such a solution
until a basic solution (such as the attached) is first merged.

I am using lock.inc on a 4.5 installation on KernelTrap.org to prevent
the race in variable_set, and it seems to be working perfectly.


Some suggested discussion points:

1) Currently I use the names lock_rlock(), lock_rwlock(), and
lock_unlock().  An rlock is a shared lock.  A rwlock is an exclusive
lock.  Perhaps lock_shared_lock() and lock_exclusive_lock() would be
more clear?  Or, we could just have a lock_lock() function that
accepted an additional flag for SHARED (READ) and EXCLUSIVE

2) Everything is in lock.inc.  I had originally intended to put the
generic api into lock.inc, and the specific locking implementation
(lock_file_... in this case) in lock.module.  However, locks are
required in bootstrap.inc before modules are loaded.  I did write some
logic to check the system table to see if the lock module was enabled,
and thus to determine if we should use locks, but as this added an
extra db_query per page load I decided it wasn't worth it.  An
alternative idea is to create multiple lock include files.  lock.inc,
lock.file.inc, lock.dir.inc, lock.db.inc, etc.  However, until there
are actually multiple implementations, there is no point in this.

3) Some sites may not want to use locking, so I have added a
configuration option in the system module to allow it to be disabled. 
Locking is disabled by default.  (If you don't have a busy site, why
bother with extra complexity?)  Efficient file-based locking requires
garbage collection, which is called from system_cron().

4) A truly sanitized file name would be limited in length.  Different
filesystems have different length limitations.  What is a sane choice? 
Too short, and locks won't be unique.  Too long, and locks may not work
properly on certain filesystems.  (Or perhaps we can just rely on the
filesystem to auto-truncate?)


The patch:
 - introduces lock.inc
 - adds lock.inc, and file.inc to bootstrap.inc
 - removes file.inc from common.inc
 - adds locks to bootstrap.inc, user.module, and statistics module
 - adds a configuration option and cron logic to the system.module


April 30, 2005 - 12:04 : Dries

Quick review:

Can you summarize why database locking can't be used today?
Don't show developer speak in user output/documentation (eg. form
description of lock settings).  Reword these for clarity.
I don't know why but I have /problems/ with the names lock_rlock() and
lock_rwlock().  I think I prefer to have one function, with an
additional paramater to specify the lock-type.  Maybe lock_obtain() and
lock_release().  I'm also more familiar with the terminology 'shared'
and 'exclusive' lock but that might be my particular background.  Not
_lock_file_sanitize_name() and friends seems like bloat to me.  It has
only one call-site, plus developers can be instructed about what good
names are.  It is not like we are dealing with user input.
If we don't need attributes now, please don't add them.  If we need
additional attributes, the API does not shield the programmer from
internal implementation details.
Are blocking locks a good idea?  What happens when I own a blocking
lock, and my process gets terminated/abrupted?  Will all other
processes come to halt until the GC comes along?


April 30, 2005 - 12:15 : Dries

If database locking can be used efficiently, I'm in favor of
standardizing on database locking.  I'd think databases are optimized
for this; no need to extend database functionality ourselves or to
build our own layer on top of it.

More information about the drupal-devel mailing list