Issue status update for http://drupal.org/node/21556 Project: Drupal Version: cvs Component: base system Category: feature requests Priority: normal Assigned to: Jeremy@kerneltrap.org Reported by: Jeremy@kerneltrap.org Updated by: Dries Status: patch Note that I'm just playing the devil's advocate here (and in the comment above). I want to make sure that the code bloat introduced by file-based locking is justified. As Drupal matures it is attracting bigger projects using Drupal on load-balanced setups (eg. multiple webservers in front of a database server). Should we maintain a separate locking-backend for them, or use one that is compatible with their environment to begin with? I'm OK with using this approach for now, but in the end, I'd rather have us maintain a single locking backend, than two. Also, the current locking API is not compatible with database locking, I'd think. It does not allow switching between database and file-based locking without rewriting code. This matters because the next logical step would be support for transactions to guarantee true database integrity. File-based locking does not offer true database integrity. In a way, it is not future-safe. Dries Previous comments: ------------------------------------------------------------------------ April 29, 2005 - 06:05 : Jeremy@kerneltrap.org Attachment: http://drupal.org/files/issues/lock_6.patch (13.74 KB) There are several race conditions in Drupal's core code. When these races happens, most of the time they simply show up as an error in the watchdog log. However, in some instances the errors can happen early enough in page generation to show up on the page, preventing the page from being displayed properly. The busier a site gets, the more likely this is to happen. The attached patch implements file-based locking to prevent race conditions. It utilizes PHP's flock() mechanism to provide locking. This will not work on NFS (and possibly other networking filesystems) or VFAT filesystems. Comments are very welcome. An alternative form of locking for NFS users could be implemented, using the creation of directories. However, this increases the complexity so I am not interested in even looking into such a solution until a basic solution (such as the attached) is first merged. I am using lock.inc on a 4.5 installation on KernelTrap.org to prevent the race in variable_set, and it seems to be working perfectly. ----- Some suggested discussion points: 1) Currently I use the names lock_rlock(), lock_rwlock(), and lock_unlock(). An rlock is a shared lock. A rwlock is an exclusive lock. Perhaps lock_shared_lock() and lock_exclusive_lock() would be more clear? Or, we could just have a lock_lock() function that accepted an additional flag for SHARED (READ) and EXCLUSIVE (READ/WRITE)? 2) Everything is in lock.inc. I had originally intended to put the generic api into lock.inc, and the specific locking implementation (lock_file_... in this case) in lock.module. However, locks are required in bootstrap.inc before modules are loaded. I did write some logic to check the system table to see if the lock module was enabled, and thus to determine if we should use locks, but as this added an extra db_query per page load I decided it wasn't worth it. An alternative idea is to create multiple lock include files. lock.inc, lock.file.inc, lock.dir.inc, lock.db.inc, etc. However, until there are actually multiple implementations, there is no point in this. 3) Some sites may not want to use locking, so I have added a configuration option in the system module to allow it to be disabled. Locking is disabled by default. (If you don't have a busy site, why bother with extra complexity?) Efficient file-based locking requires garbage collection, which is called from system_cron(). 4) A truly sanitized file name would be limited in length. Different filesystems have different length limitations. What is a sane choice? Too short, and locks won't be unique. Too long, and locks may not work properly on certain filesystems. (Or perhaps we can just rely on the filesystem to auto-truncate?) ----- The patch: - introduces lock.inc - adds lock.inc, and file.inc to bootstrap.inc - removes file.inc from common.inc - adds locks to bootstrap.inc, user.module, and statistics module - adds a configuration option and cron logic to the system.module ------------------------------------------------------------------------ April 30, 2005 - 12:04 : Dries Quick review: Can you summarize why database locking can't be used today? Don't show developer speak in user output/documentation (eg. form description of lock settings). Reword these for clarity. I don't know why but I have /problems/ with the names lock_rlock() and lock_rwlock(). I think I prefer to have one function, with an additional paramater to specify the lock-type. Maybe lock_obtain() and lock_release(). I'm also more familiar with the terminology 'shared' and 'exclusive' lock but that might be my particular background. Not sure. _lock_file_sanitize_name() and friends seems like bloat to me. It has only one call-site, plus developers can be instructed about what good names are. It is not like we are dealing with user input. If we don't need attributes now, please don't add them. If we need additional attributes, the API does not shield the programmer from internal implementation details. Are blocking locks a good idea? What happens when I own a blocking lock, and my process gets terminated/abrupted? Will all other processes come to halt until the GC comes along? ------------------------------------------------------------------------ April 30, 2005 - 12:15 : Dries If database locking can be used efficiently, I'm in favor of standardizing on database locking. I'd think databases are optimized for this; no need to extend database functionality ourselves or to build our own layer on top of it. ------------------------------------------------------------------------ April 30, 2005 - 12:17 : Dries Note that we already use database locking for the sequences table: includes/database.mysql.inc: db_query('LOCK TABLES {sequences} WRITE'); includes/database.mysql.inc: db_query('UNLOCK TABLES'); ------------------------------------------------------------------------ April 30, 2005 - 16:04 : Jeremy@kerneltrap.org Attachment: http://drupal.org/files/issues/lock_7.patch (12.43 KB) ""Can you summarize why database locking can't be used today?" " In one word: performance. Locking an entire table when a race is on one row of the table is obviously a huge bottleneck. File locking alows for a much finer granularity As long as we're supporting MySQL 3.x in database.mysql.inc, we don't have row locking support. Let's say we drop support for 3.x, and now we have it? Then you're still looking at slower performance, and resource issues, as while waiting on locks you're going to get a back-log of active database connections. Finally, to get the granularity that file-based locking provides, you'll need a lock table in the database. (rows: lock, value) This has a problem: you still have a race creating the lock in the table the first time, as you can't lock a non-existent row. The solution: lock the whole table while creating a lock. The affect: a bottleneck. I wrote generic database locking code, but it wasn't useful in early page generation as it relied on ignoring MySQL errors. Unfortunately, MySQL errors occuring too early during page generation don't get redirected to the watchdog. ""Don't show developer speak in user output/documentation (eg. form description of lock settings). Reword these for clarity." " I tried! ;) I've tried again in the latest patch. ""I don't know why but I have problems with the names lock_rlock() and lock_rwlock(). I think I prefer to have one function, with an additional paramater to specify the lock-type. Maybe lock_obtain() and lock_release(). I'm also more familiar with the terminology 'shared' and 'exclusive' lock but that might be my particular background. Not sure." " rlock and rwlock are UNIX terms. lock_obtain and lock_release are our own inventions. However, I agree that it's more clear what's going on, so I'm fine with it. Changed in the attached patch to lock_obtain() and lock_release(). I also added defines for LOCK_SHARED and LOCK_EXCLUSIVE. ""If we don't need attributes now, please don't add them. If we need additional attributes, the API does not shield the programmer from internal implementation details." " I have removed the unused parameter. ""_lock_file_sanitize_name() and friends seems like bloat to me. It has only one call-site, plus developers can be instructed about what good names are. It is not like we are dealing with user input. " I have merged _lock_file_sanitize_name() into _lock_file_get_path(). I am leaving it in however, as some lock names are generated dynamically. If we require lock names to be file-system friendly, this limits what variables and cache entries can be named. _lock_file_get_path() is used by two functions. ""Are blocking locks a good idea? What happens when I own a blocking lock, and my process gets terminated/abrupted? Will all other processes come to halt until the GC comes along?" " Not in my testing. I've read rumors and hints that such a problem could occur, but it hasn't in my testing. (I wrote a program to grab an exclusive blocking lock and then sleep. I ran the same code in another browser window and it hung while trying to get the lock. Then I killed the browser that owned the lock, and the second browser successfully obtained it.) I've been running an early version of this code on KernelTrap.org without problems or complaints. Further testing with other OS's and versions of PHP would be good. It would require a very broken version of PHP for this to fail, but certainly there could be broken versions of PHP out there. If that's the case, we can address it when the time comes. BTW: A blocking lock is required in the statistics module, otherwise we'll fail to count how many times a node is displayed. ""If database locking can be used efficiently, I'm in favor of standardizing on database locking. I'd think databases are optimized for this; no need to extend database functionality ourselves or to build our own layer on top of it." " I find that databases tend to be a bottleneck. Grabbing and releasing a lock needs to happen in microseconds, as it should for file based locking. With a database, the latency tends to be greater. If we had to search a huge list of locks, then a database would be beneficial. But we don't. We always know the path that we want. I require a lock that works under a Slashdotting. And when one page is under a Slashdotting, I want the other pages to still be displayed quickly. File-locking allows this quite nicely. ""Note that we already use database locking for the sequences table:" " Table locking is horribly innefficient for the types of locking provided in this patch. ie, when adding/updating a variable, or when adding/updating a cache entry. With table locking, only one process could do this at a time, even if they were working on different variables or cache entries. With file-based locking, the granularity is finer allowing each individual variable and cache entry to have its own lock. This is much more powerful, and much more scalable. Sadly, I'm not currently setup to do any benchmarking. ------------------------------------------------------------------------ April 30, 2005 - 17:19 : moshe weitzman that LOCK TABLE sequences statement causes lots of support headaches because shared web hosts almost never give the permission required to obtain this lock (applies to MySQL4 only). On several client sites I have commented out this line entirely ... File based locks avoid this permission issue nicely. ------------------------------------------------------------------------ April 30, 2005 - 18:51 : Dries I can't imagine why file-based locking would be faster than SQL-based locking. If it would be, MySQL could re-implement their locking mechanism using files (if they aren't using files already)? I understand that finer granularity could improve performance, but according to the MySQL documentation table-based locking is usually much faster than row-based locking (except in rare cases). Details at http://dev.mysql.com/doc/mysql/en/table-locking.html.