[development] Why do we avoid auto-incrementing db columns?

Sammy Spets sammys-drupal at synerger.com
Mon Apr 30 06:25:02 UTC 2007


Earnest,

Insertion of 20 million nodes in a short space of time implies 10 
million users are inserting 2 nodes each in a short space of time. It 
also implies many other combinations. Looking at the plausibility that 
this situation become true.... pretty much no chance you'll get that 
many insertions in the same table at the same time except for the 
watchdog table.

Sure, you've found a way to improve performance for an unrealistic 
boundary case and that's fantastic. Thanks for working it out!

How does your technique of randomly allocating IDs avoid duplicate ID 
allocation?

Cheers,

-- 
Sammy Spets
Synerger Pty Ltd
http://synerger.com

On 25-Apr-07 13:59, Earnest Berry III wrote:
> Hi Sammy,
> 
> There's only on thing I disagree on; the performance hit of the mutex lock.
> If you do not have a relatively large number of insertions, than this is not
> a large performance hit; however, if you do, this does become a performance
> hit. I admit, I'm on the extreme end of the spectrum as I was inserting some
> 20 million nodes (which, expanded means 20 million nodes, 20 million
> node_revisions, and 7-14 million other insertions in diff. tables for the
> application). 
> 
> After implementing random-id generation with out locking the table ever at
> all, the performance increase was very noticeable, on the order to 70-100%
> more throughput of node creation.
> 
> I would think removing the bottle neck though would benefit sites that
> probably  have a lot of comments being added and content, as there is no
> contention for the sequences table anymore.
> 
> 
> 
> -----Original Message-----
> From: development-bounces at drupal.org [mailto:development-bounces at drupal.org]
> On Behalf Of Sammy Spets
> Sent: Wednesday, April 25, 2007 12:54 PM
> To: development at drupal.org
> Subject: Re: [development] Why do we avoid auto-incrementing db columns?
> 
> Hi everyone,
> 
> I'm PostgreSQL maintainer for Drupal and since people are up in the air
> about what is going on down in the DB layer re autoincrement and serial
> fields I figured i'd shed some light. Many people have already stated
> this stuff in earlier mails and hopefully getting the info in one big
> pile will make it more concrete.
> 
> PostgreSQL does implement atomic sequencing with the serial and
> bigserial types. MySQL does the same with the auto_increment field
> property. These are used in Drupal to produce a guaranteed unique ID
> for tables using them. Implementing this in PHP will certainly generate
> non-unique IDs unless you lock tables thus introducing a massive
> performance penalty.
> 
> We use db_next_id() in Drupal to provide a db-independent method of
> getting the next unique ID into the application for use just after the
> insert operation on the DB. The performance hit of using a sequence is
> next to nothing considering all that's involved is a mutex lock, read,
> increment and mutex unlock. Making this more efficient is simply not
> possible.
> 
> Sure there are two different use cases for insertions: those where you 
> need the ID afterwards (e.g node) and those you don't (e.g watchdog). 
> If you want to write code without the use of db_next_id() then do an 
> INSERT and omit the id column. I don't see anyone wanting to spend 
> effort patching Drupal core to gain 0.01% efficiency across the 
> application.
> 
> /me turns his torch off
> 
> -- 
> Sammy Spets
> Synerger Pty Ltd
> http://synerger.com
> 
> On 25-Apr-07 10:15, Barry Jaspan wrote:
> > 
> > Could someone either clue me in or point to a prior discussion about
> > why Drupal avoids using auto-incrementing database columns?  If it is
> > database compatibility, which versions of which databases that we
> > support do not have it?
> > 
> > Thanks,
> > 
> > Barry
> > 
> 


More information about the development mailing list