[development] Why do we avoid auto-incrementing db columns?

Thu May 3 07:49:51 UTC 2007

Most of that makes sense and kudos for putting in the time. I can't see
how sequences can get out of sync. Is this in MySQL only? /me goes into 
the stupid basket.

-- 
Sammy Spets
Synerger Pty Ltd
http://synerger.com

On 30-Apr-07 09:54, Earnest Berry III wrote:
> Sammy,
> 
> A user insertion rate is probably unrealistic, how ever there is more to the
> issue. Sequences are a headache when they get "out" of sync (sequence). By
> randomizing them, this becomes a non-issue.
> 
> On the case of the situation being impossible, take the case of about 30,000
> logged in users be logged in at the same time. Let's say about %50 of them
> are replying to comments, thus why they're logged in, and the other %50 are
> just browsing. At any given time, this ration could flip on the imaginary
> %50 line; regardless, you have 15,000 inserting information into the system.
> Those 15,000 entries create a new comment (thus 15,000 new ids). Now, add to
> that a CCK type, you now have about 3 ids per entry, thus 75,000 new ids.
> Now add to that your own custom module that's listening to nodeapi for an
> insertion of a certain type, and let say about %50 of those insertions are
> caught by your module, thus you now have 75,000 * 1.5, or 112,500 new ids. 
> 
> This can quickly turn into a geometric problem. That said, I made most of
> those stats up as I don't have time to go through some logs and take actual
> metrics, however, the theory should be sound enough that ID generation can
> be resource consuming if not handled well. Look at
> http://buytaert.net/drupal-site-statistics, it's a geometric growth
> function. And that's just the topical "node"/"content" entity, not all the
> functions, ids, and quires underneath the creation of each one.
> 
> As far as id collision, I kind of do a re-hash, but not really. The way I
> handle that is there is a "buffer" stack. I took the "impossibility" that 2
> numbers could be generated in the same second. Thus, in that second, T1
> (thread 1) could be still working with ID 5, and thread 2 would be issues ID
> 5 because the function wouldn't see ID 5 as taken because T1 hasn't
> finished/committed yet. Thus a "buffer" stack is used to hold ID's that
> haven't been used. This is flushed every so often. It's in the write up.
> This setup should bring the possibility of a number collision down to nearly
> 0.
> 
> My problem also came about because I am feeing in large amounts of
> information from other sources on the back-end/underneath. So it's not so
> much the users I'm worried about as much the sheer escalation of the
> content. Some of the data is dropped, some added, some updated, the basic
> principle though is that the information is always changing, adding, and
> subtracting, thus I needed to build a better system to handle this.
> 
> -----Original Message-----
> From: development-bounces at drupal.org [mailto:development-bounces at drupal.org]
> On Behalf Of Sammy Spets
> Sent: Monday, April 30, 2007 3:22 AM
> To: development at drupal.org
> Subject: Re: [development] Why do we avoid auto-incrementing db columns?
> 
> "pretty much no chance" === extremely rare yet not impossible
> 
> -- 
> Sammy Spets
> Synerger Pty Ltd
> http://synerger.com
> 
> On 29-Apr-07 23:55, Boris Mann wrote:
> > On 4/29/07, Sammy Spets <sammys-drupal at synerger.com> wrote:
> > >Earnest,
> > >
> > >Insertion of 20 million nodes in a short space of time implies 10
> > >million users are inserting 2 nodes each in a short space of time. It
> > >also implies many other combinations. Looking at the plausibility that
> > >this situation become true.... pretty much no chance you'll get that
> > >many insertions in the same table at the same time except for the
> > >watchdog table.
> > 
> > Uh. Except for high performance external sources of data feeding a
> > Drupal front end. Remember what assumptions make out of you and me...
> > 
> > Earnest figured it out because he needed to implement it for a project.
> > 
> > >How does your technique of randomly allocating IDs avoid duplicate ID
> > >allocation?
> > 
> > I believe it checks for dupes and re-randomizes, reading his write up.
> > 
> > -- 
> > Boris Mann
> > Office 604-682-2889
> > Skype borismann
> > http://www.bryght.com