[development] Why do we avoid auto-incrementing db columns?

Wed Apr 25 17:27:40 UTC 2007

Mark,

Good points. However, I think there are 2 types of Drupal users: Enterprise,
and non-enterprise. (read: large/small).

Meaning, not everyone has a company large-enough where they can have the
dedicated DB staff to maintain, create, and update the sp's, triggers, etc.

However, on the flip side, companies that do have this type of setup and
resources balk at Drupal because they can't leverage their DB staff to truly
separate the db from the app-layer.

It's a catch 22 either way you think of it. By and large though, most Drupal
installs do not have the dedicated DB staff to truly separate the layers;
and I do mean dedicated. Maintaining and optimizing triggers, sp's, etc. in
a production environment is a full time job and not really something to be
looked at as a 'hobby'. 

The other flip side to stored procs, is PHP's arch. And what sp's were made
for. Each request is it's own thread. (this is knowledge from SQL Server,
I'll have to read about MySQL's sp implementation to see if it's the same).
But, depending on the DB, the sp is compiled upon first use with each
connection. If the same connection calls the sp again, the cached query plan
is used. However, if the query plan is dynamic (e.g. WHERE clause changes
each time), performance actually goes down greatly. If the WITH RECOMPILE
option is used (meaning re-do the query plan with each execution),
performance only goes down slightly.
More over, sp's are made for 2 things:

1. Reduce network traffic.
2. Place the processing on the database server and not the app.

So, let's look at sp's; they send a small RPC call to the DB, and the db
sends back the results. Where currently we send SQL statements to the DB,
and get back results. The catch is though, that Drupal sends relatively
small statements, and statements that are not really processor intensive,
compared to say a manager's analytical application that sends a 40 line SQL
statement that goes across 3 databases and takes about 15 seconds to
process.
I guess what I'm saying is that I don't think sp's would give Drupal much of
a performance boost. However, this is a null point because I have no
benchmarks to back this up.

Regarding ref. integrity, I probably generalized and mis-spoke; when I said
it, I meant the ID generation, meaning module x can generate an id for table
y and work with it before actual insertion...hmm....which upon thinking
about it should all be handled in node_save function either way. However, I
have many modules that do not create a node, thus creating a node and doing
a node_save wouldn't give me an nid, if say I needed the eid (an arbitrary
id I made up) because the item is not a node/content-type.

All in all, there are tradeoffs either way...its just which one are we
willing to accept?

-----Original Message-----
From: development-bounces at drupal.org [mailto:development-bounces at drupal.org]
On Behalf Of Mark Fredrickson
Sent: Wednesday, April 25, 2007 12:06 PM
To: development at drupal.org
Subject: Re: [development] Why do we avoid auto-incrementing db columns?

Hi Earnest and others,

On 4/25/07, Earnest Berry III <earnest.berry at gmail.com> wrote:
> Hmm...I'm still more of a fan of controlling the IDs application level
> Than using a db generated serial number, especially when dealing with
> referential integrity.

I have the opposite standpoint. The more referential integrity that is
in the app layer, the harder it it is to integrate and share data from
the database. Use case: I have a another (non-PHP) system that needs
to insert nodes. If it can't generate a valid ID for that node, this
process is halted in it's tracks.

I would like to see more data modeling logic moved to the db layer
(stored procedures, triggers, views, etc), and leave PHP to do the
theming and IO on top of that. E.g. CCK would be a great candidate for
going to the db layer.

> However, say an NID, of the txid (from e-commerce)
> gets passed around drupal and used by other modules, sometimes in loops,
and
> sometimes for other modules to ref. the node; this I'd rather have some
> control of.

I do not see why these behaviors indicate that referential integrity
should be handled in the app. If you are saying that within a loop,
one must generate N IDs, each with a database call, there is some
merit to argument that app created IDs would be better. But there are
going to be other tradeoffs.

> Also, the implementations of "next_val" per-database get a little hairy
> also. You also then have the issue of if you are in a transaction, the
> transaction level, etc. that all starts to play in to the db's
> implementation of "next_val" or "last_insert_id".

I'll defer to others' experience on this subject, but I'd be surprised
if the problems were intractable. Considering that Drupal currently
supports all of 2 databases (I'm crossing my fingers for SQLite
support - that'd be cool) unifying the implementations shouldn't be
that hard.

-M