On Sun, 4 May 2008 10:29:23 -0500 (CDT) David Timothy Strauss <david@fourkitchens.com> wrote:
I couldn't care less about the SQL92 standard (or any other SQL standard).
What I care about is: * What we do is space efficient on MySQL * What we do is fast on MySQL * What we do works on other databases
SQL92 doesn't answer any of those questions for me.
Consequently, I have a few goals if we implement status bits for user accounts: * Not having one byte per bit * Not having O(n) searches for common bit-status user queries, if we have such queries * Ability to run on other databases, without significant regard to performance or space efficiency
char(0) seems to work just on sqlite and MySQL. It is advertised as using just 1 bit. I didn't see anything that explain its implementation, advantages or drawbacks. MySQL support boolean and they "waste" the same/less space as a char(1) in C encoding + they don't suffer of ordering problems related to collation/locales. Hopefully MySQL is going to fix this kind of quirks and come up with a decent boolean implementation and you may end up in a not so efficient, hardly portable mess. If the "status" is going to store a boolean... I'd use a boolean. Especially for "Logged in", "Used", "Never Used" status, boolean may offer the additional advantage over char(0) of storing true, false, null. If it is going to store a multivalued column, the smallest things that doesn't suffer from collation/locales/utf8 and works across *several* DB is smallint, but it is 2 bytes large. MySQL, MS SQL support tinyint (1byte), PostgreSQL, Oracle, DB2 don't, just to name some. char(1) could be a reasonable compromise. Problems with ordering shouldn't be a real concern unless you really go hunting them with "strange" choices. It has to be seen if it really offer advantages over smallint in terms of overhead once you're using utf8. Not being a fixed length data type, it is going to incur in some extra overhead, surely it has an easily to quantify space advantage over smallint. MySQL and several other DB support ENUM... but I don't think D5 and D6 need another data type to deal with.
"Make the common case fast." <http://en.wikipedia.org/wiki/Amdahl%27s_law>
It can't be read as you're advertising it. You can't optimize the most used DB and make the overall application run faster on all DB. What about: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." D. Knuth -- Ivan Sergio Borgonovo http://www.webthatworks.it