Re: [development] User Status [WAS: User last access]
I couldn't care less about the SQL92 standard (or any other SQL standard). What I care about is: * What we do is space efficient on MySQL * What we do is fast on MySQL * What we do works on other databases SQL92 doesn't answer any of those questions for me. Consequently, I have a few goals if we implement status bits for user accounts: * Not having one byte per bit * Not having O(n) searches for common bit-status user queries, if we have such queries * Ability to run on other databases, without significant regard to performance or space efficiency "Make the common case fast." <http://en.wikipedia.org/wiki/Amdahl%27s_law> ----- "Ivan Sergio Borgonovo" <mail@webthatworks.it> wrote:
On Sat, 3 May 2008 19:12:46 +0000 "David Strauss" <david@fourkitchens.com> wrote:
But they probably support another bit-based format.
The smallest thing you can find in SQL92 I think is boolean. As said most DB support bit packing and some have aggregate functions over boolean fields.
char doesn't look as a good choice because on UTF8 may have some overhead, it is ordered according to locale and in need it misses interesting aggregate functions. I don't know SQL standards and enough implementations to know if you can avoid both problems in most DB. Some DB let you define collation and encoding on a field base... some don't even let you define the encoding/collation on a table base.
The smallest SQL92 int is 2 bytes (smallint) someone may think it is too large but it is very well handled by modern CPU and is not affected by locale nor encoding... if you've more statuses...
-- Ivan Sergio Borgonovo http://www.webthatworks.it
One more -1 vote for bit masks stored in the database on columns that need to be filtered by. +1 on using an int or char(1) for each of the columns would be more effective, since each discrete column can be used in WHEREs, ...etc. The extra bytes overhead is negligible. This is also more portable across databases and does not require database specific functions.
Agreed. I should also note that space efficiency is lower priority to me than other goals. If space doesn't matter to me, I tend to use TINYINT(1) NOT NULL for booleans. I'm sure there's a reasonably portable way to use something like that. ----- "Khalid Baheyeldin" wrote:
One more -1 vote for bit masks stored in the database on columns that need to be filtered by. +1 on using an int or char(1) for each of the columns would be more effective, since each discrete column can be used in WHEREs, ...etc.
The extra bytes overhead is negligible. This is also more portable across databases and does not require database specific functions.
On Sun, 4 May 2008 10:29:23 -0500 (CDT) David Timothy Strauss <david@fourkitchens.com> wrote:
I couldn't care less about the SQL92 standard (or any other SQL standard).
What I care about is: * What we do is space efficient on MySQL * What we do is fast on MySQL * What we do works on other databases
SQL92 doesn't answer any of those questions for me.
Consequently, I have a few goals if we implement status bits for user accounts: * Not having one byte per bit * Not having O(n) searches for common bit-status user queries, if we have such queries * Ability to run on other databases, without significant regard to performance or space efficiency
char(0) seems to work just on sqlite and MySQL. It is advertised as using just 1 bit. I didn't see anything that explain its implementation, advantages or drawbacks. MySQL support boolean and they "waste" the same/less space as a char(1) in C encoding + they don't suffer of ordering problems related to collation/locales. Hopefully MySQL is going to fix this kind of quirks and come up with a decent boolean implementation and you may end up in a not so efficient, hardly portable mess. If the "status" is going to store a boolean... I'd use a boolean. Especially for "Logged in", "Used", "Never Used" status, boolean may offer the additional advantage over char(0) of storing true, false, null. If it is going to store a multivalued column, the smallest things that doesn't suffer from collation/locales/utf8 and works across *several* DB is smallint, but it is 2 bytes large. MySQL, MS SQL support tinyint (1byte), PostgreSQL, Oracle, DB2 don't, just to name some. char(1) could be a reasonable compromise. Problems with ordering shouldn't be a real concern unless you really go hunting them with "strange" choices. It has to be seen if it really offer advantages over smallint in terms of overhead once you're using utf8. Not being a fixed length data type, it is going to incur in some extra overhead, surely it has an easily to quantify space advantage over smallint. MySQL and several other DB support ENUM... but I don't think D5 and D6 need another data type to deal with.
"Make the common case fast." <http://en.wikipedia.org/wiki/Amdahl%27s_law>
It can't be read as you're advertising it. You can't optimize the most used DB and make the overall application run faster on all DB. What about: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." D. Knuth -- Ivan Sergio Borgonovo http://www.webthatworks.it
participants (3)
-
David Timothy Strauss -
Ivan Sergio Borgonovo -
Khalid Baheyeldin