[development] PDO and database limitations
Larry Garfield
larry at garfieldtech.com
Wed Apr 4 16:17:06 UTC 2007
On Wed, 4 Apr 2007 11:45:13 -0400, "Khalid Baheyeldin" <kb at 2bits.com> wrote:
> On 4/4/07, Larry Garfield <larry at garfieldtech.com> wrote:
>>
>> This is an RFC of sorts. :-)
>>
>> 1) It turns out that PDO from PECL running under PHP 5.1.6, at least,
> has
>> issues. Specifically, segfault issues on otherwise perfectly sane
>> queries.
>> This is apparently a known issue with 5.1.6, and one of the reasons 5.2
>> exists. :-) At present, I have no solution for it other than saying
>> "well,
>> the PDO support is only if you're running 5.2, otherwise use the
> existing
>> MySQL/PostgreSQL drivers, deal". Does anyone have a problem with that,
>> and
>> if so, an alternate solution?
>
>
> Which distro?
>
> I use Ubuntu 6.10 and if you install PHP 5, 5.1.6 is the default you get.
> This makes
> it harder to use 5.2 (custom compile, ...etc.)
>
> If you have working code, I can verify whether I have the same problem or
> not on
> my test server.
I also run Ubuntu Edgy (6.10), which is how I ran into this problem. After spending 4 hours trying to figure out why I was segfaulting only sometimes on a session-load query, I checked with the php-general development list and was told that 5.1.6 had a lot of segfault issues. I found a 3rd party repository and installed 5.2.1 and the segfaults all went away.
> 2) Like any wrapper, PDO, while it offers some really nice features with a
>> common API (like C-level prepared statements, which are what I'm mainly
>> after), has some "lowest common denominator" issues. The main one I've
>> run
>> into so far is that there is no reliable equivalent of mysql_num_rows()
>> for
>> SELECT statements, only for data-changing statements[3]. In testing it
>> doesn't look like the MySQL PDO driver returns anything useful for
>> rowCount()
>> on SELECT. That gives us 3 options.
>
>
> In core HEAD, there are 43 occurrences in 25 files. Haven't checked
> contrib.
>
> includes/common.inc:
> includes/database.mysqli.inc:
> includes/database.mysql.inc:
> includes/database.pgsql.inc:
> includes/locale.inc:
> includes/session.inc:
> modules/aggregator/aggregator.module:
> modules/block/block.module:
> modules/blog/blog.module:
> modules/book/book.module:
> modules/comment/comment.module:
> modules/drupal/drupal.module:
> modules/forum/forum.module:
> modules/node/content_types.inc:
> modules/node/node.module:
> modules/path/path.module:
> modules/ping/ping.module:
> modules/statistics/statistics.module:
> modules/system/system.install:
> modules/system/system.module:
> modules/taxonomy/taxonomy.module:
> modules/user/user.module:
>
> A) Stop using db_num_rows() on result sets. (It's a database-specific
>> feature
>> in the first place.) In places where we use it, use a separate count(*)
>> query instead. That's the more database-agnostic method, and is what
> the
>> PHP
>> manual recommends[3]. If we go this route, it will involve removing the
>> db_num_rows() function from the existing mysql and postgres drivers and
>> refactoring core queries accordingly.
>
>
> My concern here is that all db_num_rows() cases become two queries for
> people using MySQL (the vast majority) as well as PostgreSQL (it has
> pg_num_rows()).
Not all, actually. At least one (in sess_write()) really should have been a count query in the first place as it's never iterated. I'm not sure how many cases will simply become 2 queries and how many will be able to stay as one with some logic refactoring. I don't think I'll know that until I start doing so.
How do the MS SQL and Oracle implementations deal with these issues? Another goal of the PDO implementation is to make adding support for more databases easier, since there's a smaller API footprint to deal with.
> If these are index only queries, then it may not be a big deal, but
> anything
> more
> (which we have a lot of), would incur a significant performance penalty.
> It
> also
> affects large sites using InnoDB which is notorious for very slow SELECT
> COUNT(*).
>
> See a benchmark here:
>
> http://2bits.com/articles/mysql-innodb-performance-gains-as-well-as-some-pitfalls.html
Wouldn't InnoDB be just as slow running a full select statement and then doing a num_rows() on it as it would be doing a COUNT() query in the first place? It still has all the WHERE-ing to do. More specifically, MyISAM is a special case where count(*) is faster than it otherwise would be in any other situation. It's not that InnoDB is slow, it's that MyISAM is fast. :-)
Still, though, I suspect that at least some cases will not get hosed if we refactor properly; some don't need to be full normal queries in the first place, and I am not opposed to adding additional indexes where appropriate.
> Not sure what else can be done here. Is there a way to sneek in a pass
> thru
> function
> (mysql_num_rows() in this case) like ODBC for example allows?
I'm not sure what ODBC does. Can you elaborate?
I'm pretty sure one can't call mysql_* functions on a PDOStatement object.
> B) Instead, have the PDO wrapper do a full ->fetchAll() on the result set.
>> That gives an array of array or object records (specified in the
>> fetchAll()),
>> which can then be simply sizeof()ed in the PDO/db_num_rows()
>> implementation.
>> The downside here is that you need to specify in the fetchAll() whether
>> you
>> want objects or arrays. To not horribly break any existing APIs (I'm
>> trying
>> to minimize the footprint for now; we can break everything later), we'd
>> need
>> to therefore fetchAll() as an array in the PDO driver and then cast back
>> to
>> an object in db_fetch_object() (or vice versa). That feels quite nasty
> to
>> me, honestly, and is probably a not-unnoticeable performance hit.
>
>
> I don't think we can do that safely at all. The result set may exhaust
> available
> memory, or tax other resources if it is rather large.
Yeah, I don't like this one either.
> C) Drop the PDO idea. It should come as no surprise that this is my least
>> favorite option. :-(
>
>
> Option D) : File a feature request against PDO to include a numRows()
> method,
> but that is not a quick solution.
That's the same as C for the forseeable future. Such an addition wouldn't be added until at least PHP 5.3, I imagine, which is who knows when. And even at that, it would require a lot of C-level implementation to emulate num_rows on all databases, whether they natively support it or not.
> I am hoping I can get this into Drupal 6, but time will tell if I can get
> it
>> ready in time.
>
>
> Ubuntu Feisty (due later in April) will have PHP 5.2.1.
> http://packages.ubuntu.com/feisty/web/php5
>
> But, what about other distros? If they are on PHP4 or 5.1, then this will
> be
> a show stopper.
I'm not deprecating the existing mysql and pgsql drivers at this point. This is strictly a "if you're using a stable version of PHP 5, then you get this extra, faster option" feature. Eventually I hope to deprecate the legacy database drivers and just use PDO with thin wrappers (for db_query_range() and such) for all database stuff, but that's not something we can even consider until 2008 at the earliest. Several web hosts already offer 5.2, though, as do some distros (Debian Etch, Ubuntu Feisty, etc.), so for those people the PDO option should offer a performance boost on prepared statements.
--Larry Garfield
More information about the development
mailing list