[drupal-devel] [feature] Centralize node queries somehow

chx drupal-devel at drupal.org
Sat Jan 15 08:39:32 UTC 2005


 Project:      Drupal
 Version:      cvs
 Component:    node system
 Category:     feature requests
 Priority:     normal
 Assigned to:  chx
 Reported by:  chx
 Updated by:   chx
 Status:       patch
 Attachment:   http://drupal.org/files/issues/node_rewrite_sql_9.patch (43.98 KB)

module_implementations had a minor, but performance eating bug, it's
caching was broken.

chx



Previous comments:
------------------------------------------------------------------------

December 23, 2004 - 22:25 : chx

Attachment: http://drupal.org/files/issues/node_builder.patch (5.56 KB)

Although I have posted this to the devel list, as Steve pointed out,
putting patches to sandbox is not the best way to do things. So I open
this thread. I have measured the time for build a node query in this is
about 0.22-0.24ms (on an Athlon 933 MHz machine).

------------------------------------------------------------------------

December 23, 2004 - 23:33 : Dries

I'm not likely to commit this.  It's not conform with Drupal's coding
conventions, but more importantly, a node query builder doesn't solve
any real problems.  It just adds a different way of doing things
without offering a significant advantage.

------------------------------------------------------------------------

December 23, 2004 - 23:51 : chx

Attachment: http://drupal.org/files/issues/node_builder_0.patch (5.67 KB)

At this moment there are direct queries into the node table everywhere.
After node_access_*_sql calls were invented, a lot of queries needed to
be changed, and I think we will find more to be inserted. If someone
does sg. else with the queries, permission, language etc. he needs to
patch 60+ quieries in core alone. This is not a good thing. That's why
I am proposing a central node query builder.

As for code standards, code-style.pl node.module returns 17 errors,
none of them comes from my patch. However, I should admit, there were a
few spaces missing from the database.inc patch. So I resubmit.

------------------------------------------------------------------------

December 23, 2004 - 23:56 : chx

Attachment: http://drupal.org/files/issues/node_builder_1.patch (5.67 KB)

OK, I was wrong, there is a space missing in one of the rewritten
queries. But I doubt this matters too much, as this is only a proposal
and I think there will be a lot of revisions before it gets to the
core. 

------------------------------------------------------------------------

December 24, 2004 - 00:26 : drumm

The blank lines between the comments and function starts will confuse
the documentation parser on drupaldocs.org.

I'm not sure if I support this quite yet. Reducing the number of
queries and whatnot is great, but this patch doesn't do that quite yet.
I'd only want this if the new system is more readable and easy to write.
I'm not sure if this is true.

As for code-style.pl, I don't think this is actually used much and it
doesn't catch everything. Such as, most operators (=, =>, +, etc) like
a space on either side of themselves. I'd support removing
code-style.pl and throwing it in contributions somewhere. (If this
needs further discussion or action, please file another issue, lets
keep this focused on this patch).

I'm going to wait for a few more revisions of this before I weigh in a
+1 or -1.

------------------------------------------------------------------------

December 24, 2004 - 00:58 : Jose A Reyero

I  can really think of a number of advantages of this, and though not
really for committing it yet, think is very good to see some discussion
on the issue and some concrete implementations.

When implementing i18n I run into this problem, of having to patch lots
of queries -actually too many- in about all the modules doing any kind
of node listing. This far, I'm stuck at this point. Then I realized it
was just the same for node permissions, and will be the same again for
whatever new functionality you want to add in the future which affects
node listings.

I think actually that direct access to 'node' table should  be avoided
when  possible out of the node.module. And the only way to achieve this
is to have some kind of query builder. And also think that chx's
approach is very good. it provides some upper layer at the 'node'
level, which can use some object level semantics while relying on a db
layer query builder, which could be used later for other objects
-users, taxonomy terms...-

Just think of all the work that this could have saved when implementing
permissions system, which I think is ugly and dirty anyway because of
all that node_join, node_where...

In the long run, adding such complex logic as sql code, patching and
re-patching queries is bad, very bad, making any new change one higher
level of complexity and driving us away from a OO model, which I think
we should tend to. It's not using PHP objects. It is thinking of nodes
as 'objects' and coding according to it.

Oh, yes, sorry, this is growing too long to write it here :-). But we
could talk also about db portability...



------------------------------------------------------------------------

December 24, 2004 - 01:14 : moshe weitzman

I should add that the Organic Groups module could benefit from this node
listing SQL builder. I have awkwardly worked around the problem for now
but I don't like it.

I don't know if this patch is good or not, but I do lean toward the
functionality that it offers. Remember that any developer is free not
to use the query builder and issue direct SQL as needed. If the query
builder is making life hard, simply don't use, just like today.

------------------------------------------------------------------------

December 24, 2004 - 10:35 : Bèr Kessels

I +1 for this functionality (i have no time to comment on coding style
etc).
It will not only improve maintainability, but will allow a far easier
implementation of the i18n. 
i18n currently has a lot of patches, simpley to add logic to all those
node sql queries referred to. 

------------------------------------------------------------------------

December 24, 2004 - 13:20 : chx

Attachment: http://drupal.org/files/issues/node_builder_2.patch (1.63 KB)

OK, maybe the former approach was too complex. How about this? This
requires a lot less change to the queries and no change to database.inc
and ten times faster.

As usual, a sample query is rewritten and an example of the proposed
hook is provided.

------------------------------------------------------------------------

December 25, 2004 - 00:09 : chx

Final thoughts for this day. If you like the last version, there is a
possibility to automate the whole thing by adding the following three
lines to the beginning of db_prefix_tables:



Of course, there is very small chance of a loop here, so this would
require more thought, but I think at the end this would be a good
thing.

------------------------------------------------------------------------

December 27, 2004 - 12:55 : Dries

I like the second approach better as it keep things readable and gives
me a bit more control over the order in which things are written. 
Looking for more feedback from the others.  Did you try updating more
queries to make sure it works as intended in all cases?

------------------------------------------------------------------------

December 27, 2004 - 19:48 : chx

Attachment: http://drupal.org/files/issues/node_builder_3.patch (1.23 KB)

Yes, I tried with many different queries. And I have found one bug --
forgot the underscore in the regexp following FORM so I have rewritten
the whole regexp. It is now a lot simpler :)

And node_query is now reentrant, so if a function in the process of
hook_sql calls node_query, it won't fall into an infinite loop. This
situation did not occur but it's better to forestall such things.

------------------------------------------------------------------------

December 28, 2004 - 05:48 : moshe weitzman

This patch is a step forward. I thought of another feature which becomes
easier with the proposed hook_sql(). think of filtering like
freshmeat.net. At freshmeat, you can say that you only want to see
projects where OS=Windows and License=GPL (for example).  A Drupal
equivalent is 'only show me nodes related to Democrats and Poverty'.
This sort if global node filtering is very hard in Drupal today. With
this hook, it becomes easy.

------------------------------------------------------------------------

December 28, 2004 - 12:37 : Dries

It's starting to look good (and you're getting more support)!


 Have you read the following threads:
http://lists.drupal.org/archives/drupal-devel/2004-11/msg00198.html and
http://lists.drupal.org/archives/drupal-devel/2004-11/msg00230.html? 
It's an open issue related to your work.  If you start joining tables,
you might have to add DISTINCT()s.  When you are not joining tables, it
would be nice if we'd not pay the cost of a DISTINCT().
 It would be good if you could update the node queries in core to take
advantage of it.  It's not necessary while prototyping though.
 It would be good if the il8n team could take a closer look at this
patch by trying to use it.
 It would be good if you could add some PHPDoc.


------------------------------------------------------------------------

January 4, 2005 - 07:55 : chx

Here is a sum of changes in this new revision:


Following Dries' advice, I read those threads and now DISTINCT(nid) is
used only if there is something joined. Whether the access_grants
system is used or not, is determined at the module configuration
screen, so for efficiency reasons, I have patched system.module.
I the very first version of my patch (you know, with arrays) I passed
on the whole query so hook_sql implementations could easily determine
whether they are interested in this query or not. In a very cool
brainstorming with Jose, we found out, that passing the whole query is
not a good thing, hints are enough. So when I will patch the forum
queries, I will define a NODE_SQL_FORUM hint, and a filter which does
something cool with forums (I have heard something about private forums
not implemented -- I maybe wrong here) may do so.
We now have a separate helper function to call hook_sql which is cool,
'cos with it the last node_access_*_sql calls which were lurking in
node_search could be elinimated.


------------------------------------------------------------------------

January 4, 2005 - 07:56 : chx

Attachment: http://drupal.org/files/issues/central_node_query.patch (6.46 KB)

Funny thing. Forgot to attach the patch.

------------------------------------------------------------------------

January 6, 2005 - 00:46 : chx

Another day, another revision. As this patch now has nothing to do with
queries, but it is about rewriting node SQL statements, I have renamed
it node_rewrite_sql, the helper _node_rewrite_sql and there is
hook_node_rewrite_sql. hook_node_rewrite_sql expected return value has
changed: it can return an array('join'=> ..., 'where'=>...,
'distinct'=> ) so it may determine whether DISTINCT is needed or not.
Neither index is mandatory. That hook_node_rewrite_sql impementations
will decide whether they are asking for DISTINCT or not, is -- I think
-- a good opportunity to optimize for speed. For eg. I doubt every node
access mechanism requires DISTINCT.

One last note: hook return value can be also can be a string with a
WHERE statement for the sake of simplicity.

node_rewrite_sql now works with multiline queries.

------------------------------------------------------------------------

January 6, 2005 - 00:46 : chx

Attachment: http://drupal.org/files/issues/central_node_query_0.patch (7.43 KB)

I can't believe I once again did not attach the patch 

------------------------------------------------------------------------

January 6, 2005 - 17:22 : chx

Attachment: http://drupal.org/files/issues/central_node_query_1.patch (6.65 KB)

Moshe, thanks for your patience and wisdom.


Extract is thrown out, list is used.
node_node_rewrite_sql looks better now.
access_grants_present is determined in run time, once per page.


------------------------------------------------------------------------

January 6, 2005 - 21:35 : Dries

The hint-stuff looks a bit hairy to me.  It would be nice if this could
be avoided but I see what it is necessary.  

The semaphore stuff is vague too.  I don't see how you'd be able to get
into an endless loop.

It might actually be useful to turn node_access_grants_present() into a
more generic module.inc function.  Maybe something along the lines of
module_implements($hook), which would return a list of modules
implementing the specified $hook.

I suggest you update more core modules.  If you do, I'll measure the
performance gain against a copy of drupal.org's database.

------------------------------------------------------------------------

January 6, 2005 - 22:47 : chx

Attachment: http://drupal.org/files/issues/node_rewrite_sql.patch (15.72 KB)

The hint stuff is something is pretty useful: hook implementations can
fine tune themselves when to fire and when not. Good for features and
performance, too.

module_implements is done.

Semaphore -- we will have tons of node_rewrite_sql calls in the core.
Now someone writes a hook_node_rewrite_sql implementation which calls a
function which in turn calls another one which calls node_rewrite_sql .
I would rather not be the one who debugs such a scenario.

Thanks for the offer! forum.module is now patched. Others will follow
tomorrow.

------------------------------------------------------------------------

January 6, 2005 - 23:21 : Jose A Reyero

I think this looks good enough, so I'm starting to test and use this for
i18n.  If this get into the node module, I don't think I will need any
node.module patch anymore...

A pair of performance hints:
- As this node_rewrite_sql is going to be called several times for each
page, we can cache the list of modules implementing the hook in a static
variable.
- We can even cache the whole result for each 'hint' value, so the
whole hook will only be called once or twice for each page.

Some suggested 'hint' values to be defined:

NODE_REWRITE_SQL_LIST  -> This one for simple generic list of nodes for
the main page, could be default.

NODE_REWRITE_SQL_ADMIN_LIST -> List of nodes for administration pages

NODE_REWRITE_SQL_RELATED_LIST -> List of objects related to nodes, but
no nodes themselves -like 
comments-, node table may have to be joined...

NODE_REWRITE_SQL_BLOCK_LIST -> Listing for a block

NODE_REWRITE_SQL_NO_LANGUAGE -> No need to add language conditions.
This could hapen when you need a listing for other languages, or mixed
languages, for translators or for administrators.. 

I would need something like this last one for i18n, just to have the
permissions working and some flexibility with language conditions.

And also the one you already have in place ...._SEARCH.

All this will allow for options like search, admin, list current/all
languages...

------------------------------------------------------------------------

January 7, 2005 - 17:30 : Jose A Reyero

Attachment: http://drupal.org/files/issues/central_node_query_node_module.patch (7.07 KB)

I found some problems when using the patch. The where conditions are not
merged well if there are more than once, and also conditions need some
parenthesis around.

So I replaced the $where and $join strings in _node_rewrite_sql by
arrays which are imploded at the end.

Also added some 'hint' definitions. Patches for i18n module are coming
next.

This works like charm for node listings, comment listings, searches,
etc...

Btw, I also think 'semaphores' are not really needed.

------------------------------------------------------------------------

January 7, 2005 - 19:04 : Dries

I don't see how the hint system can be portable: are contributed modules
supposed to define their own hits?

------------------------------------------------------------------------

January 7, 2005 - 22:54 : chx

The hints -- I was in a hurry yesterday evening, so I have forgot to
define a NODE_REWRITE_SQL_FORUM in the forum.module patch and also
forgot to use the hints. So I think core node types will have their
hint defined.

I have a vague idea about how contrib modules could define their own
hint type: we shall have a NODE_REWRITE_SQL_HINT_MAX, and every hint
define should like this -- beware this is not working! This is just an
idea.



Well, if defines does not work, maybe a global node_rewrite_sql_hints
array should be introduced...?

------------------------------------------------------------------------

January 8, 2005 - 09:54 : chx

Attachment: http://drupal.org/files/issues/node_rewrite_sql_0.patch (16.69 KB)

Here is a list of changes:


Complete rethought of hint system. Uses array of strings instead of 
defines. There are a number of examples.
the new comment.module patch shows a good usage example for the third
parameter of node_rewrite_sql which is at the moment called node_alias.
It's really not a good name, it should be "the alias of the table which
holds the nid field for this query". BTW. there should be an index on
comment.nid.
Incorporated Jose's "replaced the $where and $join strings in
_node_rewrite_sql by arrays which are imploded at the end". I was a
fool to lose these, the first version (which used array) used a proper
implode for WHEREs.



------------------------------------------------------------------------

January 8, 2005 - 11:27 : Dries

I have two fundamental problems with it this patch:

The loop detection mechanism has to go.

The hints are difficult to understand and I'm not convinced they are
good/needed. Does the patch contain a good example of a
node_rewrite_sql function/hook that uses these hints?  I can't find
one.  I won't like these hints until I'm convinced they are needed and
that they represent the best possible solution.  Convince me.

Could you extend the documentation about the hint-system?  Currently it
reads "An array of hint strings about the query, passed on to hook_sql
handlers.".  Clearly, that doesn't help me at all.  The documentation
should answer the following questions: (1) what are the hints for, (2)
when should I use them (or not use them) and (3) what does hints look
like (what is their format/syntax)?  



Note that I can't compare performance unless more queries have been
updated.

Thanks chx.

------------------------------------------------------------------------

January 8, 2005 - 11:30 : Dries

Why does the hints in the comment module include 'related_list'.  This
is confusing.

------------------------------------------------------------------------

January 8, 2005 - 12:52 : Jose A Reyero

Well, it seems we have different views about the 'hints'. 

At this point, the node_rewrite has two main uses: permissions and
language conditions (i18n).  I don't know whether they could have any
use for permission system,  but for i18n, some extra information about
the rewritten queries will be needed:

- Most of the queries will just select fields in the node table. Here,
only a where condition is needed, but language condition may be added
or not depending on the kind of query:

_LIST would be a list for the main page, so only in this case,
decissions can be made depending on the path. For some pages, you may
want a list of nodes for all languages and for some others, only
current language.

_ADMIN_LIST is a list of nodes for administration. An option will be
added in i18n for 'administering all languages together' or 'administer
each language separately', or even a language field could be added in
the search form. So language conditions will be merged diferently
 
_BLOCK_LIST is a list of nodes for blocks. I'm thinking here of a
global option like 'show mixed languages in blocks'. Anyway, this is a
list for a block and  you cannot have any logic depending on path
here.

_SEARCH, would be only for searching, and in this case you may decide
to search 'all languages' or only the current one. This can be a global
option or some extra field in search form. When doing text searches,
searching all languages together will make a lot of sense, but also
restricting search by language.

About _RELATED_LIST, this is the case of comments, is a list of objects
which are not nodes themselves, but related objects in a different
table. Here, the node table itself may have to be joined as it is not
yet in the query.

When dealing with the translation system, the language conditions will
be added or not by the module requesting the query itself, so no
further language conditions will be needed. Queries will have to be run
anyway through 'node_rewrite_sql' to have permissions in place.
That's what the _NO_LANGUAGE hint is about.

So this is my proposed use for the 'hint' thing, though maybe the hints
themselves could be better defined or elaborated, but this is the main
idea. I mean no specific hints for modules but generic ones for 'kind
of listing' you are requesting.

Maybe for the permission system, it could be also some use, as you
could decide you want to show a list of objects to anyone, but deny
access to the objects themselves, something like 'subscribe or register
if you want to read the article'. So permissions here will apply
differently depending on what you are doing.

We can think also about more creative use of permissions like 'you can
read latest articles' but old ones will be available only for selected
people, or the opposite 'only old content available for everybody'. Or
even 'you can read articles but not comments'. This kind of permissions
would need also some 'hints'.






------------------------------------------------------------------------

January 8, 2005 - 13:17 : chx

Attachment: http://drupal.org/files/issues/node_rewrite_sql_1.patch (26.89 KB)

More queries rewritten. Loop detection gone. Will gather my thoughts on
hint system later, but I do hope this amount of queries will be enough
for a performance measurement. Now only blogapi, blog, book calls
node_access_*_sql, all others are done.

------------------------------------------------------------------------

January 8, 2005 - 13:20 : chx

Attachment: http://drupal.org/files/issues/node_rewrite_sql_2.patch (26.88 KB)

Forgot to save taxonomy.module before rolling the patch, so a DISTINCT
remained :(

------------------------------------------------------------------------

January 8, 2005 - 13:38 : chx

I see Jose has written a lot about hints, thanks! "Well, it seems we
have different views about the 'hints'." -- "we" probably means Dries
vs. "Jose and I" 'cos I agree with Jose, but I see some more uses,
namely what I have said before on making a filter on forums. If you
want to that, you need to know which queries are coming from the
forum.module. What shall we do? Pass on the whole query, and let the
filter proccess the query to decide whether now it is the forum block
it hunts for? I doubt. Let's give it a hint it's a forum and a block
query.

Some kind of extensionable hint mechanism is necessary, 'cos what about
a contrib module which filters another contrib node, say flexinode?


------------------------------------------------------------------------

January 9, 2005 - 15:05 : Dries

Could we drop the hint system if we'd choose not to support the il8n
module for now?

------------------------------------------------------------------------

January 9, 2005 - 16:54 : Jose A Reyero

:-(

Ok, let's try to think positively.

Could we at least pass the query along so i18n -or whatever module
tries to implement this hook- could support itself?

------------------------------------------------------------------------

January 9, 2005 - 18:32 : chx

Well, passing along the query would mean no problems, but I wonder
whether a better documented hint system would be better? I had no
problems with letting the loop detection system go it was just a safety
belt for some distant future possibility. But the hint system is -- in
my point -- is needed.

So, here are the valid hint strings:


node, forum, comment etc.name of the module from the query originates,
like forum, comment, flexinode etc.
genericthis would be be for generic listings, like node_default_page,
forum_get_forums etc.
relatednode table is not joined so if you want to use other fields from
node table than nid you shall join node table yourself. However, if you
have your table which is like JOIN {mytable} ON mt.nid=$nid_alias.nid
that's fine.
adminsomething meant only for administrators.
blocknodes in a block
searchsomething to do with a search



------------------------------------------------------------------------

January 9, 2005 - 18:36 : chx

Attachment: http://drupal.org/files/issues/node_rewrite_sql_3.patch (28.31 KB)

Killes said I did not had -F^f in the last version.

------------------------------------------------------------------------

January 9, 2005 - 22:37 : chx

Attachment: http://drupal.org/files/issues/node_rewrite_sql_4.patch (28.77 KB)

Automatic version as discussed and args for Killes.

------------------------------------------------------------------------

January 9, 2005 - 22:55 : chx

Attachment: http://drupal.org/files/issues/node_rewrite_sql_5.patch (28.31 KB)

Hm. It seems we have agreed on no automatization. However, arbitrary
arguments are kept for Killes.

------------------------------------------------------------------------

January 10, 2005 - 21:02 : Dries

The last version of this patch improved the performance of my test
site's main page by 257%.  That is, my machine can serve 2.57 times as
much pages in the same time interval.

Out of curiousity, I investigated this some more and it turns out this
can be tracked down to the useless DISTINCT's which this patch
removes.

Unless someone objects to this patch or the approach taken, I'm going
to commit this shortly. Please review this patch carefully as it marks
an important change.

------------------------------------------------------------------------

January 10, 2005 - 22:39 : chx

Attachment: http://drupal.org/files/issues/node_rewrite_sql_6.patch (44.18 KB)

More DISTINCTs cleared, now there are no more node_access_*_sql calls in
the core modules. This may be a final version.

Or we can find a few more to rewrite. I've searched for {node} selects,
dropped those which had nid = %d as I thought those do not need rewrite
(I may be wrong) and dropped also those that are rewritten, dropped
those that are indexing or cron. Here are the remaining queries, which
may need rewriting:



------------------------------------------------------------------------

January 10, 2005 - 23:02 : Chris Johnson

This patch seems like both a Good Thing and a Bad Thing to me.

The idea of centralizing node queries is a good one, I think.  It seems
to be the original motivation for this patch.  It allows better SQL
optmization and hence performance.  It puts all the code in one place
for better future maintenance and enhancement.

On the other hand, constructing complex SQL statements from
programmatic logic is full of peril.  The task quickly becomes very
large and complex.

Manipulating the SQL language elements via code means it is harder to
see when one has made a syntactic mistake.  Most such errors will be
fatal.  As the SQL construction code becomes more complex with
development, the number of paths through it become huge and impossible
to exhaustively test.  Some day, someone will pass parameters that
cause fatal SQL syntax to be generated in a "production" system.

It also obscures SQL syntax differences between supported databases. 
It's more likely to introduce usage that works with MySQL but fails
with Postgres.

I've long been in favor of unifying and centralizing all of the
database access in Drupal, to the greatest extent which does not
adversely affect performance.  So in this way, I think this patch is a
big step in that general direction.

But I've also been involved in maintaining a search engine which used
this technique to create the SQL statements on the fly.  It became
extremely complex over time as one, then two, then three, etc. new
possibilities were added (e.g. new usage, new search criteria).  The
more complex it became, the more regression errors we had and the
harder to debug it became.  It got to the point where making one small
addition would result in 3 new bugs.  It worked most of the time, and
when it worked, it was fast.  But it failed fatally in a significant
number of ways.

I'm not sure whether to vote +3 or -4 on this one.  It might be good;
it might be terrible.

------------------------------------------------------------------------

January 10, 2005 - 23:19 : chx

Attachment: http://drupal.org/files/issues/node_rewrite_sql_7.patch (44.18 KB)

Chris, I understand your concerns, but this method -- unlike the
original -- tries to keep things as simple as possible. You can only
JOIN tables and add WHERE conditions. We are not creating complex SQL
on the fly, I tried that originally, just read comment #1 for the warm
welcome by Dries: "I'm not likely to commit this". So do not worry,
while I may be a little hotheaded, Dries keeps order with an iron
hand.

And the actual rewrite mechanism is also very simple. 

Ps. Two typos corrected in this version.

------------------------------------------------------------------------

January 13, 2005 - 23:20 : Dries

I'm going to test and benchmark the latest version of this patch over
the weekend.  If all goes well, I'll commit this to HEAD as no concrete
alternatives have been proposed.  (I can't think of a better approach
and believe that this patch does a good job balancing different
values/requirements.)

It would be much appreciated if those who maintain node-level
permission modules could take a look at this patch to evaluate its
usefulness/performance.  If you maintain a module that queries the node
table you might want to look at this as well -- after all, you'll have
to update your module if this hits core.

(Note that the PHPdoc code in the patch is not up-to-date.)

------------------------------------------------------------------------

January 14, 2005 - 13:47 : chx

Attachment: http://drupal.org/files/issues/node_rewrite_sql_8.patch (43.97 KB)

PHPdoc is upgraded, I hope now it is up to date.

-- 
View: http://drupal.org/node/14731
Edit: http://drupal.org/project/comments/add/14731





More information about the drupal-devel mailing list