Getting Around The Limitations of hook_db_rewrite_sql
I'm writing a module for 4.7 that needs to control access to nodes that are only viewable by users who are approved via an external module. To determine if a node is viewable, I need to check the user *and* the node id. This would work, except for a design decision that was made "for performance". Here's what it says in the docs, in the article about "Node access rights" (http://api.drupal.org/api/HEAD/group/node_access): In node listings, the process above is followed except that hook_access <http://api.drupal.org/api/HEAD/function/hook_access>() is not called on each node for performance reasons and for proper functioning of the pager system. When adding a node listing to your module, be sure to use db_rewrite_sql <http://api.drupal.org/api/HEAD/function/db_rewrite_sql>() to add the appropriate clauses to your query for access checks. In other words, even if you set up your hook_access to prohibit viewing of your content, Drupal 4.7 *will display your private content to an anonymous user*. Once your private node gets added to the list, there are no further checks to your hook access to determine if your node is safe to display. I don't see any way that hook_db_rewrite_sql can be used for this purpose, since there is no simple relationship between the current user and whether a node should be viewable, short of doing one of the following things: * I could use a IN () clause to list every node id of the given type that the given user is allowed to see. This may work in my current situation, but there can easily be *thousands* of these in some applications. So this is not a general solution. * I could manipulate the node_access table *on the fly* each time a user with some access to the content type. * I could somehow hack the node to hide itself by not displaying any sensitive content in hook_view, and theming it to be hidden via CSS. <rant> IMNHO, this is complete insane. </rant> What's the best way to get around this, um, performance enhancement? Thanks, Rob It may help improve
Rob, I have recently wrestled with this limitation, too, in the development of my node_visibility_bysite module [1]... Rob Thorne wrote:
I'm writing a module for 4.7 that needs to control access to nodes that are only viewable by users who are approved via an external module. To determine if a node is viewable, I need to check the user *and* the node id. This would work, except for a design decision that was made "for performance". Here's what it says in the docs, in the article about "Node access rights" (http://api.drupal.org/api/HEAD/group/node_access):
In node listings, the process above is followed except that hook_access <http://api.drupal.org/api/HEAD/function/hook_access>() is not called on each node for performance reasons and for proper functioning of the pager system. When adding a node listing to your module, be sure to use db_rewrite_sql <http://api.drupal.org/api/HEAD/function/db_rewrite_sql>() to add the appropriate clauses to your query for access checks.
In other words, even if you set up your hook_access to prohibit viewing of your content, Drupal 4.7 *will display your private content to an anonymous user*. Once your private node gets added to the list, there are no further checks to your hook access to determine if your node is safe to display. I don't see any way that hook_db_rewrite_sql can be used for this purpose, since there is no simple relationship between the current user and whether a node should be viewable, short of doing one of the following things:
* I could use a IN () clause to list every node id of the given type that the given user is allowed to see. This may work in my current situation, but there can easily be *thousands* of these in some applications. So this is not a general solution.
[snip] In my case, I ended up with a core hack similar to what you describe here. That is, I did a module-specific conditional extension within db_rewrite_sql. The problem and its solution, I believe, is implicit in your note. There is an underlying presumption in the node_access mechanism that content access conditions are limited to decisions about users and their roles. IOW, _any_ node access decision must be made in the absence of knowledge about the node in question.
<rant> IMNHO, this is complete insane. </rant>
It is an annoyance for sure, but it is probably not completely insane. It is situations like this, however, where I truly miss pure object design (having spent over 20 years as a Smalltalker). I believe the best solution, in your case and mine, is to have a reasoned discussion on how to extend the node access mechanism to be "node-aware." I believe tweaking the parameter spec of this hook mechanism _may_ be all that is needed to allow both user/role and node-state conditional decisions about content visibility. I say "_may_ all that is needed" because there is the possibility that execution-order issues may bite us if we were to rely on a single stage node access mechanism. There may need to be a two-stage mechanism that allows node-state conditions to be applied prior to user/role conditions. Regardless of how best to do it, I agree 100% that we need a more flexible node access hook mechanism. --Sohodojo Jim-- [1] http://lists.drupal.org/archives/development/2006-04/msg00010.html -- Jim Salmons and Timlynn Babitsky Founders and Research Directors Sohodojo - http://sohodojo.com
On Apr 3, 2006, at 3:13 AM, Rob Thorne wrote:
In other words, even if you set up your hook_access to prohibit viewing of your content, Drupal 4.7 *will display your private content to an anonymous user*. Once your private node gets added to the list, there are no further checks to your hook access to determine if your node is safe to display.
IMNHO, this is complete insane.
I'm not sure what the best solution to your problem is, but I think I can help you to understand the reasons behind this decision. Consider a paged listing of nodes. If we are to display the first 10 nodes of 1000 on a site, we call db_query_range() to fetch just those entries. This is fast. Now suppose we use hook_access() to check for access to each of those 10. What if none of those pass the access check? Then you have a page with no nodes printed, even if the next 10 would have passed the check! So what are the possible solutions? One could fetch all of the results rather than a range, and use PHP to iterate through the results and call the function on each until 10 are found. I think that *now* we are in insane territory. Other than that, the only option is to perform the access check within the database call itself. This was the decision that was made.
Jonathan Chaffer wrote:
I'm not sure what the best solution to your problem is, but I think I can help you to understand the reasons behind this decision. Consider a paged listing of nodes. If we are to display the first 10 nodes of 1000 on a site, we call db_query_range() to fetch just those entries. This is fast. Now suppose we use hook_access() to check for access to each of those 10. What if none of those pass the access check? Then you have a page with no nodes printed, even if the next 10 would have passed the check!
JonBob -- I'm sympathetic with the problem, and I agree that calling hook_access on each of the nodes would create a noticeable slow down. But displaying content that is sensitive without checking access at all is a problem, and potentially, it is more serious than occasionally printing irregular numbers of records in a batch. If the information is sufficiently sensitive, even the loss of performance might be a reasonable tradeoff. I'm not suggesting that all applications should be saddled with the security requirements of some applications, but it should be possible to configure the system to behave in a more secure fashion without doing major engineering on the framework.
So what are the possible solutions? One could fetch all of the results rather than a range, and use PHP to iterate through the results and call the function on each until 10 are found. I think that *now* we are in insane territory. Other than that, the only option is to perform the access check within the database call itself. This was the decision that was made.
That I am considering such insanity should tell you how heavily I need to weigh security requirements on some applications :-( It's important to remember that hook_db_rewrite_sql is a mechanism for munging queries. A quick examination of contributions/modules will show that relatively few us has made much use of it, at least successfully. That it has become a central piece of our security architecture should raise flags with at least a few of us. A better question might be how we can make it easier to get hook_access to run efficiently. Hook_access is, afterall, designed for access control :-) Thanks, Rob
Rob Thorne wrote:
But displaying content that is sensitive without checking access at all is a problem, and potentially, it is more serious than occasionally printing irregular numbers of records in a batch. If the information is sufficiently sensitive, even the loss of performance might be a reasonable tradeoff.
That's why node acts on db_rewrite_sql. Security *is* checked.
Earl Miles wrote:
Rob Thorne wrote:
But displaying content that is sensitive without checking access at all is a problem, and potentially, it is more serious than occasionally printing irregular numbers of records in a batch. If the information is sufficiently sensitive, even the loss of performance might be a reasonable tradeoff.
That's why node acts on db_rewrite_sql. Security *is* checked. Strictly speaking, that's only true if there's a reasonable way to put the needed records into node_access. Otherwise, db_rewrite_sql doesn't really have anything to work on. And if there isn't: there's no security for that application either :-(
na_arbitrator does have some promise for what I'm doing; it's probably possible to use your ACL calls to let your system munge node_access for me when the user logs in. And I think that your API is reasonable for that. But let me say it again: friends should not let friends munge node_access, except via na_arbitrartor :-) R
Rob, I'm going to play devel's advocate (pun intended :) for hook_db_rewrite_sql. It's a powerful tool, and a relatively new addition to drupal. Because it's new we're only beginning to see modules use it, but they are growing in numbers. (Before long, some will have better reputations that others, I predict.) I think your problem is more that you haven't found one that does just what you want it to do. That doesn't (necessarily) mean that db_rewrite_sql needs re-working. Regarding nodes: It's tempting to expect hook_access to be called every time. But it's just not realistic. It takes a lot of database calls to load a node. How many depends on the modules you have installed, but either way think of it as a lot. And the node must be loaded for hook_access to be called. Think of all the pages that display, for instance, only a node title. Like "recent posts" or "my issues" on drupal.org. For these pages to load every node they display would just kill performance. Regarding other data: part of the power (and complexity) of hook_db_rewrite_sql is that there is nothing node-specific about it. It can hide or make visible other types of data as well. Want to hide some users from some users? You can do it there. The TAC Lite module, for instance, hides some terms from users. It uses node_access to hide nodes, and db_rewrite_sql to hide the taxonomy terms themselves. I'll attest that hook_db_rewrite_sql is complex. And its a challenge to hide just the right content from just the right users all the time. But I'm a believer. I expect that with a little work you'll find a way to do just what you want. Also, I invite you to explain your application to me in more detail (off this list). I may be able to help, or I may learn just how bad db_rewrite_sql is. ;) -Dave On Monday 03 April 2006 04:46 pm, Rob Thorne wrote:
It's important to remember that hook_db_rewrite_sql is a mechanism for munging queries. A quick examination of contributions/modules will show that relatively few us has made much use of it, at least successfully. That it has become a central piece of our security architecture should raise flags with at least a few of us.
A better question might be how we can make it easier to get hook_access to run efficiently. Hook_access is, afterall, designed for access control :-)
Thanks, Rob
Rob Thorne wrote:
I'm writing a module for 4.7 that needs to control access to nodes that are only viewable by users who are approved via an external module. To determine if a node is viewable, I need to check the user *and* the node id. This would work, except for a design decision that was made "for performance". Here's what it says in the docs, in the article about "Node access rights" (http://api.drupal.org/api/HEAD/group/node_access):
In node listings, the process above is followed except that hook_access <http://api.drupal.org/api/HEAD/function/hook_access>() is not called on each node for performance reasons and for proper functioning of the pager system. When adding a node listing to your module, be sure to use db_rewrite_sql <http://api.drupal.org/api/HEAD/function/db_rewrite_sql>() to add the appropriate clauses to your query for access checks.
In other words, even if you set up your hook_access to prohibit viewing of your content, Drupal 4.7 *will display your private content to an anonymous user*. Once your private node gets added to the list, there are no further checks to your hook access to determine if your node is safe to display. I don't see any way that hook_db_rewrite_sql can be used for this purpose, since there is no simple relationship between the current user and whether a node should be viewable, short of doing one of the following things:
* I could use a IN () clause to list every node id of the given type that the given user is allowed to see. This may work in my current situation, but there can easily be *thousands* of these in some applications. So this is not a general solution. * I could manipulate the node_access table *on the fly* each time a user with some access to the content type. * I could somehow hack the node to hide itself by not displaying any sensitive content in hook_view, and theming it to be hidden via CSS.
<rant> IMNHO, this is complete insane. </rant> Is the ACL stuff I wrote in the na_arbitrator insufficient for this?
Earl Miles wrote:
<rant> IMNHO, this is complete insane. </rant>
Is the ACL stuff I wrote in the na_arbitrator insufficient for this?
Where's it living right now? I noticed it's not in contributions/modules. I'm certainly willing to try it out. Hell, right now I'd try damn near anything! Thanks, Rob
Ah, now I see it. It's at: http://cvs.drupal.org/viewcvs/drupal/contributions/modules/na_arbitrator/ I'll give it a look. Thanks, Rob Rob Thorne wrote:
Earl Miles wrote:
<rant> IMNHO, this is complete insane. </rant>
Is the ACL stuff I wrote in the na_arbitrator insufficient for this?
Where's it living right now? I noticed it's not in contributions/modules.
I'm certainly willing to try it out. Hell, right now I'd try damn near anything!
Thanks, Rob
Rob, You haven't given all the details your user-to-node relationships and how you will store them, but it sounds to me like hook_db_rewrite_sql is exactly what you need. It lets you define join(s) and where clause(s) to hide content. Those clauses can be based on the current user and/or whatever else you come up with. hook_db_rewrite_sql makes the node_access table possible. Node_access is the way the node.module uses the hook, but is not the only way. However, I recommend you use the node_access table if you can make it apply to your situation. If you're planning to make a giant mapping between each user and each node. Then it sounds like a lot of data and an administrative nightmare. But also something that the node_access table is able to represent. I wrote a couple access control modules which might be good examples. One based on users, roles and taxonomy <http://drupal.org/project/issues/53738> and a similar one for the category module <http://cvs.drupal.org/viewcvs/drupal/contributions/modules/category/contrib/cac_lite/>. They both rely on the node_access table mostly, but for special cases use hook_db_rewrite_sql. -Dave On Monday 03 April 2006 12:13 am, Rob Thorne wrote:
I'm writing a module for 4.7 that needs to control access to nodes that are only viewable by users who are approved via an external module. To determine if a node is viewable, I need to check the user *and* the node id. This would work, except for a design decision that was made "for performance".
Dave Cohen wrote:
I wrote a couple access control modules which might be good examples. One based on users, roles and taxonomy <http://drupal.org/project/issues/53738> and a similar one for the category module <http://cvs.drupal.org/viewcvs/drupal/contributions/modules/category/contrib/cac_lite/>. They both rely on the node_access table mostly, but for special cases use hook_db_rewrite_sql.
I look forward to looking at your code related to use of db_rewrite_sql hooks. The cac_lite link led to its code, but the tac_lite module is probably easier to get at with http://drupal.org/node/53738. The node_visibility_bysite module [1] I am working on is closely related to your work but within the context of a multisite, shared content configuration. To get this working, I had to do some core module hacks (minor but necessary). Necessary, that is, based on my need to get this working ASAP compounded by whatever limits I have with regard to how best to leverage the node_access hook mechanism. By looking at your code and mine, I can see if there is a way to eliminate the hacks that I needed to get node_visibility_bysite working. It may even be that cac_lite or tac_lite can be applied to the specific multisite configuration that I am working on. --Sohodojo Jim-- [1] http://lists.drupal.org/archives/development/2006-04/msg00010.html -- Jim Salmons and Timlynn Babitsky Founders and Research Directors Sohodojo - http://sohodojo.com
participants (5)
-
Dave Cohen -
Earl Miles -
Jonathan Chaffer -
Rob Thorne -
Sohodojo Jim