Development
Threads by month
- ----- 2026 -----
- June
- May
- April
- March
- February
- January
- ----- 2025 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2024 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2023 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2022 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2021 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2020 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2019 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2018 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2017 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2016 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2015 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2014 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2013 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2012 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2011 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2010 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2009 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2008 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2007 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2006 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2005 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
January 2005
- 54 participants
- 341 discussions
Project: Drupal
-Version: 4.5.1
+Version: 4.5.2
Component: comment.module
Category: bug reports
Priority: critical
Assigned to: Junyor
Reported by: pennywit
Updated by: rooey
Status: patch
I hate to be a total baboon - but i'm still seeing a problem after
upgrading to 4.5.2 - the "poll" block has no comment options available
(stats or otherwise).
Have i missed something? (or are these patches not included in the
4.5.2 release?)
Thanks!
rooey
Previous comments:
------------------------------------------------------------------------
October 7, 2004 - 14:23 : pennywit
I have two problems with my upgrade at http://www.pennywit.com. First
is that a number of comments have the body turn to the number 3. It
looks like this happens when one of my users tries to edit his
comments. The second is that in any nodes submitted before I upgraded,
I don't see a count of the comments already submitted. I'm echoing this
to the support forum ...
------------------------------------------------------------------------
October 7, 2004 - 15:01 : pennywit
It says this has been fixed ... what do I need to do on my side?
--|PW|--
------------------------------------------------------------------------
October 8, 2004 - 08:11 : Bèr Kessels
http://drupal.org/node/11316`was the bug. Update commment.module will
fix this problem.
------------------------------------------------------------------------
October 17, 2004 - 12:04 : kps
This problem remains (in 4.5.0-rc 1/2 hour before this post):
... any nodes submitted before I upgraded, I don't see a count of the
comments already submitted.
The {node_comment_statistics} table is evidently not initialized
correctly.
Justification for "critical": Since most people who install 4.5.0 will
be users of older releases, upgrading really ought to work before the
release.
------------------------------------------------------------------------
October 17, 2004 - 12:47 : kps
Attachment: http://drupal.org/files/issues/updc.php (1.26 KB)
Attached, raw and as-is, the script I used to initialize the
{node_comment_statistics} table.
------------------------------------------------------------------------
October 20, 2004 - 17:12 : kps
Attachment: http://drupal.org/files/issues/updatecommentcount.php (907 bytes)
My above attachment (updc.php) is broken. This one is... better.
Visiting this page should at least populate the node_comment_statistics
table with something not entirely unreasonable.
------------------------------------------------------------------------
October 25, 2004 - 07:39 : Junyor
kps is right, the update won't correctly populate the table. Here's
(part of) update_105:
"INSERT INTO {node_comment_statistics} (nid, cid,
last_comment_timestamp, last_comment_name, last_comment_uid,
comment_count) SELECT n.nid, 0, n.created, NULL, n.uid, 0 FROM {node}
n"
So, it's setting the comment_timestamp = node creation time, forgetting
the comment_creator, using the userid that created the node as the
last_comment_uid, and setting the comment count to 0.
update_105 needs to be redone and I'd recommend a new update be created
for 4.5.1 that will correctly initialize the node_comment_statistics
table.
------------------------------------------------------------------------
November 12, 2004 - 09:02 : Junyor
OK, here's two patches to solve the problem.
Issues fixed:
- Fixes node_comment_statistics table prefixing for PostGreSQL
- Drops CID column from node_comment_statistics and supporting code in
comment.module
- Initializes comments table with usernames (needed by
node_comment_statistics table)
- Correctly initializing node_comments_statistics table
- Removed duplicate query for last_comment_name in
node_comment_statistics query in comment.module
Needs checking:
- I couldn't test this with PostGreSQL
- I'm no database expert, so the updates.inc changes need to be gone
over with a fine-toothed comb
- Do we need any special handling for comments with no 'name', i.e.
truly anonymous comments
These patches are for the DRUPAL-4-5-0 branch.
------------------------------------------------------------------------
November 12, 2004 - 12:01 : Junyor
Attachment: http://drupal.org/files/issues/node_comment_stats.patch (7.89 KB)
------------------------------------------------------------------------
November 12, 2004 - 12:01 : Junyor
Attachment: http://drupal.org/files/issues/node_comment_stats2.patch (3.44 KB)
Actually attaching patches. :)
------------------------------------------------------------------------
November 14, 2004 - 20:10 : Dries
I'm not 100% sure but isn't the name field of the comments table
supposed to be NULL, unless the comment is an anonymous comment? Is it
really required to initialize the name field? What happens if you don't?
I just checked drupal.org's database and only post Drupal 4.5.0
comments have the registered user's name in the comment table.
I'm a little nervous about committing database changes to stable
branches (DRUPAL-4-5) so please make sure this patch is well-tested.
The patches don't apply against HEAD but I can port them once we/you
ironed out the last glitches.
Otherwise these patches look fine. Good job.
------------------------------------------------------------------------
November 14, 2004 - 22:41 : Junyor
I don't know if it's necessary, but I'm just making sure it's
consistent. Maybe the author of the node_comment_statistics patch
could comment? I'm also concerned about having the name information
for registered users outside of the users table.
I've tested this on a test site and I'll try testing on my production
site once we have this worked out.
------------------------------------------------------------------------
November 15, 2004 - 09:45 : Dries
I don't know whether the original author (ccourtne) is still around but
it should be easy enough to test whether it is necessary or not. From
what I've seen, it isn't necessary. In fact, your current patch might
break things: like, what happens when somone changes his or her
username? AFAIK, the old username would be shown in the comments.
------------------------------------------------------------------------
November 15, 2004 - 10:03 : Junyor
OK, I'll edit that bit and resubmit. You'd like this for HEAD only?
------------------------------------------------------------------------
November 15, 2004 - 17:24 : Junyor
Attachment: http://drupal.org/files/issues/node_comment_stats-2.patch (7.24 KB)
New patch for database files.
------------------------------------------------------------------------
November 15, 2004 - 17:25 : Junyor
Attachment: http://drupal.org/files/issues/node_comment_stats2-2.patch (3.46 KB)
New patch for comment.module.
------------------------------------------------------------------------
November 22, 2004 - 10:31 : Junyor
Attachment: http://drupal.org/files/issues/node-comment-statistics_head.patch (7.26 KB)
Patches for head.
------------------------------------------------------------------------
November 22, 2004 - 10:31 : Junyor
Attachment: http://drupal.org/files/issues/node-comment-statistics_head2.patch (3.45 KB)
------------------------------------------------------------------------
November 23, 2004 - 22:24 : Dries
I'd like to move forward with this patch and include it in Drupal 4.5.1.
I can't reproduce this problem (it seems) so it would be much
appreciated if those who can, can test it.
------------------------------------------------------------------------
November 28, 2004 - 23:42 : Junyor
It applied and works well on my 4.5.0 site. A lot of nodes were listed
as not having comments before, but they're working now. Dries alerted
me to a drupal-support message regarding the patch and I'll look into
that tomorrow.
------------------------------------------------------------------------
December 8, 2004 - 05:12 : crw
After applying the patch, approving new comments still does not update
the node_comment_statistics table, therefore the comment_count field is
off and the number of new comments does not show up on the main page of
my site.
I'm currently troubleshooting this and will report my findings here.
------------------------------------------------------------------------
December 8, 2004 - 05:13 : crw
I should point out that I applied the following patch:
node-comment-statistics_head2.patch
And the problem described above still exists.
------------------------------------------------------------------------
December 8, 2004 - 05:25 : crw
Ok, headway.
Looks like submitting a comment triggers an updating of
node_comment_statistics, but editing a comment to publish/unpublish
does not. Both should trigger the update, so I just need to find out
why it isn't working for the 'update' case.
------------------------------------------------------------------------
December 8, 2004 - 06:22 : crw
Ok, I found the problem and came up with a solution. I've been manually
approving comments from anonymous users via the admin interface.
comment_admin_edit() calls comment_save(), but neither calls
_comment_update_node_statistics. I inserted the following two lines
around line 952 of comment.module after the call to comment_save():
$nid = db_result(db_query('SELECT nid FROM {comments} WHERE cid =
%d',$edit['cid']));
_comment_update_node_statistics($nid);
Ideally, there'd be tests for all these. comment_save should produce a
return value, and comment_admin_edit() shouldn't go forward unless
comment_save returns true.
Please excuse my lack of diff/patch-fu. :)
------------------------------------------------------------------------
December 8, 2004 - 08:58 : Junyor
Bah, that function should call comment_save(). All comment changes
should go through comment_save(). It would make life so much easier.
I'll try to come up with an updated patch soon, but probably not before
this weekend.
------------------------------------------------------------------------
December 8, 2004 - 09:28 : Dries
I agree that all comment editing should go through comment_save(), yet
that will require a bit of refactoring. Also, there are two 'edit
comment' forms (one for users, one for administrators) that should be
merged, much like we merged 'node edit' forms. Anyone?
------------------------------------------------------------------------
December 11, 2004 - 16:19 : Junyor
Dries: I see that you made a change to the updating of
node_comment_statistics in CVS HEAD. Do I still need to add an updated
patch? Are there plans for a 4.5.2 that could use this patch?
------------------------------------------------------------------------
December 11, 2004 - 17:14 : ax
i tried upgrading a 4.4 site to 4.5 yesterday and fell into the same
trap as pennywit, kps, junyor, and others: the update for the
node_comment_statistics table only updates forum comments and inserts
num_comments 0 for all other node types. quite cheaty, this.
junyors 4.5 patches (node_comment_stats-2.patch,
node_comment_stats2-2.patch) seem to solve the problem for me. at
least, i see proper "replies" counts in tracker and node views now. i
didn't try approving / publishing / unpublishing comments, though, nor
did i check other issues mentioned in the thread (user names,
postgresql, ...).
i applied the 2 patches to a 4.4 database / 4.5 code both before
running update.php and afterwards. in the first case, there is a small
glitch in that update.php throws an error:
user error: Unknown table 'node_comment_statistics'
DROP TABLE {node_comment_statistics}
because the patch removes "CREATE TABLE {node_comment_statistics}" from
update_105. this could be catched by changing to "DROP TABLE IF EXISTS
{node_comment_statistics}".
this is definitely a critical bug, and these patches should be applied
to 4.5 as soon as possible, regardless of the "all comment editing
should go through comment_save()" issue.
------------------------------------------------------------------------
December 11, 2004 - 19:53 : Jeremy(a)kerneltrap.org
I've just run into this same bug, trying to upgrade from 4.4 to 4.5.
It's a show stopper for me. :( Ugh, so close.
------------------------------------------------------------------------
December 11, 2004 - 22:29 : Junyor
Attachment: http://drupal.org/files/issues/node_comment_stats-3.patch (7.25 KB)
Changed patch for updates.inc based on feedback from ax. For 4.5.
------------------------------------------------------------------------
December 11, 2004 - 22:53 : Junyor
Attachment: http://drupal.org/files/issues/node_comment_patch2-3.patch (4.2 KB)
And an update for comment.module with additions to comment_save to
update the node_comment_statistics table. This should cover all edit
cases. For 4.5. Untested.
------------------------------------------------------------------------
December 12, 2004 - 17:40 : Jeremy(a)kerneltrap.org
I tested the upgrade process on KernelTrap -- worked perfectly nearest I
could tell. I'm also running with your comment.module patch.
------------------------------------------------------------------------
December 15, 2004 - 13:37 : Junyor
What needs to be done for this to be included in core?
------------------------------------------------------------------------
December 15, 2004 - 14:26 : Jeremy(a)kerneltrap.org
FWIW: In the days following my upgrade, I've found numerous issues with
comment statistics. Specifically, many old postings list "x new of x",
but clicking "x new" shows that in fact none of them were new.
I have not figured out what's different about the nodes for which this
is a problem compared to the nodes for which this isn't a problem. ie,
it doesn't seem to be a problem with a specific node type, etc.
In other words: I retract my earlier comment that the upgrade went
perfectly with this patch applied. Sorry I don't have more specifics
to offer, perhaps I'll have more time at a future date to help dig into
this more. In any case, the patch did improve the process noticeably.
------------------------------------------------------------------------
December 15, 2004 - 14:31 : Junyor
So, none of the comments said they were new or you had read them
previously despite them being marked new?
------------------------------------------------------------------------
December 15, 2004 - 15:13 : Jeremy(a)kerneltrap.org
> or you had read them previously despite them being marked new?
Right. Comments I had already read when running 4.4 were suddenly
marked as new after the upgrade to 4.5.
------------------------------------------------------------------------
December 15, 2004 - 16:50 : Junyor
That sounds like a problem with history more than
node_comment_statistics. I'm pretty sure this code doesn't do anything
related to marking comments as new or old.
------------------------------------------------------------------------
December 15, 2004 - 20:33 : Dries
If possible provide a single patch against DRUPAL-4-5 and a second patch
against HEAD. Looks like the patches are no longer in sync.
------------------------------------------------------------------------
December 16, 2004 - 09:57 : Junyor
I will do so soon, probably this weekend.
------------------------------------------------------------------------
December 18, 2004 - 00:42 : Jonathan Furness
Ah.... that's why I am struggling with my upgrade from 4.4.2 to
4.5.1..... thank goodness I found this page.
Is there a case for building a script which looks for all the records
in the comments table and updates the node_comment_statistics table...
for those of us who are now working in 4.5.1 with inaccurate records in
node_comment_statistics table?
Looking forward to the fix....
Jonathan
==========
Jonathan Furness
www.jonathansblog.net
------------------------------------------------------------------------
December 18, 2004 - 09:48 : Junyor
Attachment: http://drupal.org/files/issues/node_comment_stats_4-5.patch (11.45 KB)
Updating patch to current 4.5.1.
Jonathan: Just apply this patch to your installation (from the main
Drupal directory, type 'patch -p0 -u <
../node_comment_stats_4-5.patch') and run update.php.
------------------------------------------------------------------------
December 18, 2004 - 09:59 : Junyor
Dries: The comment.module in HEAD makes use of the cid column in
node_comment_statistics. My patches were removing that column, as it
wasn't being used. Should I leave it alone afterall?
------------------------------------------------------------------------
December 20, 2004 - 06:23 : Dries
The column seems to get populated though (it's not empty). If it the
data is not used, you can remove it.
mysql> SELECT COUNT(cid) FROM node_comment_statistics WHERE cid != 0;
+------------+
| COUNT(cid) |
+------------+
| 3970 |
+------------+
------------------------------------------------------------------------
December 20, 2004 - 08:27 : Anonymous
Well, cid wasn't used until the change you made in revision 1.314 of
comment.module. It was never set correctly or used anywhere in the
code, so I simply removed it.
------------------------------------------------------------------------
December 20, 2004 - 09:40 : Anonymous
Attachment: http://drupal.org/files/issues/node_comment_stats_4-5_0.patch (11.64 KB)
Slightly updated patch for 4.5.1.
------------------------------------------------------------------------
December 20, 2004 - 09:41 : Junyor
Attachment: http://drupal.org/files/issues/node_comment_stats_head.patch (12.08 KB)
And patch for HEAD (with all CID stuff removed).
------------------------------------------------------------------------
January 4, 2005 - 09:04 : Junyor
Dries: Is there anything else you need here before its commitable?
------------------------------------------------------------------------
January 4, 2005 - 14:31 : ax
i too would much appreciate if this *release* *critical* bug could be
fixed soon. thanks a lot!
------------------------------------------------------------------------
January 4, 2005 - 22:59 : Dries
I look into this patch tomorrow (err, later today).
------------------------------------------------------------------------
January 5, 2005 - 20:37 : Dries
I committed the patch to DRUPAL-4-5. I'm putting the HEAD version on
hold until the revisions patch landed.
Please upgrade your Drupal 4.5 sites to DRUPAL-4-5 in preparation of
Drupal 4.5.2. Thanks.
--
View: http://drupal.org/node/11366
Edit: http://drupal.org/project/comments/add/11366
1
0
Project: Drupal
Version: cvs
Component: base system
Category: bug reports
Priority: normal
Assigned to: Anonymous
Reported by: kbahey
Updated by: chx
Status: patch
Attachment: http://drupal.org/files/issues/base_url_kill_0.patch (4.75 KB)
This one does not use the whole base_url only the path part of it. HTML
bloat is kept at minimal.
chx
Previous comments:
------------------------------------------------------------------------
November 19, 2004 - 04:29 : kbahey
Looking at my site's logs, there seem to be several problems that are
caused by Drupal's use of relative path names.
If Drupal causes all the site's urls to be absolute, then none of this
would be an issue.
A. Search Engine Crawlers
Getting lots of 404s on things like: linux/index.html/robots.txt
Where 'linux' is an alias to a taxonomy, and 'index.html' is an alias
to a node within that taxonomy.
Another example, is recursing unnecessarily. I see 404s on things like:
/linux/index.html/linux/index.html
Where 'linux' is a path alias for a taxonomy term, and 'index.html' is
an alias to the main node within it.
This does not seem to happen when Google crawls my sites, but Yahoo's
Slurp suffers from this problem, and keeps recursing. MSNBot also
suffers from this.
Another crawler/harvester called Blinkx/DFS-Fetch keeps adding the .css
file to the relative path, getting a 404 on things like:
/linux/themes/xtemplate/pushbutton/logo.gif
And Fast Search Engine also attempts to access:
/linux/contact/tracker/tracker/user/password
The same goes for grub.org, another crawler.
B. Google Cache / Archive Way Back Machine
Pages in Google cache and archive.org Way Back Machine suffer form a
similar problem: the .css files cannot be found, and hence rendering of
the pages is not correct.
Examples:
Compare this: http://www.drupal.org/node/4647
To this:
http://www.google.ca/search?q=cache:www.drupal.org/node/view/4647
Notice the following:
How there is no formatting at all, because of the lack of a .css file
The httpd log on Drupal will show errors for:
linux/themes/pushbutton/style.css and linux/misc/drupal.css
Also see:
http://web.archive.org/web/20031016184902/http://www.drupal.org/
C. Proxy Caches:
When someone is browsing my site from behind a proxy cache, the web
site is hit with a rapid succession of requests, and many of it is just
for bogus pages.
Examples:
2004/11/17 - 17:47 404 error: linux/user/1 not found.
2004/11/17 - 17:47 404 error: linux/feedback not found.
2004/11/17 - 17:47 404 error: linux/tracker not found.
2004/11/17 - 17:47 404 error: linux/sitemap not found.
2004/11/17 - 17:47 404 error: linux/search not found.
2004/11/17 - 17:47 404 error: linux/misc not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/linux not found.
2004/11/17 - 17:47 404 error: linux/technology not found.
2004/11/17 - 17:47 404 error: linux/writings not found.
2004/11/17 - 17:47 404 error: linux/family not found.
And also:
2004/11/17 - 07:23 404 error: history/user/1 not found.
2004/11/17 - 07:23 404 error: history/tracker not found.
2004/11/17 - 07:23 404 error: history/feedback not found.
2004/11/17 - 07:23 404 error: history/sitemap not found.
2004/11/17 - 07:23 404 error: history/search not found.
2004/11/17 - 07:23 404 error: history/misc not found.
2004/11/17 - 07:23 404 error: history/technology not found.
2004/11/17 - 07:23 404 error: history/science not found.
2004/11/17 - 07:22 404 error: history/history not found.
2004/11/17 - 07:22 404 error: history/writings not found.
2004/11/17 - 07:22 404 error: history/family not found.
As you can tell, history and linux are aliases to taxonomy terms, and
so is misc, technology, writings, family, ...etc. The user agent is
appending the taxonomy term alias to the url and forming a new URL.
D. Regular Browsing:
There is even at least one extreme case where the following URL was
accessed (the result was 404 of course)
/book/view/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/logo.gif
It seems it was a normal user, because the user agent is: "Mozilla/4.0
(compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
Proposed Solution:
As a proposed solution, all URLs in Drupal can be made into absolute
path names. This can be done by the following:
The variable $base_url in the conf.php file is broken down into two
components:
$base_host (the 'http://whatever-host.example.com' part WITHOUT the
trailing slash)
$base_path (the '/path-to-drupal' part, WITH the leading slash. If this
is the DocumentRoot, then it is just a '/' character)
$base_url is now $base_host concatenated with $base_path
A simple filter can be written to preceed every href="path" with the
$base_path variable, so it becomes "/path"
This option can be turned on and off for a site. The default is to have
it off so current behavior is maintained.
A similar scheme applies for style sheets as well.
So, did I miss something obvious? Am I seriously off the mark?
Your thoughts!
------------------------------------------------------------------------
November 20, 2004 - 05:10 : chrisada
I am getting similar 404 errors, mainly from rss feed link that looks
like /blog/blog/feed and many manual links that are relative to drupal
root.
It was not a problem before Drupal 4.5, so I think there might not be a
need to change all URIs to absolute. I can't see where the problem is
coming from though.
------------------------------------------------------------------------
November 20, 2004 - 05:35 : kbahey
I am pretty sure that these problems were happening for at least the
past 10 months (ever since I moved to Drupal in January 2004).
The main issue here is that crawlers and other user agents get confused
by the relative path names.
Using absolute paths will definitely solve this. However, is this the
only solution?
I am looking for a discussion of this.
------------------------------------------------------------------------
November 20, 2004 - 10:55 : Goba
No absolute paths please. Having the path start with '/' solves all the
mentioned problems, and is not absolute, it is relative to the domain.
Sadly some crawlers and even the Google Cache does not obey to the base
href. I have reported this cache problem in April to Google, and they
promised they will keep it in mind... Hehe...
What we need is to have the printed relative path values relative to
the domain name, and not relative to the Drupal installation path.
Note that this issue will appear on the drupal devel mailing list if
someone finally provides a patch we can talk about :)
------------------------------------------------------------------------
November 20, 2004 - 11:19 : Dries
Goba is right. We need paths relative to the domain name to fix this
'problem'.
------------------------------------------------------------------------
November 20, 2004 - 21:19 : kbahey
Sorry for not making my self clear.
When I said absolute, I meant that they start with just a /. I did NOT
mean that they start with http://host.example.com. That would be a very
bad idea.
In any case, what do people think about the proposed solution (breaking
down $base_url into two parts?)
Also, does this address the style sheets as well, or more is needed?
------------------------------------------------------------------------
November 21, 2004 - 17:24 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch.txt (471 bytes)
I have implemented what Goba suggested.
------------------------------------------------------------------------
November 21, 2004 - 17:43 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_0.txt (825 bytes)
Maybe this one is faster?
------------------------------------------------------------------------
November 21, 2004 - 18:34 : kbahey
Man! You are fast!
I tried the second version. It works fine for things that are not
inside the node body, I mean they have a / in front of them, as we
want it to be.
Two comments/issues:
- If there is a URL that is already "/" representing the home page, it
gets set to "//". Perhaps it should check for that case?
- URLs in nodes that do not start with / do not get changed to have a /
prepended to them. Do we need a filter for this?
- Do we need to do something for the style sheets in the page header? I
mean the "misc/drupal.css" and "themes/themename/style.css"?
Thanks
------------------------------------------------------------------------
November 22, 2004 - 01:28 : kbahey
Hi chx
Here is a fix for the case where you have a url that is just "/".
In your patch, instead of:
[?php $base = $parts['path'] . '/' ; ?]
Replace that by:
[?php $base = ( $path == '/' ? $base : $parts['path'] . '/' ); ?]
------------------------------------------------------------------------
November 28, 2004 - 05:08 : kbahey
Did this patch make it into CVS yet?
If there are any objections to it, can someone please explain what they
are?
Thanks
------------------------------------------------------------------------
November 28, 2004 - 10:33 : Dries
Shouldn't your changes be included in the patch?
Also, it's better to cache $base rather than $parts.
Lastly, it this patch makes it to HEAD, we should probably remove some
'base url' cruft from the themes.
------------------------------------------------------------------------
November 28, 2004 - 19:54 : kbahey
Attachment: http://drupal.org/files/issues/x.diff (1 KB)
Here is the patch including my fix.
I am asking chx to comment on caching $base instead of $parts.
Will this make it faster?
------------------------------------------------------------------------
November 28, 2004 - 20:26 : chx
Hm. $base = ( $path == '/' ? $base : $parts['path'] . '/' ); this
depends on path which is a parameter. Thus I fail to see how could we
cache $base. I'd correct this code however $base = ( $path == '/' ? ''
: $parts['path'] . '/' ); 'cos I think $base is not defined before, but
this is not a problem, PHP will be happy to replace NULL with NULL...
Maybe instead of all parts, only $parts['path'] is enough to be cached,
yes, but the performance and memory usage difference -- I guess -- would
not be noticable...
------------------------------------------------------------------------
November 29, 2004 - 00:06 : kbahey
Attachment: http://drupal.org/files/issues/common-inc-patch.txt (1 KB)
OK.
I put in chx suggested change.
This patch can go in CVS then, to rid us of the problems with paths not
beginning with slash.
This is not an ultimate solution still. We need to address the problem
with .css files. Although the header contains a:
it does not seem that major search engines and archiving sites obey it
anyway.
------------------------------------------------------------------------
December 2, 2004 - 21:31 : Dries
Your coding style needs work. Also, I'm not going to commit this unless
the themes get fixed up: we'd end up with invalid URLs all over the
place. Lastly, I wonder how portable the themes will be when Drupal is
run from within a subdirectory.
------------------------------------------------------------------------
December 2, 2004 - 22:15 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_1.txt (849 bytes)
Well, my patch worked from a subdirectory very well, as fact, I have not
tested it from the root dir. And I think that it adheres to coding
standards. So I resubmit it with the root path fix. However, my Drupal
work is focused on i18n these days, and I was never into themeing so it
won't be me who fixes those.
------------------------------------------------------------------------
December 2, 2004 - 23:07 : kbahey
I have tested the previous patch (including my fix) with drupal
installed in the DocumentRoot of the server.
So, in effect, it is tested with both Drupal in / and Drupal in a
subdirectory.
This change fixes the problem for the crawlers and other browsers from
getting confused.
While it is true that there is no fix for the .css files in the HTML
head section yet, this fix deals with a major part of the problem, and
rids us of a major pain. Check your web server's logs some time to see
what I mean.
Someone who is familiar with the themes can contribute a patch later.
This patch and the future fix for themes are not mutually exclusive, so
let it go in CVS.
------------------------------------------------------------------------
December 9, 2004 - 16:31 : Goba
Please commit this into Drupal core, this fix is badly needed.
------------------------------------------------------------------------
January 17, 2005 - 14:00 : chx
Attachment: http://drupal.org/files/issues/base_url_kill.patch (4.34 KB)
Well as noone have stepped in to fix this problem, I have tried to fix
the themes also. themes.inc , xtemplate.engine and the bluemarine
template is patched besides common.inc.
Of course, more templates could follow, but first I'd like to see your
opinions.
------------------------------------------------------------------------
January 17, 2005 - 14:09 : Goba
I don't think that removing from the themes is a good idea, using
$parts['path'] should be encouraged though before the files, which
would fix the google cache problem, and would still keep the HTML size
low. It would also help those, who save the file to find the
originating site easier, since clicking on a non-pagelocal link would
lead to the online version.
------------------------------------------------------------------------
January 17, 2005 - 17:03 : Steven
Definitely -1 on removing the <base> tag or using absolute or
root-relative URLs. This tag has been around for ages, and it is the
only way to make clean URLs work without bloating in the code. FYI,
"base" is (first?) mentioned in Berners-Lee's HTML 1.0 draft [1].
That's June 1993.
As the amount of clean URL-using sites grows, the crawlers will have to
be updated. Perhaps we could prevent crawlers from going too insane by
404ing for URLs with more than say 10 components? That would prevent
the really crappy ones from hammering your site.
I'm all for making the <base> tag easier to handle for the user (say,
by including a filter to allow simple anchor links to work as most
people expect them to), but we should keep Drupal-generated URLs clean
and completely relative.
[1] http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt
------------------------------------------------------------------------
January 17, 2005 - 17:57 : kbahey
The problem with css is this: The @import argument does not start with a
/.
This is simple to fix.
We keep the "base" as it is today, but add the new variable: $base
before it.
So for a site where Drupal is installed in the DocumentRoot, all that
will change is that /misc/drupal.css and /themes/themename/style.css
will be preceded by a slash. For sites that use another path, that path
will be prepended to the css file name.
How about that?
------------------------------------------------------------------------
January 17, 2005 - 18:38 : Steven
What exactly is the problem with the @import? As far as I know:
- url() in stylesheets is interpreted relative to the base of the
stylesheet, not the source document.
- However, if the styles are inside an HTML document, through a style
tag or style attribute, then the stylesheet's location is the same as
the HTML document.
- Thus, the stylesheet's base is the same as the base of the HTML
document (which can be altered through the <base> tag).
I just don't see why it is necessary. As far as I know, the only
browser that has had problems resolving CSS urls properly was Netscape
4, which does not support @import at all, and which Drupal does not
support either, because of its CSS usage.
------------------------------------------------------------------------
January 17, 2005 - 19:35 : kbahey
The problem for stylesheets is as follows. I think it mainly affect
crawlers and Google's cache.
Say you have an installtion of Drupal in DocumentRoot. You then use url
aliases, and put slashes in them.
For example, you use news/general/2004-12-15.html for a node.
That node still has misc/drupal.css and themes/pushbutton/style.css in
the head section if the document. Crawlers get fooled by that and try
to look for /news/general/misc/drupal.css and
/news/general/themes/pushbutton/style.css, which don't exist.
So, just prepending the new $base variable (in chx's patch) before the
stylesheet @import argument would fix this issue. Assuming you are in
DocumentRoot, then /misc and /themes would be used instead of just misc
and themes.
It would still be compliant with standards, be relative to the web
site, and no ambiguous to anyone, be they crawler or browser.
I hope it is clearer now.
I think chx can change the patch to use the $base instead of $base_url
everywhere, so as to avoid the host/domain name in the urls.
------------------------------------------------------------------------
January 17, 2005 - 22:36 : Steven
But typical crawlers don't even pay attention to stylesheets, hence it
wouldn't have much use for them. I just don't see why we should adjust
to rare cases of buggy software. Reading out a base URL from an HTML
document is dead easy, and on top of that it doesn't add more
complexity as without the base tag, the document's URL is already an
implicit base which has to be parsed anyway.
I did not like it when we altered the <link> tag to accomodate buggy
RSS readers and I certainly don't like it now, as this is even rarer.
In both cases, it is not Drupal which is at fault.
------------------------------------------------------------------------
January 17, 2005 - 22:48 : kbahey
Steven
While I agree with most of what you said, the 404s show up in the logs
enough to be a bother.
Perhaps the original design of Drupal did not forsee that people will
use url aliases to mimic directory/file hierarchies. Whether this was
intended or not, it is the way many use Drupal today.
It does not matter where the bug is (Drupal or the external world), as
long as we can stop it ourselves, by adjusting our end of it.
The fix is simple enough and does not break standards (if implemented
as described with a leading / before the css).
------------------------------------------------------------------------
January 17, 2005 - 22:54 : Steven
It does not break standards, but it does bloat the code in an ugly way.
Why not send an e-mail to the owners of the crawlers and tell them to
implement a standard that is nearly 10 years old [2] (RFC 1808)?
Note that Google Cache now seems to correctly interpret base URLs [3]
and even adds a <base> tag of its own.
By the way, this problem has nothing to do with people using URL
aliases or not, as for a browser the regular nested paths that Drupal
uses (e.g. "node/1" is no different from aliases mimicking files
"foo/bar.html").
[2] http://www.faqs.org/rfcs/rfc1808.html
[3]
http://www.google.be/search?q=cache%3Awww.drupal.org&sourceid=mozilla-searc…
------------------------------------------------------------------------
January 17, 2005 - 23:34 : Goba
Steven, part of the problem is that Google cache does add a base href
even if there is a base href in the document. Eg adds a on the plone
comparision page cached. Now that since HTML does not allow more than
one base tag [4] to be present, it is up to the browsers, to use the
first or the last base value, or any of the base values on the page for
that matter as the used base. So even pages displayed from the google
cache will be buggy if a full relative path to the domain root is not
specified, due to this problem.
[4]
http://www.w3.org/TR/1999/REC-html401-19991224/sgml/dtd.html#head.content
--
View: http://drupal.org/node/13148
Edit: http://drupal.org/project/comments/add/13148
1
0
[drupal-devel] [bug] new revisions not created if user doesn't have administer nodes permission
by wazdog 18 Jan '05
by wazdog 18 Jan '05
18 Jan '05
Project: Drupal
Version: 4.5.0
Component: node system
Category: bug reports
Priority: critical
Assigned to: wazdog
Reported by: wazdog
Updated by: wazdog
Status: patch
Ultimately, an "edit" permission would encompass the permission needed
in a "rollback revision" permission, as the latter is a subset of the
former.
Therefore, if a user can edit a node, they should be able to rollback a
revision (though deleting a revision should still be a different
permission, because it is destructive).
wazdog
Previous comments:
------------------------------------------------------------------------
November 1, 2004 - 17:33 : wazdog
If you enable revisions for a node type via content > configure >
default workflow, and the user who is editing a node of that type
doesn't have the administer nodes permission, then a new revision is
not created.
I see this a huge bug. I am trying to have all edits to a node type
saved as revisions (in this case a flexinode, but I also tested with
blog types). I want to recreate the book pages, but with more fields
(hence, flexinode).
But if one of my authenticated users creates a node, and then later
edits it, a revision is not made. However, if an admin or moderator
(who both have the administer nodes permission) edits the node, a new
revision is created. I don't want my regular users to have administer
nodes permission, that gives them too much power, but their revisions
need to be kept (just like I give them all publish permission via
default workflow).
If this is supposed to be how it works, what is the point of the
default workflow setting? (and why do the other default workflow
settings behave differently? i.e. they all apply no matter the users
permissions)
------------------------------------------------------------------------
November 3, 2004 - 09:48 : wazdog
Attachment: http://drupal.org/files/issues/node-rev.patch (721 bytes)
Okay, here's a short little patch. It seems like node_create_revision()
was being called before $node->revision was set. Since $node->revision
is what tells Drupal whether or not to make a new revision, this was
causing nodes to not keep revisions. I moved node_create_revision() to
after this check.
I'm not sure exactly why only regular users without 'administer nodes'
permission were affected, but that seems the case.
Now, if a particular node type has 'revision' checked in content >
default workflow, a new revision is always created, no matter who is
doing the editing...
I'm setting this to critical because I think the current behavior goes
against logic, and what admins may expect. And if they don't notice
it's not working like they expect (i.e. revisions aren't being kept
always), then content is lost...
The patch is against 4.5.0, but always applies to current cvs (with a
slight offset error).
------------------------------------------------------------------------
January 15, 2005 - 05:54 : mcd
I tried this patch against 4.5.1 and it worked, but it took me some time
to figure out that it was working because it's still not doing the
intuitive thing. The user still can't see the revisions tab unless
they have node administration privileges (which makes it unlike the
outline tab). They can't see the checkbox to decide whether their edit
is a new revision, so every edit they do becomes a revision from which
they derive no benefit.
I would let them see the revision checkbox and the revisions tab, and
also roll back (non-destructively as suggested in
http://drupal.org/node/12479)
I don't think there is an intuitive way to handle the five node
administration Options checkboxes as a group while appearing to enable
them separately in the default workflow. I've had similar issues with
the defaults, like a page I made sticky coming unstuck every time the
owner edited it (with the default non-sticky in the workflow). Now I
see why that was happening, but it's still highly counterintuitive.
------------------------------------------------------------------------
January 15, 2005 - 05:56 : mcd
Sorry, I didn't mean to change the main title.
------------------------------------------------------------------------
January 15, 2005 - 06:11 : chx
IMHO in some situations it is a good thing if a new revision is created
every time a user edits the node. It's not her privilege to roll back,
it's something a moderator should do. It'd be even better if the new
revision would not be automatically published only after approval by
the moderator.
------------------------------------------------------------------------
January 15, 2005 - 06:18 : mcd
Rolling back nondestructively is just a means of editing a page. Why
should a user with permission to edit be prevented from that edit?
They can certainly do more destructive things.
As for creating new revisions every time, it depends on usage patterns.
You may not want too many almost indistinguishable revisions collecting
in the revisions tab.
------------------------------------------------------------------------
January 15, 2005 - 07:32 : robertDouglass
My opinions:
1. administer nodes is the wrong permission to determine revisions
behavior. At least one revision-specific permission is necessary,
perhaps two or more (view, create, delete, rollback revisions).
2. If the default workflow says revisions are made, this should
definitely make it so that normal users create revisions every time.
3. Users with admin rights (either a revisions specific permission or
administer nodes) can override the default. All others should not see
the checkbox.
4. Whether revisions are rolled back destructively or non-destructively
*could* be a setting, but if not, should be non-destructive by nature as
we already have a mechanism for deleting revisions.
-----
That much is the minimum I consider necessary to be a useable system. I
would additionally wish for the following:
5. Since I would like to have a site where normal users can view any of
the revisions, but not delete and not roll back, I would really like to
see the following permissions: read revision, rollback revision, delete
revision
6. The ability to compare two or more revisions [1]
7. The possibility for moderation of revisions ie. people with
administer nodes (or moderate revisions) look at a queue of revisions
and rollbacks and approve or disapprove them.
8. The possibility to vote on revisions. 7 could be an extension of 8.
If only administrators can vote on revisions, it would be the same as
moderating them.
I know the last bits sounds like a bunch of pipe dreams, and may be off
topic for this thread, but the system I imagine would be for a site
where the wording of entries must be considered very carefully by many
people and a concensus about the best version be found. The above
features would achieve this.
[1] http://drupal.org/node/15596
--
View: http://drupal.org/node/12401
Edit: http://drupal.org/project/comments/add/12401
1
0
18 Jan '05
Project: Drupal
Version: cvs
Component: node system
-Category: feature requests
+Category: bug reports
Priority: normal
Assigned to: Anonymous
Reported by: darrian
Updated by: darrian
Status: patch
I switched this to bug report so maybe someone would look at it again.
Forgetting about the changes i've made to the event module, can
someone look at this patch to see if it's suitable to be put into
core?
I've looked at this (node.module, node_delete) code, and code that uses
this code and I cannot see anything wrong with my patch. It doesnt
affect the way anything works, it doesn't affect security, its very
small.
The only thing this patch does is increase the flexibilty of drupal and
gives more options to module developers.
Thanks,
Darrian
darrian
Previous comments:
------------------------------------------------------------------------
June 18, 2004 - 21:07 : darrian
Attachment: http://drupal.org/files/issues/node.module-cvs.patch (872 bytes)
When you delete a node, it asks for confirmation, after that step all
extra fields in $node get lost because $node is recreated with only 2
fields, nid and confirm. I created a small patch that adds extra
fields back into the $node variable so they get passed to subsequent
_delete calls.
------------------------------------------------------------------------
June 27, 2004 - 20:29 : Dries
AFAIK, deleting a node works fine. How can we reproduce your problem?
Not sure what you are trying to accomplish but this patch looks wrong.
------------------------------------------------------------------------
July 12, 2004 - 23:05 : darrian
Right, this patch is so the modification I made to event.module works.
I added functionality for repeating events to event.module, and in
order for the delete to work I had to make this small change outside of
event.module (believe me I really tried not to). What this patch does
is pass variables originally passed to the delete function to whats
called after the user clicks confirm.
In other words when a user clicks delete the first thing they get is a
prompt to confirm deletion, after the user clicks yes, the delete
function is called, but the only variables that are passed are nid and
confirmed, all other variables are lost, which is usually ok because no
other vars needed for deletion, however in my case I needed another
variable.
This patch has no effect on all current modules, the reason i made this
a bug report is because it seems to me like variables originally passed
to node_delete should still exist after the user clicks confirm, even
though current modules dont use other vars.
I made a reference to this bug from this bug:
http://drupal.org/node/view/4122
If the changes I made to event.module ever get put into the current cvs
version, then this patch also needs to get put in.
-Darrian
------------------------------------------------------------------------
July 13, 2004 - 07:28 : Dries
When an event (node) is deleted, the _delete hook is called and the
event (node) is passed as a parameter. That seems to work for all
other node modules/types so I don't understand why this shouldn't work
for the event module. What is so special about this particular field
or node type that this does not suffice?
------------------------------------------------------------------------
July 14, 2004 - 14:14 : darrian
$node is passed, but after the user clicks confirm, $node only has two
fields, nid and confirm. I need an extra field, which is whether I'm
supposed to delete only this event, or all events in the group. This
is for the reocurring events feature I added. My modification adds the
extra fields originally in $node back to $node after the user clicks
confirm.
-Darrian
------------------------------------------------------------------------
August 2, 2004 - 16:27 : darrian
In order for my repeating/reocurring event module patch to get submitted
to cvs, this very small patch needs to make it into core. It does not
effect the functionality of anything current.
Here's the details of what happens, >>> is what happens when my node
patch is there:
1. User is looking at the form to delete an existing event, there are a
bunch of fields like date/time title, etc. This holds true for any node,
but I am going to reference an event node.
2. User clicks delete to delete the node being looked at. (User checks
to delete all events in this group - from my event patch)
3. All the fields for that node are a form element at this point, some
hidden, like nid, and a variable I set telling the event_delete
function to delete all events in this group.
4. The code makes its way to the node_delete($edit) function. At this
point all the form elements from above are members of the $edit
variable, including my variable mentioned above.
5. The code checks to see if the user already clicked confirm, at this
point they haven't.
6. Now we're in the else branch that creates only two form elements,
the nid, and confirm. This has been ok in the past, because the only
thing that you needed to delete a node was its nid. But in my case I
want to delete multiple nodes, so this doesnt work.
>>> My node patch creates hidden form elements of all the previous
variables, not just nid and confirm.
7. The code makes it way back to node_delete, this time however there
are only two members of $edit, nid and confirm.
8. The code checks to see if the user has clicked confirm, and they
have.
9. Now we're in the true branch which deletes the core node components,
then calls the node specific delete function, in our case the
event_delete function, passing only the nid and confirm variables.
>>> My node patch makes it so all variables, including my varible to
check whether we're deleting all nodes or not, are passed back in.
10. Now we're in the event_delete function. The event delete function
checks for the variable to see if we're deleting all events in group,
it doesnt exist, so it acts as if the user didn't check to delete all
events, and deletes just this event.
>>> With my patch, all previous variables still exist, so event_delete
knows that it needs to delete multiple nodes.
Hopefully this patch gets into core before feature freeze for the sake
of the event module. If it doesnt, then my patch for
repeating/reocurring events also cannot make it into cvs.
Thanks, Darrian
------------------------------------------------------------------------
August 2, 2004 - 17:29 : moshe weitzman
the patch looks good to me. thanks for the clear description of the bug.
------------------------------------------------------------------------
August 13, 2004 - 08:03 : TDobes
+1, except the single comment prefixed with a # should be changed to //
------------------------------------------------------------------------
August 16, 2004 - 21:57 : darrian
Does the code freeze mean that this patch wont make it into 4.5?
-Darrian
------------------------------------------------------------------------
August 16, 2004 - 22:31 : Steven
This sounds like a bugfix or at least code fix, so it can still go in.
I'm wondering about your description though. It sounds that if I first
edit a couple of fields, then click delete, then my edited variables
are passed around rather than the original ones. Because I did not
click 'submit', this is not desirable.
It may be better to call node_load() before deleting the node, so the
correct, original data is passed to the nodeapi.
------------------------------------------------------------------------
August 17, 2004 - 01:09 : darrian
Remember, for node_delete, only nid and confirm are passed.
But I already do as you suggest, to be more specific:
node_load is called, those variables, along with extra fields that are
not loaded with node_load (such as my variable for deleting group
events), are passed to subsequent _delete calls.
Just to re-iterate, without this patch the ONLY variables that make it
to the final delete call are nid and confirm.
-Darrian
------------------------------------------------------------------------
August 17, 2004 - 01:55 : JonBob
The point is that (efficiency aside) if you have the node ID, you can
reconstruct the node with a simple node_load() call. If your delete
function is called with only $nid, you can use that to load the whole
node again if need be.
Now this may not be enough in your case, as you are presenting
additional interface to the user that adds fields to $edit that are not
part of the node, but all the parts *of the node* are certainly
available to you.
------------------------------------------------------------------------
September 5, 2004 - 17:47 : Steven
"The point is that (efficiency aside) if you have the node ID, you can
reconstruct the node with a simple node_load() call. If your delete
function is called with only $nid, you can use that to load the whole
node again if need be."
At the point where hook_delete gets called, the node itself has been
deleted already, so reconstruction is not an option. The point here
seems to be to pass variables from the editing form to the delete
operation. I'm not sure if this is a good idea.
If it's not about this, then maybe we should invoke deletes in the
opposite order: nodeapi, hook, node.module.
------------------------------------------------------------------------
September 9, 2004 - 23:05 : darrian
Right, the point is to get variables from the form passed to the delete
function, namely 1 variable, which tells me if I should delete 1 event,
or all grouped reoccuring events, that I need for the modification I
made to the event module.
The way I made the modification is very UN-obtrusive. No variables get
squashed, and everything current ignores the new variables.
Also, the variable that I want exists in the node_delete function the
first time around, its only after the user clicks confirm that all
variables except the nid are lost. So I could delete things with my
modified event module if i skip the confirm step, but I'm sure that
would bite someone down the road and I really dont want to do it that
way.
-Darrian
------------------------------------------------------------------------
October 4, 2004 - 21:06 : Steven
"Right, the point is to get variables from the form passed to the delete
function, namely 1 variable, which tells me if I should delete 1 event,
or all grouped reoccuring events, that I need for the modification I
made to the event module."
This sounds like really bad UI: the rest of the node form is about data
related to the node, which gets saved when you click "submit". Adding a
delete-related checkbox there sounds odd and confusing.
------------------------------------------------------------------------
October 8, 2004 - 21:58 : darrian
Attachment: http://drupal.org/files/issues/editevent.png (35.72 KB)
Here's the UI i'm referring to. This patch to node.module is extremely
small and unobtrusive... Why so much resistance to it?
------------------------------------------------------------------------
January 12, 2005 - 22:47 : darrian
Attachment: http://drupal.org/files/issues/node.module-4.5.1.patch (822 bytes)
Here's an updated patch for drupal 4.5.1
-Darrian
------------------------------------------------------------------------
January 13, 2005 - 00:28 : tangent
Is a recurring event a single node or a group of nodes? Without getting
into the appropriate implementation of that, your bug suggests that the
checkbox will allow multiple nodes to be deleted. This is why it has
been suggested that it is a confusing UI. It isn't necessarily clear
what the "group" of nodes is?
FWIW, if a recurring event is a single node (which is displayed in
multiple places on the calendar) then this would not be an issue.
------------------------------------------------------------------------
January 14, 2005 - 23:55 : darrian
A group is in fact multiple nodes. I did it this way for a few reasons,
one of them being other modules (such as remindme) work with it better
this way.
I am no longer working on development of this feature, so I am not
going to redesign the way its done. I only try and keep it updated so
other people that need this feature can use it, seeing as there is no
other alternative.
As for this particular patch to node.module I still think it is a bug
because variables available the first time any _delete function is
called dissappear after the confirm step. Those variables should be
available to module developers regardless of how they are used.
-Darrian
--
View: http://drupal.org/node/8611
Edit: http://drupal.org/project/comments/add/8611
1
0
Project: Drupal
Version: cvs
Component: base system
Category: bug reports
Priority: normal
Assigned to: Anonymous
Reported by: kbahey
Updated by: Goba
Status: patch
Steven, part of the problem is that Google cache does add a base href
even if there is a base href in the document. Eg adds a on the plone
comparision page cached. Now that since HTML does not allow more than
one base tag [1] to be present, it is up to the browsers, to use the
first or the last base value, or any of the base values on the page for
that matter as the used base. So even pages displayed from the google
cache will be buggy if a full relative path to the domain root is not
specified, due to this problem.
[1]
http://www.w3.org/TR/1999/REC-html401-19991224/sgml/dtd.html#head.content
Goba
Previous comments:
------------------------------------------------------------------------
November 19, 2004 - 04:29 : kbahey
Looking at my site's logs, there seem to be several problems that are
caused by Drupal's use of relative path names.
If Drupal causes all the site's urls to be absolute, then none of this
would be an issue.
A. Search Engine Crawlers
Getting lots of 404s on things like: linux/index.html/robots.txt
Where 'linux' is an alias to a taxonomy, and 'index.html' is an alias
to a node within that taxonomy.
Another example, is recursing unnecessarily. I see 404s on things like:
/linux/index.html/linux/index.html
Where 'linux' is a path alias for a taxonomy term, and 'index.html' is
an alias to the main node within it.
This does not seem to happen when Google crawls my sites, but Yahoo's
Slurp suffers from this problem, and keeps recursing. MSNBot also
suffers from this.
Another crawler/harvester called Blinkx/DFS-Fetch keeps adding the .css
file to the relative path, getting a 404 on things like:
/linux/themes/xtemplate/pushbutton/logo.gif
And Fast Search Engine also attempts to access:
/linux/contact/tracker/tracker/user/password
The same goes for grub.org, another crawler.
B. Google Cache / Archive Way Back Machine
Pages in Google cache and archive.org Way Back Machine suffer form a
similar problem: the .css files cannot be found, and hence rendering of
the pages is not correct.
Examples:
Compare this: http://www.drupal.org/node/4647
To this:
http://www.google.ca/search?q=cache:www.drupal.org/node/view/4647
Notice the following:
How there is no formatting at all, because of the lack of a .css file
The httpd log on Drupal will show errors for:
linux/themes/pushbutton/style.css and linux/misc/drupal.css
Also see:
http://web.archive.org/web/20031016184902/http://www.drupal.org/
C. Proxy Caches:
When someone is browsing my site from behind a proxy cache, the web
site is hit with a rapid succession of requests, and many of it is just
for bogus pages.
Examples:
2004/11/17 - 17:47 404 error: linux/user/1 not found.
2004/11/17 - 17:47 404 error: linux/feedback not found.
2004/11/17 - 17:47 404 error: linux/tracker not found.
2004/11/17 - 17:47 404 error: linux/sitemap not found.
2004/11/17 - 17:47 404 error: linux/search not found.
2004/11/17 - 17:47 404 error: linux/misc not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/linux not found.
2004/11/17 - 17:47 404 error: linux/technology not found.
2004/11/17 - 17:47 404 error: linux/writings not found.
2004/11/17 - 17:47 404 error: linux/family not found.
And also:
2004/11/17 - 07:23 404 error: history/user/1 not found.
2004/11/17 - 07:23 404 error: history/tracker not found.
2004/11/17 - 07:23 404 error: history/feedback not found.
2004/11/17 - 07:23 404 error: history/sitemap not found.
2004/11/17 - 07:23 404 error: history/search not found.
2004/11/17 - 07:23 404 error: history/misc not found.
2004/11/17 - 07:23 404 error: history/technology not found.
2004/11/17 - 07:23 404 error: history/science not found.
2004/11/17 - 07:22 404 error: history/history not found.
2004/11/17 - 07:22 404 error: history/writings not found.
2004/11/17 - 07:22 404 error: history/family not found.
As you can tell, history and linux are aliases to taxonomy terms, and
so is misc, technology, writings, family, ...etc. The user agent is
appending the taxonomy term alias to the url and forming a new URL.
D. Regular Browsing:
There is even at least one extreme case where the following URL was
accessed (the result was 404 of course)
/book/view/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/logo.gif
It seems it was a normal user, because the user agent is: "Mozilla/4.0
(compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
Proposed Solution:
As a proposed solution, all URLs in Drupal can be made into absolute
path names. This can be done by the following:
The variable $base_url in the conf.php file is broken down into two
components:
$base_host (the 'http://whatever-host.example.com' part WITHOUT the
trailing slash)
$base_path (the '/path-to-drupal' part, WITH the leading slash. If this
is the DocumentRoot, then it is just a '/' character)
$base_url is now $base_host concatenated with $base_path
A simple filter can be written to preceed every href="path" with the
$base_path variable, so it becomes "/path"
This option can be turned on and off for a site. The default is to have
it off so current behavior is maintained.
A similar scheme applies for style sheets as well.
So, did I miss something obvious? Am I seriously off the mark?
Your thoughts!
------------------------------------------------------------------------
November 20, 2004 - 05:10 : chrisada
I am getting similar 404 errors, mainly from rss feed link that looks
like /blog/blog/feed and many manual links that are relative to drupal
root.
It was not a problem before Drupal 4.5, so I think there might not be a
need to change all URIs to absolute. I can't see where the problem is
coming from though.
------------------------------------------------------------------------
November 20, 2004 - 05:35 : kbahey
I am pretty sure that these problems were happening for at least the
past 10 months (ever since I moved to Drupal in January 2004).
The main issue here is that crawlers and other user agents get confused
by the relative path names.
Using absolute paths will definitely solve this. However, is this the
only solution?
I am looking for a discussion of this.
------------------------------------------------------------------------
November 20, 2004 - 10:55 : Goba
No absolute paths please. Having the path start with '/' solves all the
mentioned problems, and is not absolute, it is relative to the domain.
Sadly some crawlers and even the Google Cache does not obey to the base
href. I have reported this cache problem in April to Google, and they
promised they will keep it in mind... Hehe...
What we need is to have the printed relative path values relative to
the domain name, and not relative to the Drupal installation path.
Note that this issue will appear on the drupal devel mailing list if
someone finally provides a patch we can talk about :)
------------------------------------------------------------------------
November 20, 2004 - 11:19 : Dries
Goba is right. We need paths relative to the domain name to fix this
'problem'.
------------------------------------------------------------------------
November 20, 2004 - 21:19 : kbahey
Sorry for not making my self clear.
When I said absolute, I meant that they start with just a /. I did NOT
mean that they start with http://host.example.com. That would be a very
bad idea.
In any case, what do people think about the proposed solution (breaking
down $base_url into two parts?)
Also, does this address the style sheets as well, or more is needed?
------------------------------------------------------------------------
November 21, 2004 - 17:24 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch.txt (471 bytes)
I have implemented what Goba suggested.
------------------------------------------------------------------------
November 21, 2004 - 17:43 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_0.txt (825 bytes)
Maybe this one is faster?
------------------------------------------------------------------------
November 21, 2004 - 18:34 : kbahey
Man! You are fast!
I tried the second version. It works fine for things that are not
inside the node body, I mean they have a / in front of them, as we
want it to be.
Two comments/issues:
- If there is a URL that is already "/" representing the home page, it
gets set to "//". Perhaps it should check for that case?
- URLs in nodes that do not start with / do not get changed to have a /
prepended to them. Do we need a filter for this?
- Do we need to do something for the style sheets in the page header? I
mean the "misc/drupal.css" and "themes/themename/style.css"?
Thanks
------------------------------------------------------------------------
November 22, 2004 - 01:28 : kbahey
Hi chx
Here is a fix for the case where you have a url that is just "/".
In your patch, instead of:
[?php $base = $parts['path'] . '/' ; ?]
Replace that by:
[?php $base = ( $path == '/' ? $base : $parts['path'] . '/' ); ?]
------------------------------------------------------------------------
November 28, 2004 - 05:08 : kbahey
Did this patch make it into CVS yet?
If there are any objections to it, can someone please explain what they
are?
Thanks
------------------------------------------------------------------------
November 28, 2004 - 10:33 : Dries
Shouldn't your changes be included in the patch?
Also, it's better to cache $base rather than $parts.
Lastly, it this patch makes it to HEAD, we should probably remove some
'base url' cruft from the themes.
------------------------------------------------------------------------
November 28, 2004 - 19:54 : kbahey
Attachment: http://drupal.org/files/issues/x.diff (1 KB)
Here is the patch including my fix.
I am asking chx to comment on caching $base instead of $parts.
Will this make it faster?
------------------------------------------------------------------------
November 28, 2004 - 20:26 : chx
Hm. $base = ( $path == '/' ? $base : $parts['path'] . '/' ); this
depends on path which is a parameter. Thus I fail to see how could we
cache $base. I'd correct this code however $base = ( $path == '/' ? ''
: $parts['path'] . '/' ); 'cos I think $base is not defined before, but
this is not a problem, PHP will be happy to replace NULL with NULL...
Maybe instead of all parts, only $parts['path'] is enough to be cached,
yes, but the performance and memory usage difference -- I guess -- would
not be noticable...
------------------------------------------------------------------------
November 29, 2004 - 00:06 : kbahey
Attachment: http://drupal.org/files/issues/common-inc-patch.txt (1 KB)
OK.
I put in chx suggested change.
This patch can go in CVS then, to rid us of the problems with paths not
beginning with slash.
This is not an ultimate solution still. We need to address the problem
with .css files. Although the header contains a:
it does not seem that major search engines and archiving sites obey it
anyway.
------------------------------------------------------------------------
December 2, 2004 - 21:31 : Dries
Your coding style needs work. Also, I'm not going to commit this unless
the themes get fixed up: we'd end up with invalid URLs all over the
place. Lastly, I wonder how portable the themes will be when Drupal is
run from within a subdirectory.
------------------------------------------------------------------------
December 2, 2004 - 22:15 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_1.txt (849 bytes)
Well, my patch worked from a subdirectory very well, as fact, I have not
tested it from the root dir. And I think that it adheres to coding
standards. So I resubmit it with the root path fix. However, my Drupal
work is focused on i18n these days, and I was never into themeing so it
won't be me who fixes those.
------------------------------------------------------------------------
December 2, 2004 - 23:07 : kbahey
I have tested the previous patch (including my fix) with drupal
installed in the DocumentRoot of the server.
So, in effect, it is tested with both Drupal in / and Drupal in a
subdirectory.
This change fixes the problem for the crawlers and other browsers from
getting confused.
While it is true that there is no fix for the .css files in the HTML
head section yet, this fix deals with a major part of the problem, and
rids us of a major pain. Check your web server's logs some time to see
what I mean.
Someone who is familiar with the themes can contribute a patch later.
This patch and the future fix for themes are not mutually exclusive, so
let it go in CVS.
------------------------------------------------------------------------
December 9, 2004 - 16:31 : Goba
Please commit this into Drupal core, this fix is badly needed.
------------------------------------------------------------------------
January 17, 2005 - 14:00 : chx
Attachment: http://drupal.org/files/issues/base_url_kill.patch (4.34 KB)
Well as noone have stepped in to fix this problem, I have tried to fix
the themes also. themes.inc , xtemplate.engine and the bluemarine
template is patched besides common.inc.
Of course, more templates could follow, but first I'd like to see your
opinions.
------------------------------------------------------------------------
January 17, 2005 - 14:09 : Goba
I don't think that removing from the themes is a good idea, using
$parts['path'] should be encouraged though before the files, which
would fix the google cache problem, and would still keep the HTML size
low. It would also help those, who save the file to find the
originating site easier, since clicking on a non-pagelocal link would
lead to the online version.
------------------------------------------------------------------------
January 17, 2005 - 17:03 : Steven
Definitely -1 on removing the <base> tag or using absolute or
root-relative URLs. This tag has been around for ages, and it is the
only way to make clean URLs work without bloating in the code. FYI,
"base" is (first?) mentioned in Berners-Lee's HTML 1.0 draft [2].
That's June 1993.
As the amount of clean URL-using sites grows, the crawlers will have to
be updated. Perhaps we could prevent crawlers from going too insane by
404ing for URLs with more than say 10 components? That would prevent
the really crappy ones from hammering your site.
I'm all for making the <base> tag easier to handle for the user (say,
by including a filter to allow simple anchor links to work as most
people expect them to), but we should keep Drupal-generated URLs clean
and completely relative.
[2] http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt
------------------------------------------------------------------------
January 17, 2005 - 17:57 : kbahey
The problem with css is this: The @import argument does not start with a
/.
This is simple to fix.
We keep the "base" as it is today, but add the new variable: $base
before it.
So for a site where Drupal is installed in the DocumentRoot, all that
will change is that /misc/drupal.css and /themes/themename/style.css
will be preceded by a slash. For sites that use another path, that path
will be prepended to the css file name.
How about that?
------------------------------------------------------------------------
January 17, 2005 - 18:38 : Steven
What exactly is the problem with the @import? As far as I know:
- url() in stylesheets is interpreted relative to the base of the
stylesheet, not the source document.
- However, if the styles are inside an HTML document, through a style
tag or style attribute, then the stylesheet's location is the same as
the HTML document.
- Thus, the stylesheet's base is the same as the base of the HTML
document (which can be altered through the <base> tag).
I just don't see why it is necessary. As far as I know, the only
browser that has had problems resolving CSS urls properly was Netscape
4, which does not support @import at all, and which Drupal does not
support either, because of its CSS usage.
------------------------------------------------------------------------
January 17, 2005 - 19:35 : kbahey
The problem for stylesheets is as follows. I think it mainly affect
crawlers and Google's cache.
Say you have an installtion of Drupal in DocumentRoot. You then use url
aliases, and put slashes in them.
For example, you use news/general/2004-12-15.html for a node.
That node still has misc/drupal.css and themes/pushbutton/style.css in
the head section if the document. Crawlers get fooled by that and try
to look for /news/general/misc/drupal.css and
/news/general/themes/pushbutton/style.css, which don't exist.
So, just prepending the new $base variable (in chx's patch) before the
stylesheet @import argument would fix this issue. Assuming you are in
DocumentRoot, then /misc and /themes would be used instead of just misc
and themes.
It would still be compliant with standards, be relative to the web
site, and no ambiguous to anyone, be they crawler or browser.
I hope it is clearer now.
I think chx can change the patch to use the $base instead of $base_url
everywhere, so as to avoid the host/domain name in the urls.
------------------------------------------------------------------------
January 17, 2005 - 22:36 : Steven
But typical crawlers don't even pay attention to stylesheets, hence it
wouldn't have much use for them. I just don't see why we should adjust
to rare cases of buggy software. Reading out a base URL from an HTML
document is dead easy, and on top of that it doesn't add more
complexity as without the base tag, the document's URL is already an
implicit base which has to be parsed anyway.
I did not like it when we altered the <link> tag to accomodate buggy
RSS readers and I certainly don't like it now, as this is even rarer.
In both cases, it is not Drupal which is at fault.
------------------------------------------------------------------------
January 17, 2005 - 22:48 : kbahey
Steven
While I agree with most of what you said, the 404s show up in the logs
enough to be a bother.
Perhaps the original design of Drupal did not forsee that people will
use url aliases to mimic directory/file hierarchies. Whether this was
intended or not, it is the way many use Drupal today.
It does not matter where the bug is (Drupal or the external world), as
long as we can stop it ourselves, by adjusting our end of it.
The fix is simple enough and does not break standards (if implemented
as described with a leading / before the css).
------------------------------------------------------------------------
January 17, 2005 - 22:54 : Steven
It does not break standards, but it does bloat the code in an ugly way.
Why not send an e-mail to the owners of the crawlers and tell them to
implement a standard that is nearly 10 years old [3] (RFC 1808)?
Note that Google Cache now seems to correctly interpret base URLs [4]
and even adds a <base> tag of its own.
By the way, this problem has nothing to do with people using URL
aliases or not, as for a browser the regular nested paths that Drupal
uses (e.g. "node/1" is no different from aliases mimicking files
"foo/bar.html").
[3] http://www.faqs.org/rfcs/rfc1808.html
[4]
http://www.google.be/search?q=cache%3Awww.drupal.org&sourceid=mozilla-searc…
--
View: http://drupal.org/node/13148
Edit: http://drupal.org/project/comments/add/13148
1
0
Project: Drupal
Version: cvs
Component: base system
Category: bug reports
Priority: normal
Assigned to: Anonymous
Reported by: kbahey
Updated by: Steven
Status: patch
It does not break standards, but it does bloat the code in an ugly way.
Why not send an e-mail to the owners of the crawlers and tell them to
implement a standard that is nearly 10 years old [1] (RFC 1808)?
Note that Google Cache now seems to correctly interpret base URLs [2]
and even adds a <base> tag of its own.
By the way, this problem has nothing to do with people using URL
aliases or not, as for a browser the regular nested paths that Drupal
uses (e.g. "node/1" is no different from aliases mimicking files
"foo/bar.html").
[1] http://www.faqs.org/rfcs/rfc1808.html
[2]
http://www.google.be/search?q=cache%3Awww.drupal.org&sourceid=mozilla-searc…
Steven
Previous comments:
------------------------------------------------------------------------
November 19, 2004 - 04:29 : kbahey
Looking at my site's logs, there seem to be several problems that are
caused by Drupal's use of relative path names.
If Drupal causes all the site's urls to be absolute, then none of this
would be an issue.
A. Search Engine Crawlers
Getting lots of 404s on things like: linux/index.html/robots.txt
Where 'linux' is an alias to a taxonomy, and 'index.html' is an alias
to a node within that taxonomy.
Another example, is recursing unnecessarily. I see 404s on things like:
/linux/index.html/linux/index.html
Where 'linux' is a path alias for a taxonomy term, and 'index.html' is
an alias to the main node within it.
This does not seem to happen when Google crawls my sites, but Yahoo's
Slurp suffers from this problem, and keeps recursing. MSNBot also
suffers from this.
Another crawler/harvester called Blinkx/DFS-Fetch keeps adding the .css
file to the relative path, getting a 404 on things like:
/linux/themes/xtemplate/pushbutton/logo.gif
And Fast Search Engine also attempts to access:
/linux/contact/tracker/tracker/user/password
The same goes for grub.org, another crawler.
B. Google Cache / Archive Way Back Machine
Pages in Google cache and archive.org Way Back Machine suffer form a
similar problem: the .css files cannot be found, and hence rendering of
the pages is not correct.
Examples:
Compare this: http://www.drupal.org/node/4647
To this:
http://www.google.ca/search?q=cache:www.drupal.org/node/view/4647
Notice the following:
How there is no formatting at all, because of the lack of a .css file
The httpd log on Drupal will show errors for:
linux/themes/pushbutton/style.css and linux/misc/drupal.css
Also see:
http://web.archive.org/web/20031016184902/http://www.drupal.org/
C. Proxy Caches:
When someone is browsing my site from behind a proxy cache, the web
site is hit with a rapid succession of requests, and many of it is just
for bogus pages.
Examples:
2004/11/17 - 17:47 404 error: linux/user/1 not found.
2004/11/17 - 17:47 404 error: linux/feedback not found.
2004/11/17 - 17:47 404 error: linux/tracker not found.
2004/11/17 - 17:47 404 error: linux/sitemap not found.
2004/11/17 - 17:47 404 error: linux/search not found.
2004/11/17 - 17:47 404 error: linux/misc not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/linux not found.
2004/11/17 - 17:47 404 error: linux/technology not found.
2004/11/17 - 17:47 404 error: linux/writings not found.
2004/11/17 - 17:47 404 error: linux/family not found.
And also:
2004/11/17 - 07:23 404 error: history/user/1 not found.
2004/11/17 - 07:23 404 error: history/tracker not found.
2004/11/17 - 07:23 404 error: history/feedback not found.
2004/11/17 - 07:23 404 error: history/sitemap not found.
2004/11/17 - 07:23 404 error: history/search not found.
2004/11/17 - 07:23 404 error: history/misc not found.
2004/11/17 - 07:23 404 error: history/technology not found.
2004/11/17 - 07:23 404 error: history/science not found.
2004/11/17 - 07:22 404 error: history/history not found.
2004/11/17 - 07:22 404 error: history/writings not found.
2004/11/17 - 07:22 404 error: history/family not found.
As you can tell, history and linux are aliases to taxonomy terms, and
so is misc, technology, writings, family, ...etc. The user agent is
appending the taxonomy term alias to the url and forming a new URL.
D. Regular Browsing:
There is even at least one extreme case where the following URL was
accessed (the result was 404 of course)
/book/view/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/logo.gif
It seems it was a normal user, because the user agent is: "Mozilla/4.0
(compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
Proposed Solution:
As a proposed solution, all URLs in Drupal can be made into absolute
path names. This can be done by the following:
The variable $base_url in the conf.php file is broken down into two
components:
$base_host (the 'http://whatever-host.example.com' part WITHOUT the
trailing slash)
$base_path (the '/path-to-drupal' part, WITH the leading slash. If this
is the DocumentRoot, then it is just a '/' character)
$base_url is now $base_host concatenated with $base_path
A simple filter can be written to preceed every href="path" with the
$base_path variable, so it becomes "/path"
This option can be turned on and off for a site. The default is to have
it off so current behavior is maintained.
A similar scheme applies for style sheets as well.
So, did I miss something obvious? Am I seriously off the mark?
Your thoughts!
------------------------------------------------------------------------
November 20, 2004 - 05:10 : chrisada
I am getting similar 404 errors, mainly from rss feed link that looks
like /blog/blog/feed and many manual links that are relative to drupal
root.
It was not a problem before Drupal 4.5, so I think there might not be a
need to change all URIs to absolute. I can't see where the problem is
coming from though.
------------------------------------------------------------------------
November 20, 2004 - 05:35 : kbahey
I am pretty sure that these problems were happening for at least the
past 10 months (ever since I moved to Drupal in January 2004).
The main issue here is that crawlers and other user agents get confused
by the relative path names.
Using absolute paths will definitely solve this. However, is this the
only solution?
I am looking for a discussion of this.
------------------------------------------------------------------------
November 20, 2004 - 10:55 : Goba
No absolute paths please. Having the path start with '/' solves all the
mentioned problems, and is not absolute, it is relative to the domain.
Sadly some crawlers and even the Google Cache does not obey to the base
href. I have reported this cache problem in April to Google, and they
promised they will keep it in mind... Hehe...
What we need is to have the printed relative path values relative to
the domain name, and not relative to the Drupal installation path.
Note that this issue will appear on the drupal devel mailing list if
someone finally provides a patch we can talk about :)
------------------------------------------------------------------------
November 20, 2004 - 11:19 : Dries
Goba is right. We need paths relative to the domain name to fix this
'problem'.
------------------------------------------------------------------------
November 20, 2004 - 21:19 : kbahey
Sorry for not making my self clear.
When I said absolute, I meant that they start with just a /. I did NOT
mean that they start with http://host.example.com. That would be a very
bad idea.
In any case, what do people think about the proposed solution (breaking
down $base_url into two parts?)
Also, does this address the style sheets as well, or more is needed?
------------------------------------------------------------------------
November 21, 2004 - 17:24 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch.txt (471 bytes)
I have implemented what Goba suggested.
------------------------------------------------------------------------
November 21, 2004 - 17:43 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_0.txt (825 bytes)
Maybe this one is faster?
------------------------------------------------------------------------
November 21, 2004 - 18:34 : kbahey
Man! You are fast!
I tried the second version. It works fine for things that are not
inside the node body, I mean they have a / in front of them, as we
want it to be.
Two comments/issues:
- If there is a URL that is already "/" representing the home page, it
gets set to "//". Perhaps it should check for that case?
- URLs in nodes that do not start with / do not get changed to have a /
prepended to them. Do we need a filter for this?
- Do we need to do something for the style sheets in the page header? I
mean the "misc/drupal.css" and "themes/themename/style.css"?
Thanks
------------------------------------------------------------------------
November 22, 2004 - 01:28 : kbahey
Hi chx
Here is a fix for the case where you have a url that is just "/".
In your patch, instead of:
[?php $base = $parts['path'] . '/' ; ?]
Replace that by:
[?php $base = ( $path == '/' ? $base : $parts['path'] . '/' ); ?]
------------------------------------------------------------------------
November 28, 2004 - 05:08 : kbahey
Did this patch make it into CVS yet?
If there are any objections to it, can someone please explain what they
are?
Thanks
------------------------------------------------------------------------
November 28, 2004 - 10:33 : Dries
Shouldn't your changes be included in the patch?
Also, it's better to cache $base rather than $parts.
Lastly, it this patch makes it to HEAD, we should probably remove some
'base url' cruft from the themes.
------------------------------------------------------------------------
November 28, 2004 - 19:54 : kbahey
Attachment: http://drupal.org/files/issues/x.diff (1 KB)
Here is the patch including my fix.
I am asking chx to comment on caching $base instead of $parts.
Will this make it faster?
------------------------------------------------------------------------
November 28, 2004 - 20:26 : chx
Hm. $base = ( $path == '/' ? $base : $parts['path'] . '/' ); this
depends on path which is a parameter. Thus I fail to see how could we
cache $base. I'd correct this code however $base = ( $path == '/' ? ''
: $parts['path'] . '/' ); 'cos I think $base is not defined before, but
this is not a problem, PHP will be happy to replace NULL with NULL...
Maybe instead of all parts, only $parts['path'] is enough to be cached,
yes, but the performance and memory usage difference -- I guess -- would
not be noticable...
------------------------------------------------------------------------
November 29, 2004 - 00:06 : kbahey
Attachment: http://drupal.org/files/issues/common-inc-patch.txt (1 KB)
OK.
I put in chx suggested change.
This patch can go in CVS then, to rid us of the problems with paths not
beginning with slash.
This is not an ultimate solution still. We need to address the problem
with .css files. Although the header contains a:
it does not seem that major search engines and archiving sites obey it
anyway.
------------------------------------------------------------------------
December 2, 2004 - 21:31 : Dries
Your coding style needs work. Also, I'm not going to commit this unless
the themes get fixed up: we'd end up with invalid URLs all over the
place. Lastly, I wonder how portable the themes will be when Drupal is
run from within a subdirectory.
------------------------------------------------------------------------
December 2, 2004 - 22:15 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_1.txt (849 bytes)
Well, my patch worked from a subdirectory very well, as fact, I have not
tested it from the root dir. And I think that it adheres to coding
standards. So I resubmit it with the root path fix. However, my Drupal
work is focused on i18n these days, and I was never into themeing so it
won't be me who fixes those.
------------------------------------------------------------------------
December 2, 2004 - 23:07 : kbahey
I have tested the previous patch (including my fix) with drupal
installed in the DocumentRoot of the server.
So, in effect, it is tested with both Drupal in / and Drupal in a
subdirectory.
This change fixes the problem for the crawlers and other browsers from
getting confused.
While it is true that there is no fix for the .css files in the HTML
head section yet, this fix deals with a major part of the problem, and
rids us of a major pain. Check your web server's logs some time to see
what I mean.
Someone who is familiar with the themes can contribute a patch later.
This patch and the future fix for themes are not mutually exclusive, so
let it go in CVS.
------------------------------------------------------------------------
December 9, 2004 - 16:31 : Goba
Please commit this into Drupal core, this fix is badly needed.
------------------------------------------------------------------------
January 17, 2005 - 14:00 : chx
Attachment: http://drupal.org/files/issues/base_url_kill.patch (4.34 KB)
Well as noone have stepped in to fix this problem, I have tried to fix
the themes also. themes.inc , xtemplate.engine and the bluemarine
template is patched besides common.inc.
Of course, more templates could follow, but first I'd like to see your
opinions.
------------------------------------------------------------------------
January 17, 2005 - 14:09 : Goba
I don't think that removing from the themes is a good idea, using
$parts['path'] should be encouraged though before the files, which
would fix the google cache problem, and would still keep the HTML size
low. It would also help those, who save the file to find the
originating site easier, since clicking on a non-pagelocal link would
lead to the online version.
------------------------------------------------------------------------
January 17, 2005 - 17:03 : Steven
Definitely -1 on removing the <base> tag or using absolute or
root-relative URLs. This tag has been around for ages, and it is the
only way to make clean URLs work without bloating in the code. FYI,
"base" is (first?) mentioned in Berners-Lee's HTML 1.0 draft [3].
That's June 1993.
As the amount of clean URL-using sites grows, the crawlers will have to
be updated. Perhaps we could prevent crawlers from going too insane by
404ing for URLs with more than say 10 components? That would prevent
the really crappy ones from hammering your site.
I'm all for making the <base> tag easier to handle for the user (say,
by including a filter to allow simple anchor links to work as most
people expect them to), but we should keep Drupal-generated URLs clean
and completely relative.
[3] http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt
------------------------------------------------------------------------
January 17, 2005 - 17:57 : kbahey
The problem with css is this: The @import argument does not start with a
/.
This is simple to fix.
We keep the "base" as it is today, but add the new variable: $base
before it.
So for a site where Drupal is installed in the DocumentRoot, all that
will change is that /misc/drupal.css and /themes/themename/style.css
will be preceded by a slash. For sites that use another path, that path
will be prepended to the css file name.
How about that?
------------------------------------------------------------------------
January 17, 2005 - 18:38 : Steven
What exactly is the problem with the @import? As far as I know:
- url() in stylesheets is interpreted relative to the base of the
stylesheet, not the source document.
- However, if the styles are inside an HTML document, through a style
tag or style attribute, then the stylesheet's location is the same as
the HTML document.
- Thus, the stylesheet's base is the same as the base of the HTML
document (which can be altered through the <base> tag).
I just don't see why it is necessary. As far as I know, the only
browser that has had problems resolving CSS urls properly was Netscape
4, which does not support @import at all, and which Drupal does not
support either, because of its CSS usage.
------------------------------------------------------------------------
January 17, 2005 - 19:35 : kbahey
The problem for stylesheets is as follows. I think it mainly affect
crawlers and Google's cache.
Say you have an installtion of Drupal in DocumentRoot. You then use url
aliases, and put slashes in them.
For example, you use news/general/2004-12-15.html for a node.
That node still has misc/drupal.css and themes/pushbutton/style.css in
the head section if the document. Crawlers get fooled by that and try
to look for /news/general/misc/drupal.css and
/news/general/themes/pushbutton/style.css, which don't exist.
So, just prepending the new $base variable (in chx's patch) before the
stylesheet @import argument would fix this issue. Assuming you are in
DocumentRoot, then /misc and /themes would be used instead of just misc
and themes.
It would still be compliant with standards, be relative to the web
site, and no ambiguous to anyone, be they crawler or browser.
I hope it is clearer now.
I think chx can change the patch to use the $base instead of $base_url
everywhere, so as to avoid the host/domain name in the urls.
------------------------------------------------------------------------
January 17, 2005 - 22:36 : Steven
But typical crawlers don't even pay attention to stylesheets, hence it
wouldn't have much use for them. I just don't see why we should adjust
to rare cases of buggy software. Reading out a base URL from an HTML
document is dead easy, and on top of that it doesn't add more
complexity as without the base tag, the document's URL is already an
implicit base which has to be parsed anyway.
I did not like it when we altered the <link> tag to accomodate buggy
RSS readers and I certainly don't like it now, as this is even rarer.
In both cases, it is not Drupal which is at fault.
------------------------------------------------------------------------
January 17, 2005 - 22:48 : kbahey
Steven
While I agree with most of what you said, the 404s show up in the logs
enough to be a bother.
Perhaps the original design of Drupal did not forsee that people will
use url aliases to mimic directory/file hierarchies. Whether this was
intended or not, it is the way many use Drupal today.
It does not matter where the bug is (Drupal or the external world), as
long as we can stop it ourselves, by adjusting our end of it.
The fix is simple enough and does not break standards (if implemented
as described with a leading / before the css).
--
View: http://drupal.org/node/13148
Edit: http://drupal.org/project/comments/add/13148
1
0
Project: Drupal
Version: cvs
Component: base system
Category: bug reports
Priority: normal
Assigned to: Anonymous
Reported by: kbahey
Updated by: kbahey
Status: patch
Steven
While I agree with most of what you said, the 404s show up in the logs
enough to be a bother.
Perhaps the original design of Drupal did not forsee that people will
use url aliases to mimic directory/file hierarchies. Whether this was
intended or not, it is the way many use Drupal today.
It does not matter where the bug is (Drupal or the external world), as
long as we can stop it ourselves, by adjusting our end of it.
The fix is simple enough and does not break standards (if implemented
as described with a leading / before the css).
kbahey
Previous comments:
------------------------------------------------------------------------
November 18, 2004 - 22:29 : kbahey
Looking at my site's logs, there seem to be several problems that are
caused by Drupal's use of relative path names.
If Drupal causes all the site's urls to be absolute, then none of this
would be an issue.
A. Search Engine Crawlers
Getting lots of 404s on things like: linux/index.html/robots.txt
Where 'linux' is an alias to a taxonomy, and 'index.html' is an alias
to a node within that taxonomy.
Another example, is recursing unnecessarily. I see 404s on things like:
/linux/index.html/linux/index.html
Where 'linux' is a path alias for a taxonomy term, and 'index.html' is
an alias to the main node within it.
This does not seem to happen when Google crawls my sites, but Yahoo's
Slurp suffers from this problem, and keeps recursing. MSNBot also
suffers from this.
Another crawler/harvester called Blinkx/DFS-Fetch keeps adding the .css
file to the relative path, getting a 404 on things like:
/linux/themes/xtemplate/pushbutton/logo.gif
And Fast Search Engine also attempts to access:
/linux/contact/tracker/tracker/user/password
The same goes for grub.org, another crawler.
B. Google Cache / Archive Way Back Machine
Pages in Google cache and archive.org Way Back Machine suffer form a
similar problem: the .css files cannot be found, and hence rendering of
the pages is not correct.
Examples:
Compare this: http://www.drupal.org/node/4647
To this:
http://www.google.ca/search?q=cache:www.drupal.org/node/view/4647
Notice the following:
How there is no formatting at all, because of the lack of a .css file
The httpd log on Drupal will show errors for:
linux/themes/pushbutton/style.css and linux/misc/drupal.css
Also see:
http://web.archive.org/web/20031016184902/http://www.drupal.org/
C. Proxy Caches:
When someone is browsing my site from behind a proxy cache, the web
site is hit with a rapid succession of requests, and many of it is just
for bogus pages.
Examples:
2004/11/17 - 17:47 404 error: linux/user/1 not found.
2004/11/17 - 17:47 404 error: linux/feedback not found.
2004/11/17 - 17:47 404 error: linux/tracker not found.
2004/11/17 - 17:47 404 error: linux/sitemap not found.
2004/11/17 - 17:47 404 error: linux/search not found.
2004/11/17 - 17:47 404 error: linux/misc not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/linux not found.
2004/11/17 - 17:47 404 error: linux/technology not found.
2004/11/17 - 17:47 404 error: linux/writings not found.
2004/11/17 - 17:47 404 error: linux/family not found.
And also:
2004/11/17 - 07:23 404 error: history/user/1 not found.
2004/11/17 - 07:23 404 error: history/tracker not found.
2004/11/17 - 07:23 404 error: history/feedback not found.
2004/11/17 - 07:23 404 error: history/sitemap not found.
2004/11/17 - 07:23 404 error: history/search not found.
2004/11/17 - 07:23 404 error: history/misc not found.
2004/11/17 - 07:23 404 error: history/technology not found.
2004/11/17 - 07:23 404 error: history/science not found.
2004/11/17 - 07:22 404 error: history/history not found.
2004/11/17 - 07:22 404 error: history/writings not found.
2004/11/17 - 07:22 404 error: history/family not found.
As you can tell, history and linux are aliases to taxonomy terms, and
so is misc, technology, writings, family, ...etc. The user agent is
appending the taxonomy term alias to the url and forming a new URL.
D. Regular Browsing:
There is even at least one extreme case where the following URL was
accessed (the result was 404 of course)
/book/view/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/logo.gif
It seems it was a normal user, because the user agent is: "Mozilla/4.0
(compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
Proposed Solution:
As a proposed solution, all URLs in Drupal can be made into absolute
path names. This can be done by the following:
The variable $base_url in the conf.php file is broken down into two
components:
$base_host (the 'http://whatever-host.example.com' part WITHOUT the
trailing slash)
$base_path (the '/path-to-drupal' part, WITH the leading slash. If this
is the DocumentRoot, then it is just a '/' character)
$base_url is now $base_host concatenated with $base_path
A simple filter can be written to preceed every href="path" with the
$base_path variable, so it becomes "/path"
This option can be turned on and off for a site. The default is to have
it off so current behavior is maintained.
A similar scheme applies for style sheets as well.
So, did I miss something obvious? Am I seriously off the mark?
Your thoughts!
------------------------------------------------------------------------
November 19, 2004 - 23:10 : chrisada
I am getting similar 404 errors, mainly from rss feed link that looks
like /blog/blog/feed and many manual links that are relative to drupal
root.
It was not a problem before Drupal 4.5, so I think there might not be a
need to change all URIs to absolute. I can't see where the problem is
coming from though.
------------------------------------------------------------------------
November 19, 2004 - 23:35 : kbahey
I am pretty sure that these problems were happening for at least the
past 10 months (ever since I moved to Drupal in January 2004).
The main issue here is that crawlers and other user agents get confused
by the relative path names.
Using absolute paths will definitely solve this. However, is this the
only solution?
I am looking for a discussion of this.
------------------------------------------------------------------------
November 20, 2004 - 04:55 : Goba
No absolute paths please. Having the path start with '/' solves all the
mentioned problems, and is not absolute, it is relative to the domain.
Sadly some crawlers and even the Google Cache does not obey to the base
href. I have reported this cache problem in April to Google, and they
promised they will keep it in mind... Hehe...
What we need is to have the printed relative path values relative to
the domain name, and not relative to the Drupal installation path.
Note that this issue will appear on the drupal devel mailing list if
someone finally provides a patch we can talk about :)
------------------------------------------------------------------------
November 20, 2004 - 05:19 : Dries
Goba is right. We need paths relative to the domain name to fix this
'problem'.
------------------------------------------------------------------------
November 20, 2004 - 15:19 : kbahey
Sorry for not making my self clear.
When I said absolute, I meant that they start with just a /. I did NOT
mean that they start with http://host.example.com. That would be a very
bad idea.
In any case, what do people think about the proposed solution (breaking
down $base_url into two parts?)
Also, does this address the style sheets as well, or more is needed?
------------------------------------------------------------------------
November 21, 2004 - 11:24 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch.txt (471 bytes)
I have implemented what Goba suggested.
------------------------------------------------------------------------
November 21, 2004 - 11:43 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_0.txt (825 bytes)
Maybe this one is faster?
------------------------------------------------------------------------
November 21, 2004 - 12:34 : kbahey
Man! You are fast!
I tried the second version. It works fine for things that are not
inside the node body, I mean they have a / in front of them, as we
want it to be.
Two comments/issues:
- If there is a URL that is already "/" representing the home page, it
gets set to "//". Perhaps it should check for that case?
- URLs in nodes that do not start with / do not get changed to have a /
prepended to them. Do we need a filter for this?
- Do we need to do something for the style sheets in the page header? I
mean the "misc/drupal.css" and "themes/themename/style.css"?
Thanks
------------------------------------------------------------------------
November 21, 2004 - 19:28 : kbahey
Hi chx
Here is a fix for the case where you have a url that is just "/".
In your patch, instead of:
[?php $base = $parts['path'] . '/' ; ?]
Replace that by:
[?php $base = ( $path == '/' ? $base : $parts['path'] . '/' ); ?]
------------------------------------------------------------------------
November 27, 2004 - 23:08 : kbahey
Did this patch make it into CVS yet?
If there are any objections to it, can someone please explain what they
are?
Thanks
------------------------------------------------------------------------
November 28, 2004 - 04:33 : Dries
Shouldn't your changes be included in the patch?
Also, it's better to cache $base rather than $parts.
Lastly, it this patch makes it to HEAD, we should probably remove some
'base url' cruft from the themes.
------------------------------------------------------------------------
November 28, 2004 - 13:54 : kbahey
Attachment: http://drupal.org/files/issues/x.diff (1 KB)
Here is the patch including my fix.
I am asking chx to comment on caching $base instead of $parts.
Will this make it faster?
------------------------------------------------------------------------
November 28, 2004 - 14:26 : chx
Hm. $base = ( $path == '/' ? $base : $parts['path'] . '/' ); this
depends on path which is a parameter. Thus I fail to see how could we
cache $base. I'd correct this code however $base = ( $path == '/' ? ''
: $parts['path'] . '/' ); 'cos I think $base is not defined before, but
this is not a problem, PHP will be happy to replace NULL with NULL...
Maybe instead of all parts, only $parts['path'] is enough to be cached,
yes, but the performance and memory usage difference -- I guess -- would
not be noticable...
------------------------------------------------------------------------
November 28, 2004 - 18:06 : kbahey
Attachment: http://drupal.org/files/issues/common-inc-patch.txt (1 KB)
OK.
I put in chx suggested change.
This patch can go in CVS then, to rid us of the problems with paths not
beginning with slash.
This is not an ultimate solution still. We need to address the problem
with .css files. Although the header contains a:
it does not seem that major search engines and archiving sites obey it
anyway.
------------------------------------------------------------------------
December 2, 2004 - 15:31 : Dries
Your coding style needs work. Also, I'm not going to commit this unless
the themes get fixed up: we'd end up with invalid URLs all over the
place. Lastly, I wonder how portable the themes will be when Drupal is
run from within a subdirectory.
------------------------------------------------------------------------
December 2, 2004 - 16:15 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_1.txt (849 bytes)
Well, my patch worked from a subdirectory very well, as fact, I have not
tested it from the root dir. And I think that it adheres to coding
standards. So I resubmit it with the root path fix. However, my Drupal
work is focused on i18n these days, and I was never into themeing so it
won't be me who fixes those.
------------------------------------------------------------------------
December 2, 2004 - 17:07 : kbahey
I have tested the previous patch (including my fix) with drupal
installed in the DocumentRoot of the server.
So, in effect, it is tested with both Drupal in / and Drupal in a
subdirectory.
This change fixes the problem for the crawlers and other browsers from
getting confused.
While it is true that there is no fix for the .css files in the HTML
head section yet, this fix deals with a major part of the problem, and
rids us of a major pain. Check your web server's logs some time to see
what I mean.
Someone who is familiar with the themes can contribute a patch later.
This patch and the future fix for themes are not mutually exclusive, so
let it go in CVS.
------------------------------------------------------------------------
December 9, 2004 - 10:31 : Goba
Please commit this into Drupal core, this fix is badly needed.
------------------------------------------------------------------------
January 17, 2005 - 08:00 : chx
Attachment: http://drupal.org/files/issues/base_url_kill.patch (4.34 KB)
Well as noone have stepped in to fix this problem, I have tried to fix
the themes also. themes.inc , xtemplate.engine and the bluemarine
template is patched besides common.inc.
Of course, more templates could follow, but first I'd like to see your
opinions.
------------------------------------------------------------------------
January 17, 2005 - 08:09 : Goba
I don't think that removing from the themes is a good idea, using
$parts['path'] should be encouraged though before the files, which
would fix the google cache problem, and would still keep the HTML size
low. It would also help those, who save the file to find the
originating site easier, since clicking on a non-pagelocal link would
lead to the online version.
------------------------------------------------------------------------
January 17, 2005 - 11:03 : Steven
Definitely -1 on removing the <base> tag or using absolute or
root-relative URLs. This tag has been around for ages, and it is the
only way to make clean URLs work without bloating in the code. FYI,
"base" is (first?) mentioned in Berners-Lee's HTML 1.0 draft [1].
That's June 1993.
As the amount of clean URL-using sites grows, the crawlers will have to
be updated. Perhaps we could prevent crawlers from going too insane by
404ing for URLs with more than say 10 components? That would prevent
the really crappy ones from hammering your site.
I'm all for making the <base> tag easier to handle for the user (say,
by including a filter to allow simple anchor links to work as most
people expect them to), but we should keep Drupal-generated URLs clean
and completely relative.
[1] http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt
------------------------------------------------------------------------
January 17, 2005 - 11:57 : kbahey
The problem with css is this: The @import argument does not start with a
/.
This is simple to fix.
We keep the "base" as it is today, but add the new variable: $base
before it.
So for a site where Drupal is installed in the DocumentRoot, all that
will change is that /misc/drupal.css and /themes/themename/style.css
will be preceded by a slash. For sites that use another path, that path
will be prepended to the css file name.
How about that?
------------------------------------------------------------------------
January 17, 2005 - 12:38 : Steven
What exactly is the problem with the @import? As far as I know:
- url() in stylesheets is interpreted relative to the base of the
stylesheet, not the source document.
- However, if the styles are inside an HTML document, through a style
tag or style attribute, then the stylesheet's location is the same as
the HTML document.
- Thus, the stylesheet's base is the same as the base of the HTML
document (which can be altered through the <base> tag).
I just don't see why it is necessary. As far as I know, the only
browser that has had problems resolving CSS urls properly was Netscape
4, which does not support @import at all, and which Drupal does not
support either, because of its CSS usage.
------------------------------------------------------------------------
January 17, 2005 - 13:35 : kbahey
The problem for stylesheets is as follows. I think it mainly affect
crawlers and Google's cache.
Say you have an installtion of Drupal in DocumentRoot. You then use url
aliases, and put slashes in them.
For example, you use news/general/2004-12-15.html for a node.
That node still has misc/drupal.css and themes/pushbutton/style.css in
the head section if the document. Crawlers get fooled by that and try
to look for /news/general/misc/drupal.css and
/news/general/themes/pushbutton/style.css, which don't exist.
So, just prepending the new $base variable (in chx's patch) before the
stylesheet @import argument would fix this issue. Assuming you are in
DocumentRoot, then /misc and /themes would be used instead of just misc
and themes.
It would still be compliant with standards, be relative to the web
site, and no ambiguous to anyone, be they crawler or browser.
I hope it is clearer now.
I think chx can change the patch to use the $base instead of $base_url
everywhere, so as to avoid the host/domain name in the urls.
------------------------------------------------------------------------
January 17, 2005 - 16:36 : Steven
But typical crawlers don't even pay attention to stylesheets, hence it
wouldn't have much use for them. I just don't see why we should adjust
to rare cases of buggy software. Reading out a base URL from an HTML
document is dead easy, and on top of that it doesn't add more
complexity as without the base tag, the document's URL is already an
implicit base which has to be parsed anyway.
I did not like it when we altered the <link> tag to accomodate buggy
RSS readers and I certainly don't like it now, as this is even rarer.
In both cases, it is not Drupal which is at fault.
--
View: http://drupal.org/node/13148
Edit: http://drupal.org/project/comments/add/13148
1
0
Project: Drupal
Version: cvs
Component: base system
Category: bug reports
Priority: normal
Assigned to: Anonymous
Reported by: kbahey
Updated by: Steven
Status: patch
But typical crawlers don't even pay attention to stylesheets, hence it
wouldn't have much use for them. I just don't see why we should adjust
to rare cases of buggy software. Reading out a base URL from an HTML
document is dead easy, and on top of that it doesn't add more
complexity as without the base tag, the document's URL is already an
implicit base which has to be parsed anyway.
I did not like it when we altered the <link> tag to accomodate buggy
RSS readers and I certainly don't like it now, as this is even rarer.
In both cases, it is not Drupal which is at fault.
Steven
Previous comments:
------------------------------------------------------------------------
November 19, 2004 - 04:29 : kbahey
Looking at my site's logs, there seem to be several problems that are
caused by Drupal's use of relative path names.
If Drupal causes all the site's urls to be absolute, then none of this
would be an issue.
A. Search Engine Crawlers
Getting lots of 404s on things like: linux/index.html/robots.txt
Where 'linux' is an alias to a taxonomy, and 'index.html' is an alias
to a node within that taxonomy.
Another example, is recursing unnecessarily. I see 404s on things like:
/linux/index.html/linux/index.html
Where 'linux' is a path alias for a taxonomy term, and 'index.html' is
an alias to the main node within it.
This does not seem to happen when Google crawls my sites, but Yahoo's
Slurp suffers from this problem, and keeps recursing. MSNBot also
suffers from this.
Another crawler/harvester called Blinkx/DFS-Fetch keeps adding the .css
file to the relative path, getting a 404 on things like:
/linux/themes/xtemplate/pushbutton/logo.gif
And Fast Search Engine also attempts to access:
/linux/contact/tracker/tracker/user/password
The same goes for grub.org, another crawler.
B. Google Cache / Archive Way Back Machine
Pages in Google cache and archive.org Way Back Machine suffer form a
similar problem: the .css files cannot be found, and hence rendering of
the pages is not correct.
Examples:
Compare this: http://www.drupal.org/node/4647
To this:
http://www.google.ca/search?q=cache:www.drupal.org/node/view/4647
Notice the following:
How there is no formatting at all, because of the lack of a .css file
The httpd log on Drupal will show errors for:
linux/themes/pushbutton/style.css and linux/misc/drupal.css
Also see:
http://web.archive.org/web/20031016184902/http://www.drupal.org/
C. Proxy Caches:
When someone is browsing my site from behind a proxy cache, the web
site is hit with a rapid succession of requests, and many of it is just
for bogus pages.
Examples:
2004/11/17 - 17:47 404 error: linux/user/1 not found.
2004/11/17 - 17:47 404 error: linux/feedback not found.
2004/11/17 - 17:47 404 error: linux/tracker not found.
2004/11/17 - 17:47 404 error: linux/sitemap not found.
2004/11/17 - 17:47 404 error: linux/search not found.
2004/11/17 - 17:47 404 error: linux/misc not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/linux not found.
2004/11/17 - 17:47 404 error: linux/technology not found.
2004/11/17 - 17:47 404 error: linux/writings not found.
2004/11/17 - 17:47 404 error: linux/family not found.
And also:
2004/11/17 - 07:23 404 error: history/user/1 not found.
2004/11/17 - 07:23 404 error: history/tracker not found.
2004/11/17 - 07:23 404 error: history/feedback not found.
2004/11/17 - 07:23 404 error: history/sitemap not found.
2004/11/17 - 07:23 404 error: history/search not found.
2004/11/17 - 07:23 404 error: history/misc not found.
2004/11/17 - 07:23 404 error: history/technology not found.
2004/11/17 - 07:23 404 error: history/science not found.
2004/11/17 - 07:22 404 error: history/history not found.
2004/11/17 - 07:22 404 error: history/writings not found.
2004/11/17 - 07:22 404 error: history/family not found.
As you can tell, history and linux are aliases to taxonomy terms, and
so is misc, technology, writings, family, ...etc. The user agent is
appending the taxonomy term alias to the url and forming a new URL.
D. Regular Browsing:
There is even at least one extreme case where the following URL was
accessed (the result was 404 of course)
/book/view/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/logo.gif
It seems it was a normal user, because the user agent is: "Mozilla/4.0
(compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
Proposed Solution:
As a proposed solution, all URLs in Drupal can be made into absolute
path names. This can be done by the following:
The variable $base_url in the conf.php file is broken down into two
components:
$base_host (the 'http://whatever-host.example.com' part WITHOUT the
trailing slash)
$base_path (the '/path-to-drupal' part, WITH the leading slash. If this
is the DocumentRoot, then it is just a '/' character)
$base_url is now $base_host concatenated with $base_path
A simple filter can be written to preceed every href="path" with the
$base_path variable, so it becomes "/path"
This option can be turned on and off for a site. The default is to have
it off so current behavior is maintained.
A similar scheme applies for style sheets as well.
So, did I miss something obvious? Am I seriously off the mark?
Your thoughts!
------------------------------------------------------------------------
November 20, 2004 - 05:10 : chrisada
I am getting similar 404 errors, mainly from rss feed link that looks
like /blog/blog/feed and many manual links that are relative to drupal
root.
It was not a problem before Drupal 4.5, so I think there might not be a
need to change all URIs to absolute. I can't see where the problem is
coming from though.
------------------------------------------------------------------------
November 20, 2004 - 05:35 : kbahey
I am pretty sure that these problems were happening for at least the
past 10 months (ever since I moved to Drupal in January 2004).
The main issue here is that crawlers and other user agents get confused
by the relative path names.
Using absolute paths will definitely solve this. However, is this the
only solution?
I am looking for a discussion of this.
------------------------------------------------------------------------
November 20, 2004 - 10:55 : Goba
No absolute paths please. Having the path start with '/' solves all the
mentioned problems, and is not absolute, it is relative to the domain.
Sadly some crawlers and even the Google Cache does not obey to the base
href. I have reported this cache problem in April to Google, and they
promised they will keep it in mind... Hehe...
What we need is to have the printed relative path values relative to
the domain name, and not relative to the Drupal installation path.
Note that this issue will appear on the drupal devel mailing list if
someone finally provides a patch we can talk about :)
------------------------------------------------------------------------
November 20, 2004 - 11:19 : Dries
Goba is right. We need paths relative to the domain name to fix this
'problem'.
------------------------------------------------------------------------
November 20, 2004 - 21:19 : kbahey
Sorry for not making my self clear.
When I said absolute, I meant that they start with just a /. I did NOT
mean that they start with http://host.example.com. That would be a very
bad idea.
In any case, what do people think about the proposed solution (breaking
down $base_url into two parts?)
Also, does this address the style sheets as well, or more is needed?
------------------------------------------------------------------------
November 21, 2004 - 17:24 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch.txt (471 bytes)
I have implemented what Goba suggested.
------------------------------------------------------------------------
November 21, 2004 - 17:43 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_0.txt (825 bytes)
Maybe this one is faster?
------------------------------------------------------------------------
November 21, 2004 - 18:34 : kbahey
Man! You are fast!
I tried the second version. It works fine for things that are not
inside the node body, I mean they have a / in front of them, as we
want it to be.
Two comments/issues:
- If there is a URL that is already "/" representing the home page, it
gets set to "//". Perhaps it should check for that case?
- URLs in nodes that do not start with / do not get changed to have a /
prepended to them. Do we need a filter for this?
- Do we need to do something for the style sheets in the page header? I
mean the "misc/drupal.css" and "themes/themename/style.css"?
Thanks
------------------------------------------------------------------------
November 22, 2004 - 01:28 : kbahey
Hi chx
Here is a fix for the case where you have a url that is just "/".
In your patch, instead of:
[?php $base = $parts['path'] . '/' ; ?]
Replace that by:
[?php $base = ( $path == '/' ? $base : $parts['path'] . '/' ); ?]
------------------------------------------------------------------------
November 28, 2004 - 05:08 : kbahey
Did this patch make it into CVS yet?
If there are any objections to it, can someone please explain what they
are?
Thanks
------------------------------------------------------------------------
November 28, 2004 - 10:33 : Dries
Shouldn't your changes be included in the patch?
Also, it's better to cache $base rather than $parts.
Lastly, it this patch makes it to HEAD, we should probably remove some
'base url' cruft from the themes.
------------------------------------------------------------------------
November 28, 2004 - 19:54 : kbahey
Attachment: http://drupal.org/files/issues/x.diff (1 KB)
Here is the patch including my fix.
I am asking chx to comment on caching $base instead of $parts.
Will this make it faster?
------------------------------------------------------------------------
November 28, 2004 - 20:26 : chx
Hm. $base = ( $path == '/' ? $base : $parts['path'] . '/' ); this
depends on path which is a parameter. Thus I fail to see how could we
cache $base. I'd correct this code however $base = ( $path == '/' ? ''
: $parts['path'] . '/' ); 'cos I think $base is not defined before, but
this is not a problem, PHP will be happy to replace NULL with NULL...
Maybe instead of all parts, only $parts['path'] is enough to be cached,
yes, but the performance and memory usage difference -- I guess -- would
not be noticable...
------------------------------------------------------------------------
November 29, 2004 - 00:06 : kbahey
Attachment: http://drupal.org/files/issues/common-inc-patch.txt (1 KB)
OK.
I put in chx suggested change.
This patch can go in CVS then, to rid us of the problems with paths not
beginning with slash.
This is not an ultimate solution still. We need to address the problem
with .css files. Although the header contains a:
it does not seem that major search engines and archiving sites obey it
anyway.
------------------------------------------------------------------------
December 2, 2004 - 21:31 : Dries
Your coding style needs work. Also, I'm not going to commit this unless
the themes get fixed up: we'd end up with invalid URLs all over the
place. Lastly, I wonder how portable the themes will be when Drupal is
run from within a subdirectory.
------------------------------------------------------------------------
December 2, 2004 - 22:15 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_1.txt (849 bytes)
Well, my patch worked from a subdirectory very well, as fact, I have not
tested it from the root dir. And I think that it adheres to coding
standards. So I resubmit it with the root path fix. However, my Drupal
work is focused on i18n these days, and I was never into themeing so it
won't be me who fixes those.
------------------------------------------------------------------------
December 2, 2004 - 23:07 : kbahey
I have tested the previous patch (including my fix) with drupal
installed in the DocumentRoot of the server.
So, in effect, it is tested with both Drupal in / and Drupal in a
subdirectory.
This change fixes the problem for the crawlers and other browsers from
getting confused.
While it is true that there is no fix for the .css files in the HTML
head section yet, this fix deals with a major part of the problem, and
rids us of a major pain. Check your web server's logs some time to see
what I mean.
Someone who is familiar with the themes can contribute a patch later.
This patch and the future fix for themes are not mutually exclusive, so
let it go in CVS.
------------------------------------------------------------------------
December 9, 2004 - 16:31 : Goba
Please commit this into Drupal core, this fix is badly needed.
------------------------------------------------------------------------
January 17, 2005 - 14:00 : chx
Attachment: http://drupal.org/files/issues/base_url_kill.patch (4.34 KB)
Well as noone have stepped in to fix this problem, I have tried to fix
the themes also. themes.inc , xtemplate.engine and the bluemarine
template is patched besides common.inc.
Of course, more templates could follow, but first I'd like to see your
opinions.
------------------------------------------------------------------------
January 17, 2005 - 14:09 : Goba
I don't think that removing from the themes is a good idea, using
$parts['path'] should be encouraged though before the files, which
would fix the google cache problem, and would still keep the HTML size
low. It would also help those, who save the file to find the
originating site easier, since clicking on a non-pagelocal link would
lead to the online version.
------------------------------------------------------------------------
January 17, 2005 - 17:03 : Steven
Definitely -1 on removing the <base> tag or using absolute or
root-relative URLs. This tag has been around for ages, and it is the
only way to make clean URLs work without bloating in the code. FYI,
"base" is (first?) mentioned in Berners-Lee's HTML 1.0 draft [1].
That's June 1993.
As the amount of clean URL-using sites grows, the crawlers will have to
be updated. Perhaps we could prevent crawlers from going too insane by
404ing for URLs with more than say 10 components? That would prevent
the really crappy ones from hammering your site.
I'm all for making the <base> tag easier to handle for the user (say,
by including a filter to allow simple anchor links to work as most
people expect them to), but we should keep Drupal-generated URLs clean
and completely relative.
[1] http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt
------------------------------------------------------------------------
January 17, 2005 - 17:57 : kbahey
The problem with css is this: The @import argument does not start with a
/.
This is simple to fix.
We keep the "base" as it is today, but add the new variable: $base
before it.
So for a site where Drupal is installed in the DocumentRoot, all that
will change is that /misc/drupal.css and /themes/themename/style.css
will be preceded by a slash. For sites that use another path, that path
will be prepended to the css file name.
How about that?
------------------------------------------------------------------------
January 17, 2005 - 18:38 : Steven
What exactly is the problem with the @import? As far as I know:
- url() in stylesheets is interpreted relative to the base of the
stylesheet, not the source document.
- However, if the styles are inside an HTML document, through a style
tag or style attribute, then the stylesheet's location is the same as
the HTML document.
- Thus, the stylesheet's base is the same as the base of the HTML
document (which can be altered through the <base> tag).
I just don't see why it is necessary. As far as I know, the only
browser that has had problems resolving CSS urls properly was Netscape
4, which does not support @import at all, and which Drupal does not
support either, because of its CSS usage.
------------------------------------------------------------------------
January 17, 2005 - 19:35 : kbahey
The problem for stylesheets is as follows. I think it mainly affect
crawlers and Google's cache.
Say you have an installtion of Drupal in DocumentRoot. You then use url
aliases, and put slashes in them.
For example, you use news/general/2004-12-15.html for a node.
That node still has misc/drupal.css and themes/pushbutton/style.css in
the head section if the document. Crawlers get fooled by that and try
to look for /news/general/misc/drupal.css and
/news/general/themes/pushbutton/style.css, which don't exist.
So, just prepending the new $base variable (in chx's patch) before the
stylesheet @import argument would fix this issue. Assuming you are in
DocumentRoot, then /misc and /themes would be used instead of just misc
and themes.
It would still be compliant with standards, be relative to the web
site, and no ambiguous to anyone, be they crawler or browser.
I hope it is clearer now.
I think chx can change the patch to use the $base instead of $base_url
everywhere, so as to avoid the host/domain name in the urls.
--
View: http://drupal.org/node/13148
Edit: http://drupal.org/project/comments/add/13148
1
0
Project: Drupal
Version: cvs
Component: base system
Category: bug reports
Priority: normal
Assigned to: Anonymous
Reported by: kbahey
Updated by: kbahey
Status: patch
The problem for stylesheets is as follows. I think it mainly affect
crawlers and Google's cache.
Say you have an installtion of Drupal in DocumentRoot. You then use url
aliases, and put slashes in them.
For example, you use news/general/2004-12-15.html for a node.
That node still has misc/drupal.css and themes/pushbutton/style.css in
the head section if the document. Crawlers get fooled by that and try
to look for /news/general/misc/drupal.css and
/news/general/themes/pushbutton/style.css, which don't exist.
So, just prepending the new $base variable (in chx's patch) before the
stylesheet @import argument would fix this issue. Assuming you are in
DocumentRoot, then /misc and /themes would be used instead of just misc
and themes.
It would still be compliant with standards, be relative to the web
site, and no ambiguous to anyone, be they crawler or browser.
I hope it is clearer now.
I think chx can change the patch to use the $base instead of $base_url
everywhere, so as to avoid the host/domain name in the urls.
kbahey
Previous comments:
------------------------------------------------------------------------
November 18, 2004 - 22:29 : kbahey
Looking at my site's logs, there seem to be several problems that are
caused by Drupal's use of relative path names.
If Drupal causes all the site's urls to be absolute, then none of this
would be an issue.
A. Search Engine Crawlers
Getting lots of 404s on things like: linux/index.html/robots.txt
Where 'linux' is an alias to a taxonomy, and 'index.html' is an alias
to a node within that taxonomy.
Another example, is recursing unnecessarily. I see 404s on things like:
/linux/index.html/linux/index.html
Where 'linux' is a path alias for a taxonomy term, and 'index.html' is
an alias to the main node within it.
This does not seem to happen when Google crawls my sites, but Yahoo's
Slurp suffers from this problem, and keeps recursing. MSNBot also
suffers from this.
Another crawler/harvester called Blinkx/DFS-Fetch keeps adding the .css
file to the relative path, getting a 404 on things like:
/linux/themes/xtemplate/pushbutton/logo.gif
And Fast Search Engine also attempts to access:
/linux/contact/tracker/tracker/user/password
The same goes for grub.org, another crawler.
B. Google Cache / Archive Way Back Machine
Pages in Google cache and archive.org Way Back Machine suffer form a
similar problem: the .css files cannot be found, and hence rendering of
the pages is not correct.
Examples:
Compare this: http://www.drupal.org/node/4647
To this:
http://www.google.ca/search?q=cache:www.drupal.org/node/view/4647
Notice the following:
How there is no formatting at all, because of the lack of a .css file
The httpd log on Drupal will show errors for:
linux/themes/pushbutton/style.css and linux/misc/drupal.css
Also see:
http://web.archive.org/web/20031016184902/http://www.drupal.org/
C. Proxy Caches:
When someone is browsing my site from behind a proxy cache, the web
site is hit with a rapid succession of requests, and many of it is just
for bogus pages.
Examples:
2004/11/17 - 17:47 404 error: linux/user/1 not found.
2004/11/17 - 17:47 404 error: linux/feedback not found.
2004/11/17 - 17:47 404 error: linux/tracker not found.
2004/11/17 - 17:47 404 error: linux/sitemap not found.
2004/11/17 - 17:47 404 error: linux/search not found.
2004/11/17 - 17:47 404 error: linux/misc not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/linux not found.
2004/11/17 - 17:47 404 error: linux/technology not found.
2004/11/17 - 17:47 404 error: linux/writings not found.
2004/11/17 - 17:47 404 error: linux/family not found.
And also:
2004/11/17 - 07:23 404 error: history/user/1 not found.
2004/11/17 - 07:23 404 error: history/tracker not found.
2004/11/17 - 07:23 404 error: history/feedback not found.
2004/11/17 - 07:23 404 error: history/sitemap not found.
2004/11/17 - 07:23 404 error: history/search not found.
2004/11/17 - 07:23 404 error: history/misc not found.
2004/11/17 - 07:23 404 error: history/technology not found.
2004/11/17 - 07:23 404 error: history/science not found.
2004/11/17 - 07:22 404 error: history/history not found.
2004/11/17 - 07:22 404 error: history/writings not found.
2004/11/17 - 07:22 404 error: history/family not found.
As you can tell, history and linux are aliases to taxonomy terms, and
so is misc, technology, writings, family, ...etc. The user agent is
appending the taxonomy term alias to the url and forming a new URL.
D. Regular Browsing:
There is even at least one extreme case where the following URL was
accessed (the result was 404 of course)
/book/view/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/logo.gif
It seems it was a normal user, because the user agent is: "Mozilla/4.0
(compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
Proposed Solution:
As a proposed solution, all URLs in Drupal can be made into absolute
path names. This can be done by the following:
The variable $base_url in the conf.php file is broken down into two
components:
$base_host (the 'http://whatever-host.example.com' part WITHOUT the
trailing slash)
$base_path (the '/path-to-drupal' part, WITH the leading slash. If this
is the DocumentRoot, then it is just a '/' character)
$base_url is now $base_host concatenated with $base_path
A simple filter can be written to preceed every href="path" with the
$base_path variable, so it becomes "/path"
This option can be turned on and off for a site. The default is to have
it off so current behavior is maintained.
A similar scheme applies for style sheets as well.
So, did I miss something obvious? Am I seriously off the mark?
Your thoughts!
------------------------------------------------------------------------
November 19, 2004 - 23:10 : chrisada
I am getting similar 404 errors, mainly from rss feed link that looks
like /blog/blog/feed and many manual links that are relative to drupal
root.
It was not a problem before Drupal 4.5, so I think there might not be a
need to change all URIs to absolute. I can't see where the problem is
coming from though.
------------------------------------------------------------------------
November 19, 2004 - 23:35 : kbahey
I am pretty sure that these problems were happening for at least the
past 10 months (ever since I moved to Drupal in January 2004).
The main issue here is that crawlers and other user agents get confused
by the relative path names.
Using absolute paths will definitely solve this. However, is this the
only solution?
I am looking for a discussion of this.
------------------------------------------------------------------------
November 20, 2004 - 04:55 : Goba
No absolute paths please. Having the path start with '/' solves all the
mentioned problems, and is not absolute, it is relative to the domain.
Sadly some crawlers and even the Google Cache does not obey to the base
href. I have reported this cache problem in April to Google, and they
promised they will keep it in mind... Hehe...
What we need is to have the printed relative path values relative to
the domain name, and not relative to the Drupal installation path.
Note that this issue will appear on the drupal devel mailing list if
someone finally provides a patch we can talk about :)
------------------------------------------------------------------------
November 20, 2004 - 05:19 : Dries
Goba is right. We need paths relative to the domain name to fix this
'problem'.
------------------------------------------------------------------------
November 20, 2004 - 15:19 : kbahey
Sorry for not making my self clear.
When I said absolute, I meant that they start with just a /. I did NOT
mean that they start with http://host.example.com. That would be a very
bad idea.
In any case, what do people think about the proposed solution (breaking
down $base_url into two parts?)
Also, does this address the style sheets as well, or more is needed?
------------------------------------------------------------------------
November 21, 2004 - 11:24 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch.txt (471 bytes)
I have implemented what Goba suggested.
------------------------------------------------------------------------
November 21, 2004 - 11:43 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_0.txt (825 bytes)
Maybe this one is faster?
------------------------------------------------------------------------
November 21, 2004 - 12:34 : kbahey
Man! You are fast!
I tried the second version. It works fine for things that are not
inside the node body, I mean they have a / in front of them, as we
want it to be.
Two comments/issues:
- If there is a URL that is already "/" representing the home page, it
gets set to "//". Perhaps it should check for that case?
- URLs in nodes that do not start with / do not get changed to have a /
prepended to them. Do we need a filter for this?
- Do we need to do something for the style sheets in the page header? I
mean the "misc/drupal.css" and "themes/themename/style.css"?
Thanks
------------------------------------------------------------------------
November 21, 2004 - 19:28 : kbahey
Hi chx
Here is a fix for the case where you have a url that is just "/".
In your patch, instead of:
[?php $base = $parts['path'] . '/' ; ?]
Replace that by:
[?php $base = ( $path == '/' ? $base : $parts['path'] . '/' ); ?]
------------------------------------------------------------------------
November 27, 2004 - 23:08 : kbahey
Did this patch make it into CVS yet?
If there are any objections to it, can someone please explain what they
are?
Thanks
------------------------------------------------------------------------
November 28, 2004 - 04:33 : Dries
Shouldn't your changes be included in the patch?
Also, it's better to cache $base rather than $parts.
Lastly, it this patch makes it to HEAD, we should probably remove some
'base url' cruft from the themes.
------------------------------------------------------------------------
November 28, 2004 - 13:54 : kbahey
Attachment: http://drupal.org/files/issues/x.diff (1 KB)
Here is the patch including my fix.
I am asking chx to comment on caching $base instead of $parts.
Will this make it faster?
------------------------------------------------------------------------
November 28, 2004 - 14:26 : chx
Hm. $base = ( $path == '/' ? $base : $parts['path'] . '/' ); this
depends on path which is a parameter. Thus I fail to see how could we
cache $base. I'd correct this code however $base = ( $path == '/' ? ''
: $parts['path'] . '/' ); 'cos I think $base is not defined before, but
this is not a problem, PHP will be happy to replace NULL with NULL...
Maybe instead of all parts, only $parts['path'] is enough to be cached,
yes, but the performance and memory usage difference -- I guess -- would
not be noticable...
------------------------------------------------------------------------
November 28, 2004 - 18:06 : kbahey
Attachment: http://drupal.org/files/issues/common-inc-patch.txt (1 KB)
OK.
I put in chx suggested change.
This patch can go in CVS then, to rid us of the problems with paths not
beginning with slash.
This is not an ultimate solution still. We need to address the problem
with .css files. Although the header contains a:
it does not seem that major search engines and archiving sites obey it
anyway.
------------------------------------------------------------------------
December 2, 2004 - 15:31 : Dries
Your coding style needs work. Also, I'm not going to commit this unless
the themes get fixed up: we'd end up with invalid URLs all over the
place. Lastly, I wonder how portable the themes will be when Drupal is
run from within a subdirectory.
------------------------------------------------------------------------
December 2, 2004 - 16:15 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_1.txt (849 bytes)
Well, my patch worked from a subdirectory very well, as fact, I have not
tested it from the root dir. And I think that it adheres to coding
standards. So I resubmit it with the root path fix. However, my Drupal
work is focused on i18n these days, and I was never into themeing so it
won't be me who fixes those.
------------------------------------------------------------------------
December 2, 2004 - 17:07 : kbahey
I have tested the previous patch (including my fix) with drupal
installed in the DocumentRoot of the server.
So, in effect, it is tested with both Drupal in / and Drupal in a
subdirectory.
This change fixes the problem for the crawlers and other browsers from
getting confused.
While it is true that there is no fix for the .css files in the HTML
head section yet, this fix deals with a major part of the problem, and
rids us of a major pain. Check your web server's logs some time to see
what I mean.
Someone who is familiar with the themes can contribute a patch later.
This patch and the future fix for themes are not mutually exclusive, so
let it go in CVS.
------------------------------------------------------------------------
December 9, 2004 - 10:31 : Goba
Please commit this into Drupal core, this fix is badly needed.
------------------------------------------------------------------------
January 17, 2005 - 08:00 : chx
Attachment: http://drupal.org/files/issues/base_url_kill.patch (4.34 KB)
Well as noone have stepped in to fix this problem, I have tried to fix
the themes also. themes.inc , xtemplate.engine and the bluemarine
template is patched besides common.inc.
Of course, more templates could follow, but first I'd like to see your
opinions.
------------------------------------------------------------------------
January 17, 2005 - 08:09 : Goba
I don't think that removing from the themes is a good idea, using
$parts['path'] should be encouraged though before the files, which
would fix the google cache problem, and would still keep the HTML size
low. It would also help those, who save the file to find the
originating site easier, since clicking on a non-pagelocal link would
lead to the online version.
------------------------------------------------------------------------
January 17, 2005 - 11:03 : Steven
Definitely -1 on removing the <base> tag or using absolute or
root-relative URLs. This tag has been around for ages, and it is the
only way to make clean URLs work without bloating in the code. FYI,
"base" is (first?) mentioned in Berners-Lee's HTML 1.0 draft [1].
That's June 1993.
As the amount of clean URL-using sites grows, the crawlers will have to
be updated. Perhaps we could prevent crawlers from going too insane by
404ing for URLs with more than say 10 components? That would prevent
the really crappy ones from hammering your site.
I'm all for making the <base> tag easier to handle for the user (say,
by including a filter to allow simple anchor links to work as most
people expect them to), but we should keep Drupal-generated URLs clean
and completely relative.
[1] http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt
------------------------------------------------------------------------
January 17, 2005 - 11:57 : kbahey
The problem with css is this: The @import argument does not start with a
/.
This is simple to fix.
We keep the "base" as it is today, but add the new variable: $base
before it.
So for a site where Drupal is installed in the DocumentRoot, all that
will change is that /misc/drupal.css and /themes/themename/style.css
will be preceded by a slash. For sites that use another path, that path
will be prepended to the css file name.
How about that?
------------------------------------------------------------------------
January 17, 2005 - 12:38 : Steven
What exactly is the problem with the @import? As far as I know:
- url() in stylesheets is interpreted relative to the base of the
stylesheet, not the source document.
- However, if the styles are inside an HTML document, through a style
tag or style attribute, then the stylesheet's location is the same as
the HTML document.
- Thus, the stylesheet's base is the same as the base of the HTML
document (which can be altered through the <base> tag).
I just don't see why it is necessary. As far as I know, the only
browser that has had problems resolving CSS urls properly was Netscape
4, which does not support @import at all, and which Drupal does not
support either, because of its CSS usage.
--
View: http://drupal.org/node/13148
Edit: http://drupal.org/project/comments/add/13148
1
0
I've just launched a new drupal site at http://www.bikeweb.com using 4.5
As part of this I wanted to include my minimalist html editor. It's all
worked but I had to hack common.inc slightly. So here's a couple of
suggestions for making extending form_textarea() easier.
1. Textarea, pre really needs to be between the title and textarea of
the form_item. textarea extensions are very likely to add a toolbar and
this really ought to be hard against the top of the textarea rather than
floating above the title.
2. I need a way of adding extra $attributes as part of the module code
and can't see an easy way of doing this without making $attributes
global. Suggestions?
--
Julian Bond Email&MSM: julian.bond at voidstar.com
Webmaster: http://www.ecademy.com/
Personal WebLog: http://www.voidstar.com/
M: +44 (0)77 5907 2173 T: +44 (0)192 0412 433
S: callto://julian.bond/
4
5