Development
Threads by month
- ----- 2026 -----
- June
- May
- April
- March
- February
- January
- ----- 2025 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2024 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2023 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2022 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2021 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2020 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2019 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2018 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2017 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2016 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2015 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2014 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2013 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2012 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2011 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2010 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2009 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2008 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2007 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2006 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2005 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
January 2005
- 54 participants
- 341 discussions
Project: Drupal
Version: cvs
Component: base system
Category: bug reports
Priority: normal
Assigned to: Anonymous
Reported by: kbahey
Updated by: Steven
Status: patch
What exactly is the problem with the @import? As far as I know:
- url() in stylesheets is interpreted relative to the base of the
stylesheet, not the source document.
- However, if the styles are inside an HTML document, through a style
tag or style attribute, then the stylesheet's location is the same as
the HTML document.
- Thus, the stylesheet's base is the same as the base of the HTML
document (which can be altered through the <base> tag).
I just don't see why it is necessary. As far as I know, the only
browser that has had problems resolving CSS urls properly was Netscape
4, which does not support @import at all, and which Drupal does not
support either, because of its CSS usage.
Steven
Previous comments:
------------------------------------------------------------------------
November 19, 2004 - 04:29 : kbahey
Looking at my site's logs, there seem to be several problems that are
caused by Drupal's use of relative path names.
If Drupal causes all the site's urls to be absolute, then none of this
would be an issue.
A. Search Engine Crawlers
Getting lots of 404s on things like: linux/index.html/robots.txt
Where 'linux' is an alias to a taxonomy, and 'index.html' is an alias
to a node within that taxonomy.
Another example, is recursing unnecessarily. I see 404s on things like:
/linux/index.html/linux/index.html
Where 'linux' is a path alias for a taxonomy term, and 'index.html' is
an alias to the main node within it.
This does not seem to happen when Google crawls my sites, but Yahoo's
Slurp suffers from this problem, and keeps recursing. MSNBot also
suffers from this.
Another crawler/harvester called Blinkx/DFS-Fetch keeps adding the .css
file to the relative path, getting a 404 on things like:
/linux/themes/xtemplate/pushbutton/logo.gif
And Fast Search Engine also attempts to access:
/linux/contact/tracker/tracker/user/password
The same goes for grub.org, another crawler.
B. Google Cache / Archive Way Back Machine
Pages in Google cache and archive.org Way Back Machine suffer form a
similar problem: the .css files cannot be found, and hence rendering of
the pages is not correct.
Examples:
Compare this: http://www.drupal.org/node/4647
To this:
http://www.google.ca/search?q=cache:www.drupal.org/node/view/4647
Notice the following:
How there is no formatting at all, because of the lack of a .css file
The httpd log on Drupal will show errors for:
linux/themes/pushbutton/style.css and linux/misc/drupal.css
Also see:
http://web.archive.org/web/20031016184902/http://www.drupal.org/
C. Proxy Caches:
When someone is browsing my site from behind a proxy cache, the web
site is hit with a rapid succession of requests, and many of it is just
for bogus pages.
Examples:
2004/11/17 - 17:47 404 error: linux/user/1 not found.
2004/11/17 - 17:47 404 error: linux/feedback not found.
2004/11/17 - 17:47 404 error: linux/tracker not found.
2004/11/17 - 17:47 404 error: linux/sitemap not found.
2004/11/17 - 17:47 404 error: linux/search not found.
2004/11/17 - 17:47 404 error: linux/misc not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/linux not found.
2004/11/17 - 17:47 404 error: linux/technology not found.
2004/11/17 - 17:47 404 error: linux/writings not found.
2004/11/17 - 17:47 404 error: linux/family not found.
And also:
2004/11/17 - 07:23 404 error: history/user/1 not found.
2004/11/17 - 07:23 404 error: history/tracker not found.
2004/11/17 - 07:23 404 error: history/feedback not found.
2004/11/17 - 07:23 404 error: history/sitemap not found.
2004/11/17 - 07:23 404 error: history/search not found.
2004/11/17 - 07:23 404 error: history/misc not found.
2004/11/17 - 07:23 404 error: history/technology not found.
2004/11/17 - 07:23 404 error: history/science not found.
2004/11/17 - 07:22 404 error: history/history not found.
2004/11/17 - 07:22 404 error: history/writings not found.
2004/11/17 - 07:22 404 error: history/family not found.
As you can tell, history and linux are aliases to taxonomy terms, and
so is misc, technology, writings, family, ...etc. The user agent is
appending the taxonomy term alias to the url and forming a new URL.
D. Regular Browsing:
There is even at least one extreme case where the following URL was
accessed (the result was 404 of course)
/book/view/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/logo.gif
It seems it was a normal user, because the user agent is: "Mozilla/4.0
(compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
Proposed Solution:
As a proposed solution, all URLs in Drupal can be made into absolute
path names. This can be done by the following:
The variable $base_url in the conf.php file is broken down into two
components:
$base_host (the 'http://whatever-host.example.com' part WITHOUT the
trailing slash)
$base_path (the '/path-to-drupal' part, WITH the leading slash. If this
is the DocumentRoot, then it is just a '/' character)
$base_url is now $base_host concatenated with $base_path
A simple filter can be written to preceed every href="path" with the
$base_path variable, so it becomes "/path"
This option can be turned on and off for a site. The default is to have
it off so current behavior is maintained.
A similar scheme applies for style sheets as well.
So, did I miss something obvious? Am I seriously off the mark?
Your thoughts!
------------------------------------------------------------------------
November 20, 2004 - 05:10 : chrisada
I am getting similar 404 errors, mainly from rss feed link that looks
like /blog/blog/feed and many manual links that are relative to drupal
root.
It was not a problem before Drupal 4.5, so I think there might not be a
need to change all URIs to absolute. I can't see where the problem is
coming from though.
------------------------------------------------------------------------
November 20, 2004 - 05:35 : kbahey
I am pretty sure that these problems were happening for at least the
past 10 months (ever since I moved to Drupal in January 2004).
The main issue here is that crawlers and other user agents get confused
by the relative path names.
Using absolute paths will definitely solve this. However, is this the
only solution?
I am looking for a discussion of this.
------------------------------------------------------------------------
November 20, 2004 - 10:55 : Goba
No absolute paths please. Having the path start with '/' solves all the
mentioned problems, and is not absolute, it is relative to the domain.
Sadly some crawlers and even the Google Cache does not obey to the base
href. I have reported this cache problem in April to Google, and they
promised they will keep it in mind... Hehe...
What we need is to have the printed relative path values relative to
the domain name, and not relative to the Drupal installation path.
Note that this issue will appear on the drupal devel mailing list if
someone finally provides a patch we can talk about :)
------------------------------------------------------------------------
November 20, 2004 - 11:19 : Dries
Goba is right. We need paths relative to the domain name to fix this
'problem'.
------------------------------------------------------------------------
November 20, 2004 - 21:19 : kbahey
Sorry for not making my self clear.
When I said absolute, I meant that they start with just a /. I did NOT
mean that they start with http://host.example.com. That would be a very
bad idea.
In any case, what do people think about the proposed solution (breaking
down $base_url into two parts?)
Also, does this address the style sheets as well, or more is needed?
------------------------------------------------------------------------
November 21, 2004 - 17:24 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch.txt (471 bytes)
I have implemented what Goba suggested.
------------------------------------------------------------------------
November 21, 2004 - 17:43 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_0.txt (825 bytes)
Maybe this one is faster?
------------------------------------------------------------------------
November 21, 2004 - 18:34 : kbahey
Man! You are fast!
I tried the second version. It works fine for things that are not
inside the node body, I mean they have a / in front of them, as we
want it to be.
Two comments/issues:
- If there is a URL that is already "/" representing the home page, it
gets set to "//". Perhaps it should check for that case?
- URLs in nodes that do not start with / do not get changed to have a /
prepended to them. Do we need a filter for this?
- Do we need to do something for the style sheets in the page header? I
mean the "misc/drupal.css" and "themes/themename/style.css"?
Thanks
------------------------------------------------------------------------
November 22, 2004 - 01:28 : kbahey
Hi chx
Here is a fix for the case where you have a url that is just "/".
In your patch, instead of:
[?php $base = $parts['path'] . '/' ; ?]
Replace that by:
[?php $base = ( $path == '/' ? $base : $parts['path'] . '/' ); ?]
------------------------------------------------------------------------
November 28, 2004 - 05:08 : kbahey
Did this patch make it into CVS yet?
If there are any objections to it, can someone please explain what they
are?
Thanks
------------------------------------------------------------------------
November 28, 2004 - 10:33 : Dries
Shouldn't your changes be included in the patch?
Also, it's better to cache $base rather than $parts.
Lastly, it this patch makes it to HEAD, we should probably remove some
'base url' cruft from the themes.
------------------------------------------------------------------------
November 28, 2004 - 19:54 : kbahey
Attachment: http://drupal.org/files/issues/x.diff (1 KB)
Here is the patch including my fix.
I am asking chx to comment on caching $base instead of $parts.
Will this make it faster?
------------------------------------------------------------------------
November 28, 2004 - 20:26 : chx
Hm. $base = ( $path == '/' ? $base : $parts['path'] . '/' ); this
depends on path which is a parameter. Thus I fail to see how could we
cache $base. I'd correct this code however $base = ( $path == '/' ? ''
: $parts['path'] . '/' ); 'cos I think $base is not defined before, but
this is not a problem, PHP will be happy to replace NULL with NULL...
Maybe instead of all parts, only $parts['path'] is enough to be cached,
yes, but the performance and memory usage difference -- I guess -- would
not be noticable...
------------------------------------------------------------------------
November 29, 2004 - 00:06 : kbahey
Attachment: http://drupal.org/files/issues/common-inc-patch.txt (1 KB)
OK.
I put in chx suggested change.
This patch can go in CVS then, to rid us of the problems with paths not
beginning with slash.
This is not an ultimate solution still. We need to address the problem
with .css files. Although the header contains a:
it does not seem that major search engines and archiving sites obey it
anyway.
------------------------------------------------------------------------
December 2, 2004 - 21:31 : Dries
Your coding style needs work. Also, I'm not going to commit this unless
the themes get fixed up: we'd end up with invalid URLs all over the
place. Lastly, I wonder how portable the themes will be when Drupal is
run from within a subdirectory.
------------------------------------------------------------------------
December 2, 2004 - 22:15 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_1.txt (849 bytes)
Well, my patch worked from a subdirectory very well, as fact, I have not
tested it from the root dir. And I think that it adheres to coding
standards. So I resubmit it with the root path fix. However, my Drupal
work is focused on i18n these days, and I was never into themeing so it
won't be me who fixes those.
------------------------------------------------------------------------
December 2, 2004 - 23:07 : kbahey
I have tested the previous patch (including my fix) with drupal
installed in the DocumentRoot of the server.
So, in effect, it is tested with both Drupal in / and Drupal in a
subdirectory.
This change fixes the problem for the crawlers and other browsers from
getting confused.
While it is true that there is no fix for the .css files in the HTML
head section yet, this fix deals with a major part of the problem, and
rids us of a major pain. Check your web server's logs some time to see
what I mean.
Someone who is familiar with the themes can contribute a patch later.
This patch and the future fix for themes are not mutually exclusive, so
let it go in CVS.
------------------------------------------------------------------------
December 9, 2004 - 16:31 : Goba
Please commit this into Drupal core, this fix is badly needed.
------------------------------------------------------------------------
January 17, 2005 - 14:00 : chx
Attachment: http://drupal.org/files/issues/base_url_kill.patch (4.34 KB)
Well as noone have stepped in to fix this problem, I have tried to fix
the themes also. themes.inc , xtemplate.engine and the bluemarine
template is patched besides common.inc.
Of course, more templates could follow, but first I'd like to see your
opinions.
------------------------------------------------------------------------
January 17, 2005 - 14:09 : Goba
I don't think that removing from the themes is a good idea, using
$parts['path'] should be encouraged though before the files, which
would fix the google cache problem, and would still keep the HTML size
low. It would also help those, who save the file to find the
originating site easier, since clicking on a non-pagelocal link would
lead to the online version.
------------------------------------------------------------------------
January 17, 2005 - 17:03 : Steven
Definitely -1 on removing the <base> tag or using absolute or
root-relative URLs. This tag has been around for ages, and it is the
only way to make clean URLs work without bloating in the code. FYI,
"base" is (first?) mentioned in Berners-Lee's HTML 1.0 draft [1].
That's June 1993.
As the amount of clean URL-using sites grows, the crawlers will have to
be updated. Perhaps we could prevent crawlers from going too insane by
404ing for URLs with more than say 10 components? That would prevent
the really crappy ones from hammering your site.
I'm all for making the <base> tag easier to handle for the user (say,
by including a filter to allow simple anchor links to work as most
people expect them to), but we should keep Drupal-generated URLs clean
and completely relative.
[1] http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt
------------------------------------------------------------------------
January 17, 2005 - 17:57 : kbahey
The problem with css is this: The @import argument does not start with a
/.
This is simple to fix.
We keep the "base" as it is today, but add the new variable: $base
before it.
So for a site where Drupal is installed in the DocumentRoot, all that
will change is that /misc/drupal.css and /themes/themename/style.css
will be preceded by a slash. For sites that use another path, that path
will be prepended to the css file name.
How about that?
--
View: http://drupal.org/node/13148
Edit: http://drupal.org/project/comments/add/13148
1
0
Hi there!
I just want to let you know that I have the eaccelerator php cache at
drupaldevs.org.
You can enable it for your site by setting
php_flag eaccelerator.enable 1
and
php_flag eaccelerator.optimizer 1
in your .htaccess file.
I did some tests and found my test setup to have a reduced page execution
time that went from about 240ms to 130~160ms. Memory usage was also
reduced.
I am not sure if the memory consumption numbers are accurate. They were
retrieved from xdebug and that might interfere with the eaccelerator
extension.
Cheers,
Gerhard
1
0
Project: Drupal
Version: cvs
Component: base system
Category: bug reports
Priority: normal
Assigned to: Anonymous
Reported by: kbahey
Updated by: kbahey
Status: patch
The problem with css is this: The @import argument does not start with a
/.
This is simple to fix.
We keep the "base" as it is today, but add the new variable: $base
before it.
So for a site where Drupal is installed in the DocumentRoot, all that
will change is that /misc/drupal.css and /themes/themename/style.css
will be preceded by a slash. For sites that use another path, that path
will be prepended to the css file name.
How about that?
kbahey
Previous comments:
------------------------------------------------------------------------
November 18, 2004 - 22:29 : kbahey
Looking at my site's logs, there seem to be several problems that are
caused by Drupal's use of relative path names.
If Drupal causes all the site's urls to be absolute, then none of this
would be an issue.
A. Search Engine Crawlers
Getting lots of 404s on things like: linux/index.html/robots.txt
Where 'linux' is an alias to a taxonomy, and 'index.html' is an alias
to a node within that taxonomy.
Another example, is recursing unnecessarily. I see 404s on things like:
/linux/index.html/linux/index.html
Where 'linux' is a path alias for a taxonomy term, and 'index.html' is
an alias to the main node within it.
This does not seem to happen when Google crawls my sites, but Yahoo's
Slurp suffers from this problem, and keeps recursing. MSNBot also
suffers from this.
Another crawler/harvester called Blinkx/DFS-Fetch keeps adding the .css
file to the relative path, getting a 404 on things like:
/linux/themes/xtemplate/pushbutton/logo.gif
And Fast Search Engine also attempts to access:
/linux/contact/tracker/tracker/user/password
The same goes for grub.org, another crawler.
B. Google Cache / Archive Way Back Machine
Pages in Google cache and archive.org Way Back Machine suffer form a
similar problem: the .css files cannot be found, and hence rendering of
the pages is not correct.
Examples:
Compare this: http://www.drupal.org/node/4647
To this:
http://www.google.ca/search?q=cache:www.drupal.org/node/view/4647
Notice the following:
How there is no formatting at all, because of the lack of a .css file
The httpd log on Drupal will show errors for:
linux/themes/pushbutton/style.css and linux/misc/drupal.css
Also see:
http://web.archive.org/web/20031016184902/http://www.drupal.org/
C. Proxy Caches:
When someone is browsing my site from behind a proxy cache, the web
site is hit with a rapid succession of requests, and many of it is just
for bogus pages.
Examples:
2004/11/17 - 17:47 404 error: linux/user/1 not found.
2004/11/17 - 17:47 404 error: linux/feedback not found.
2004/11/17 - 17:47 404 error: linux/tracker not found.
2004/11/17 - 17:47 404 error: linux/sitemap not found.
2004/11/17 - 17:47 404 error: linux/search not found.
2004/11/17 - 17:47 404 error: linux/misc not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/linux not found.
2004/11/17 - 17:47 404 error: linux/technology not found.
2004/11/17 - 17:47 404 error: linux/writings not found.
2004/11/17 - 17:47 404 error: linux/family not found.
And also:
2004/11/17 - 07:23 404 error: history/user/1 not found.
2004/11/17 - 07:23 404 error: history/tracker not found.
2004/11/17 - 07:23 404 error: history/feedback not found.
2004/11/17 - 07:23 404 error: history/sitemap not found.
2004/11/17 - 07:23 404 error: history/search not found.
2004/11/17 - 07:23 404 error: history/misc not found.
2004/11/17 - 07:23 404 error: history/technology not found.
2004/11/17 - 07:23 404 error: history/science not found.
2004/11/17 - 07:22 404 error: history/history not found.
2004/11/17 - 07:22 404 error: history/writings not found.
2004/11/17 - 07:22 404 error: history/family not found.
As you can tell, history and linux are aliases to taxonomy terms, and
so is misc, technology, writings, family, ...etc. The user agent is
appending the taxonomy term alias to the url and forming a new URL.
D. Regular Browsing:
There is even at least one extreme case where the following URL was
accessed (the result was 404 of course)
/book/view/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/logo.gif
It seems it was a normal user, because the user agent is: "Mozilla/4.0
(compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
Proposed Solution:
As a proposed solution, all URLs in Drupal can be made into absolute
path names. This can be done by the following:
The variable $base_url in the conf.php file is broken down into two
components:
$base_host (the 'http://whatever-host.example.com' part WITHOUT the
trailing slash)
$base_path (the '/path-to-drupal' part, WITH the leading slash. If this
is the DocumentRoot, then it is just a '/' character)
$base_url is now $base_host concatenated with $base_path
A simple filter can be written to preceed every href="path" with the
$base_path variable, so it becomes "/path"
This option can be turned on and off for a site. The default is to have
it off so current behavior is maintained.
A similar scheme applies for style sheets as well.
So, did I miss something obvious? Am I seriously off the mark?
Your thoughts!
------------------------------------------------------------------------
November 19, 2004 - 23:10 : chrisada
I am getting similar 404 errors, mainly from rss feed link that looks
like /blog/blog/feed and many manual links that are relative to drupal
root.
It was not a problem before Drupal 4.5, so I think there might not be a
need to change all URIs to absolute. I can't see where the problem is
coming from though.
------------------------------------------------------------------------
November 19, 2004 - 23:35 : kbahey
I am pretty sure that these problems were happening for at least the
past 10 months (ever since I moved to Drupal in January 2004).
The main issue here is that crawlers and other user agents get confused
by the relative path names.
Using absolute paths will definitely solve this. However, is this the
only solution?
I am looking for a discussion of this.
------------------------------------------------------------------------
November 20, 2004 - 04:55 : Goba
No absolute paths please. Having the path start with '/' solves all the
mentioned problems, and is not absolute, it is relative to the domain.
Sadly some crawlers and even the Google Cache does not obey to the base
href. I have reported this cache problem in April to Google, and they
promised they will keep it in mind... Hehe...
What we need is to have the printed relative path values relative to
the domain name, and not relative to the Drupal installation path.
Note that this issue will appear on the drupal devel mailing list if
someone finally provides a patch we can talk about :)
------------------------------------------------------------------------
November 20, 2004 - 05:19 : Dries
Goba is right. We need paths relative to the domain name to fix this
'problem'.
------------------------------------------------------------------------
November 20, 2004 - 15:19 : kbahey
Sorry for not making my self clear.
When I said absolute, I meant that they start with just a /. I did NOT
mean that they start with http://host.example.com. That would be a very
bad idea.
In any case, what do people think about the proposed solution (breaking
down $base_url into two parts?)
Also, does this address the style sheets as well, or more is needed?
------------------------------------------------------------------------
November 21, 2004 - 11:24 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch.txt (471 bytes)
I have implemented what Goba suggested.
------------------------------------------------------------------------
November 21, 2004 - 11:43 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_0.txt (825 bytes)
Maybe this one is faster?
------------------------------------------------------------------------
November 21, 2004 - 12:34 : kbahey
Man! You are fast!
I tried the second version. It works fine for things that are not
inside the node body, I mean they have a / in front of them, as we
want it to be.
Two comments/issues:
- If there is a URL that is already "/" representing the home page, it
gets set to "//". Perhaps it should check for that case?
- URLs in nodes that do not start with / do not get changed to have a /
prepended to them. Do we need a filter for this?
- Do we need to do something for the style sheets in the page header? I
mean the "misc/drupal.css" and "themes/themename/style.css"?
Thanks
------------------------------------------------------------------------
November 21, 2004 - 19:28 : kbahey
Hi chx
Here is a fix for the case where you have a url that is just "/".
In your patch, instead of:
[?php $base = $parts['path'] . '/' ; ?]
Replace that by:
[?php $base = ( $path == '/' ? $base : $parts['path'] . '/' ); ?]
------------------------------------------------------------------------
November 27, 2004 - 23:08 : kbahey
Did this patch make it into CVS yet?
If there are any objections to it, can someone please explain what they
are?
Thanks
------------------------------------------------------------------------
November 28, 2004 - 04:33 : Dries
Shouldn't your changes be included in the patch?
Also, it's better to cache $base rather than $parts.
Lastly, it this patch makes it to HEAD, we should probably remove some
'base url' cruft from the themes.
------------------------------------------------------------------------
November 28, 2004 - 13:54 : kbahey
Attachment: http://drupal.org/files/issues/x.diff (1 KB)
Here is the patch including my fix.
I am asking chx to comment on caching $base instead of $parts.
Will this make it faster?
------------------------------------------------------------------------
November 28, 2004 - 14:26 : chx
Hm. $base = ( $path == '/' ? $base : $parts['path'] . '/' ); this
depends on path which is a parameter. Thus I fail to see how could we
cache $base. I'd correct this code however $base = ( $path == '/' ? ''
: $parts['path'] . '/' ); 'cos I think $base is not defined before, but
this is not a problem, PHP will be happy to replace NULL with NULL...
Maybe instead of all parts, only $parts['path'] is enough to be cached,
yes, but the performance and memory usage difference -- I guess -- would
not be noticable...
------------------------------------------------------------------------
November 28, 2004 - 18:06 : kbahey
Attachment: http://drupal.org/files/issues/common-inc-patch.txt (1 KB)
OK.
I put in chx suggested change.
This patch can go in CVS then, to rid us of the problems with paths not
beginning with slash.
This is not an ultimate solution still. We need to address the problem
with .css files. Although the header contains a:
it does not seem that major search engines and archiving sites obey it
anyway.
------------------------------------------------------------------------
December 2, 2004 - 15:31 : Dries
Your coding style needs work. Also, I'm not going to commit this unless
the themes get fixed up: we'd end up with invalid URLs all over the
place. Lastly, I wonder how portable the themes will be when Drupal is
run from within a subdirectory.
------------------------------------------------------------------------
December 2, 2004 - 16:15 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_1.txt (849 bytes)
Well, my patch worked from a subdirectory very well, as fact, I have not
tested it from the root dir. And I think that it adheres to coding
standards. So I resubmit it with the root path fix. However, my Drupal
work is focused on i18n these days, and I was never into themeing so it
won't be me who fixes those.
------------------------------------------------------------------------
December 2, 2004 - 17:07 : kbahey
I have tested the previous patch (including my fix) with drupal
installed in the DocumentRoot of the server.
So, in effect, it is tested with both Drupal in / and Drupal in a
subdirectory.
This change fixes the problem for the crawlers and other browsers from
getting confused.
While it is true that there is no fix for the .css files in the HTML
head section yet, this fix deals with a major part of the problem, and
rids us of a major pain. Check your web server's logs some time to see
what I mean.
Someone who is familiar with the themes can contribute a patch later.
This patch and the future fix for themes are not mutually exclusive, so
let it go in CVS.
------------------------------------------------------------------------
December 9, 2004 - 10:31 : Goba
Please commit this into Drupal core, this fix is badly needed.
------------------------------------------------------------------------
January 17, 2005 - 08:00 : chx
Attachment: http://drupal.org/files/issues/base_url_kill.patch (4.34 KB)
Well as noone have stepped in to fix this problem, I have tried to fix
the themes also. themes.inc , xtemplate.engine and the bluemarine
template is patched besides common.inc.
Of course, more templates could follow, but first I'd like to see your
opinions.
------------------------------------------------------------------------
January 17, 2005 - 08:09 : Goba
I don't think that removing from the themes is a good idea, using
$parts['path'] should be encouraged though before the files, which
would fix the google cache problem, and would still keep the HTML size
low. It would also help those, who save the file to find the
originating site easier, since clicking on a non-pagelocal link would
lead to the online version.
------------------------------------------------------------------------
January 17, 2005 - 11:03 : Steven
Definitely -1 on removing the <base> tag or using absolute or
root-relative URLs. This tag has been around for ages, and it is the
only way to make clean URLs work without bloating in the code. FYI,
"base" is (first?) mentioned in Berners-Lee's HTML 1.0 draft [1].
That's June 1993.
As the amount of clean URL-using sites grows, the crawlers will have to
be updated. Perhaps we could prevent crawlers from going too insane by
404ing for URLs with more than say 10 components? That would prevent
the really crappy ones from hammering your site.
I'm all for making the <base> tag easier to handle for the user (say,
by including a filter to allow simple anchor links to work as most
people expect them to), but we should keep Drupal-generated URLs clean
and completely relative.
[1] http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt
--
View: http://drupal.org/node/13148
Edit: http://drupal.org/project/comments/add/13148
1
0
File: /themes/manji/page.tpl.php 1.3
Date: January 17, 2005 - 14:07
User: chrisada
updated credit in footer
----------------------------------------------------------------------
File: /modules/contextlinks/CHANGELOG.txt 1.3
/modules/contextlinks/contextlinks.module 1.6
Date: January 17, 2005 - 14:06
User: jhriggs
merge from 4.5
----------------------------------------------------------------------
File: /modules/contextlinks/CHANGELOG.txt 1.2.2.2 DRUPAL-4-5
/modules/contextlinks/contextlinks.module 1.5.2.1 DRUPAL-4-5
Date: January 17, 2005 - 14:02
User: jhriggs
2005-01-17
- release 4.5r3/2005011700 (updated versioning to be more
meaningful)
- contextlinks.module:
- moved script import from _init() to _menu()
- fixed installation/removal bug where menus would not properly
update (fixes bug 15699 reported by sammoscheck)
----------------------------------------------------------------------
File: /modules/remindme/INSTALL.txt 1.1
Date: January 17, 2005 - 13:45
User: killes
Install instrecutions.
----------------------------------------------------------------------
File: /modules/contextlinks/CHANGELOG NONE DRUPAL-4-5
/modules/contextlinks/CREDITS NONE DRUPAL-4-5
/modules/contextlinks/INSTALL NONE DRUPAL-4-5
/modules/contextlinks/LICENSE NONE DRUPAL-4-5
/modules/contextlinks/README NONE DRUPAL-4-5
/modules/contextlinks/TODO NONE DRUPAL-4-5
Date: January 17, 2005 - 13:37
User: jhriggs
removed pre .txt remnants on 4.5 branch
----------------------------------------------------------------------
File: /modules/im/im.module 1.4
/modules/im/README.txt 1.2
Date: January 17, 2005 - 13:08
User: killes
Bugfixes, http://drupal.org/node/15301 http://drupal.org/node/15631 http://drupal.org/node/15521
----------------------------------------------------------------------
File: /translations/hu/aggregator-module.po 1.1.2.6 DRUPAL-4-5
/translations/hu/archive-module.po 1.1.2.4 DRUPAL-4-5
/translations/hu/block-module.po 1.1.2.4 DRUPAL-4-5
/translations/hu/blog-module.po 1.1.2.4 DRUPAL-4-5
/translations/hu/blogapi-module.po 1.1.2.4 DRUPAL-4-5
/translations/hu/book-module.po 1.1.2.4 DRUPAL-4-5
/translations/hu/comment-module.po 1.2.2.6 DRUPAL-4-5
/translations/hu/common-inc.po 1.2.2.3 DRUPAL-4-5
/translations/hu/drupal-module.po 1.1.2.4 DRUPAL-4-5
/translations/hu/file-inc.po 1.1.2.4 DRUPAL-4-5
/translations/hu/filter-module.po 1.1.2.5 DRUPAL-4-5
/translations/hu/forum-module.po 1.1.2.5 DRUPAL-4-5
/translations/hu/general.po 1.1.2.4 DRUPAL-4-5
/translations/hu/locale-inc.po 1.1.2.4 DRUPAL-4-5
/translations/hu/locale-module.po 1.1.2.4 DRUPAL-4-5
/translations/hu/menu-module.po 1.1.2.3 DRUPAL-4-5
/translations/hu/node-module.po 1.2.2.7 DRUPAL-4-5
/translations/hu/path-module.po 1.1.2.6 DRUPAL-4-5
/translations/hu/ping-module.po 1.1.2.5 DRUPAL-4-5
/translations/hu/poll-module.po 1.2.2.6 DRUPAL-4-5
/translations/hu/profile-module.po 1.1.2.5 DRUPAL-4-5
/translations/hu/queue-module.po 1.1.2.4 DRUPAL-4-5
/translations/hu/search-module.po 1.1.2.4 DRUPAL-4-5
/translations/hu/statistics-module.po 1.1.2.6 DRUPAL-4-5
/translations/hu/story-module.po 1.1.2.5 DRUPAL-4-5
/translations/hu/system-module.po 1.2.2.4 DRUPAL-4-5
/translations/hu/taxonomy-module.po 1.1.2.7 DRUPAL-4-5
/translations/hu/throttle-module.po 1.1.2.6 DRUPAL-4-5
/translations/hu/upload-module.po 1.1.2.6 DRUPAL-4-5
/translations/hu/user-module.po 1.2.2.6 DRUPAL-4-5
/translations/hu/watchdog-module.po 1.1.2.3 DRUPAL-4-5
Date: January 17, 2005 - 12:51
User: goba
updated translations for 4.5.2
----------------------------------------------------------------------
File: /modules/paypal_framework/README.txt NONE
Date: January 17, 2005 - 11:02
User: crackerjm
----------------------------------------------------------------------
File: /modules/paypal_framework/paypal_framework.module NONE
/modules/paypal_framework/paypal_framework.mysql NONE
/modules/paypal_framework/TODO NONE
Date: January 17, 2005 - 11:02
User: crackerjm
----------------------------------------------------------------------
File: /sandbox/dikini/binder/binder.html 1.1
/sandbox/dikini/binder/binder.module 1.4
/sandbox/dikini/binder/fields.png 1.1
/sandbox/dikini/binder/g6685.png 1.1
/sandbox/dikini/binder/rigid-vs-fluid.html 1.1
/sandbox/dikini/binder/state_machines_and_metadata.html 1.3
Date: January 17, 2005 - 09:27
User: dikini
Updated the a few of the methods, add still not finished.
Added some documentation
----------------------------------------------------------------------
File: /sandbox/junyor/import/blogger.inc 1.3
/sandbox/junyor/import/CHANGELOG 1.2
/sandbox/junyor/import/import.php 1.2
/sandbox/junyor/import/README 1.2
/sandbox/junyor/import/TODO 1.2
Date: January 17, 2005 - 08:32
User: junyor
Some bug fixes. The Blogger importer is more or less usable now, though comment import is untested.
----------------------------------------------------------------------
File: /modules/mail_archive/CHANGELOG.txt 1.7
/modules/mail_archive/mail_archive.module 1.8
/modules/mail_archive/mail_archive.mysql 1.5
Date: January 17, 2005 - 02:20
User: jeremy
- mail_archive.module:
o don't increment messages_downloaded if download fails (or is duplicate)
o set maximum number of downloads at a time (ie, to prevent browser
timeout)
o added some additional mail_archive_api hooks to support upcoming addon
modules
o pass $message to all invoked hooks
- mail_archive.mysql:
o fix typo in mail_archive_subscriptions preventing table creation
o remove unused fields from mail_archive_subscriptions
----------------------------------------------------------------------
File: /includes/common.inc 1.418
/misc/drupal.css 1.90
/modules/node.module 1.441
/modules/taxonomy.module 1.165
/sites/default/settings.php 1.3
Date: January 17, 2005 - 00:41
User: unconed
- Various code style fixes
----------------------------------------------------------------------
File: /modules/project/mail.inc 1.37
Date: January 16, 2005 - 23:58
User: unconed
- E-mail issues: filter content before converting to plaintext
----------------------------------------------------------------------
File: /modules/news_page/news_page.module 1.2
Date: January 16, 2005 - 23:34
User: MegaGrunt
Closed div around feed-item.
----------------------------------------------------------------------
File: /update.php 1.145
Date: January 16, 2005 - 23:16
User: unconed
- Adding a cache wipe to update.php, as this will avoid stale cache bugs after an update
(e.g. admin.module was removed -> needs menu cache update).
----------------------------------------------------------------------
File: /modules/ecommerce/cart/cart.module 1.31
/modules/ecommerce/cart/CHANGELOG.txt 1.6
Date: January 16, 2005 - 22:41
User: mathias
* Bugfix: The cart/view page should never be cached.
----------------------------------------------------------------------
File: /modules/ecommerce/cart/cart.module 1.25.2.6 DRUPAL-4-5
/modules/ecommerce/cart/CHANGELOG.txt 1.1.2.5 DRUPAL-4-5
Date: January 16, 2005 - 22:41
User: mathias
* Bugfix: The cart/view page should never be cached.
----------------------------------------------------------------------
File: /modules/inline/po/it.po 1.1
Date: January 16, 2005 - 21:44
User: matteo
----------------------------------------------------------------------
File: /modules/inline/po/inline-module.po 1.1
Date: January 16, 2005 - 21:44
User: matteo
----------------------------------------------------------------------
File: /includes/module.inc 1.65
/modules/archive.module 1.74
/modules/blog.module 1.209
/modules/book.module 1.278
/modules/comment.module 1.325
/modules/forum.module 1.220
/modules/node.module 1.440
/modules/poll.module 1.153
/modules/queue.module 1.123
/modules/taxonomy.module 1.164
/modules/tracker.module 1.105
/sandbox/matteo/translations/Drupal/block-module.po 1.4
/sandbox/matteo/translations/Drupal/filter-module.mo 1.1
/sandbox/matteo/translations/Drupal/filter-module.po 1.4
/sandbox/matteo/translations/Drupal/throttle-module.mo 1.1
/sandbox/matteo/translations/Drupal/throttle-module.po 1.3
Date: January 16, 2005 - 18:44
User: dries
- Patch #14731 by chx: made it possible to rewrite node queries.
----------------------------------------------------------------------
File: /modules/weather/po/nl.po 1.1
/modules/weather/po/weather-module.po NONE
Date: January 16, 2005 - 15:28
User: heeckhau
i used the wrong name for the dutch translation
----------------------------------------------------------------------
File: /modules/weather/po/weather-module.po 1.1
Date: January 16, 2005 - 15:24
User: heeckhau
dutch translation
----------------------------------------------------------------------
1
0
Project: Drupal
Version: cvs
Component: base system
Category: bug reports
Priority: normal
Assigned to: Anonymous
Reported by: kbahey
Updated by: Steven
Status: patch
Definitely -1 on removing the <base> tag or using absolute or
root-relative URLs. This tag has been around for ages, and it is the
only way to make clean URLs work without bloating in the code. FYI,
"base" is (first?) mentioned in Berners-Lee's HTML 1.0 draft [1].
That's June 1993.
As the amount of clean URL-using sites grows, the crawlers will have to
be updated. Perhaps we could prevent crawlers from going too insane by
404ing for URLs with more than say 10 components? That would prevent
the really crappy ones from hammering your site.
I'm all for making the <base> tag easier to handle for the user (say,
by including a filter to allow simple anchor links to work as most
people expect them to), but we should keep Drupal-generated URLs clean
and completely relative.
[1] http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt
Steven
Previous comments:
------------------------------------------------------------------------
November 19, 2004 - 04:29 : kbahey
Looking at my site's logs, there seem to be several problems that are
caused by Drupal's use of relative path names.
If Drupal causes all the site's urls to be absolute, then none of this
would be an issue.
A. Search Engine Crawlers
Getting lots of 404s on things like: linux/index.html/robots.txt
Where 'linux' is an alias to a taxonomy, and 'index.html' is an alias
to a node within that taxonomy.
Another example, is recursing unnecessarily. I see 404s on things like:
/linux/index.html/linux/index.html
Where 'linux' is a path alias for a taxonomy term, and 'index.html' is
an alias to the main node within it.
This does not seem to happen when Google crawls my sites, but Yahoo's
Slurp suffers from this problem, and keeps recursing. MSNBot also
suffers from this.
Another crawler/harvester called Blinkx/DFS-Fetch keeps adding the .css
file to the relative path, getting a 404 on things like:
/linux/themes/xtemplate/pushbutton/logo.gif
And Fast Search Engine also attempts to access:
/linux/contact/tracker/tracker/user/password
The same goes for grub.org, another crawler.
B. Google Cache / Archive Way Back Machine
Pages in Google cache and archive.org Way Back Machine suffer form a
similar problem: the .css files cannot be found, and hence rendering of
the pages is not correct.
Examples:
Compare this: http://www.drupal.org/node/4647
To this:
http://www.google.ca/search?q=cache:www.drupal.org/node/view/4647
Notice the following:
How there is no formatting at all, because of the lack of a .css file
The httpd log on Drupal will show errors for:
linux/themes/pushbutton/style.css and linux/misc/drupal.css
Also see:
http://web.archive.org/web/20031016184902/http://www.drupal.org/
C. Proxy Caches:
When someone is browsing my site from behind a proxy cache, the web
site is hit with a rapid succession of requests, and many of it is just
for bogus pages.
Examples:
2004/11/17 - 17:47 404 error: linux/user/1 not found.
2004/11/17 - 17:47 404 error: linux/feedback not found.
2004/11/17 - 17:47 404 error: linux/tracker not found.
2004/11/17 - 17:47 404 error: linux/sitemap not found.
2004/11/17 - 17:47 404 error: linux/search not found.
2004/11/17 - 17:47 404 error: linux/misc not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/linux not found.
2004/11/17 - 17:47 404 error: linux/technology not found.
2004/11/17 - 17:47 404 error: linux/writings not found.
2004/11/17 - 17:47 404 error: linux/family not found.
And also:
2004/11/17 - 07:23 404 error: history/user/1 not found.
2004/11/17 - 07:23 404 error: history/tracker not found.
2004/11/17 - 07:23 404 error: history/feedback not found.
2004/11/17 - 07:23 404 error: history/sitemap not found.
2004/11/17 - 07:23 404 error: history/search not found.
2004/11/17 - 07:23 404 error: history/misc not found.
2004/11/17 - 07:23 404 error: history/technology not found.
2004/11/17 - 07:23 404 error: history/science not found.
2004/11/17 - 07:22 404 error: history/history not found.
2004/11/17 - 07:22 404 error: history/writings not found.
2004/11/17 - 07:22 404 error: history/family not found.
As you can tell, history and linux are aliases to taxonomy terms, and
so is misc, technology, writings, family, ...etc. The user agent is
appending the taxonomy term alias to the url and forming a new URL.
D. Regular Browsing:
There is even at least one extreme case where the following URL was
accessed (the result was 404 of course)
/book/view/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/logo.gif
It seems it was a normal user, because the user agent is: "Mozilla/4.0
(compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
Proposed Solution:
As a proposed solution, all URLs in Drupal can be made into absolute
path names. This can be done by the following:
The variable $base_url in the conf.php file is broken down into two
components:
$base_host (the 'http://whatever-host.example.com' part WITHOUT the
trailing slash)
$base_path (the '/path-to-drupal' part, WITH the leading slash. If this
is the DocumentRoot, then it is just a '/' character)
$base_url is now $base_host concatenated with $base_path
A simple filter can be written to preceed every href="path" with the
$base_path variable, so it becomes "/path"
This option can be turned on and off for a site. The default is to have
it off so current behavior is maintained.
A similar scheme applies for style sheets as well.
So, did I miss something obvious? Am I seriously off the mark?
Your thoughts!
------------------------------------------------------------------------
November 20, 2004 - 05:10 : chrisada
I am getting similar 404 errors, mainly from rss feed link that looks
like /blog/blog/feed and many manual links that are relative to drupal
root.
It was not a problem before Drupal 4.5, so I think there might not be a
need to change all URIs to absolute. I can't see where the problem is
coming from though.
------------------------------------------------------------------------
November 20, 2004 - 05:35 : kbahey
I am pretty sure that these problems were happening for at least the
past 10 months (ever since I moved to Drupal in January 2004).
The main issue here is that crawlers and other user agents get confused
by the relative path names.
Using absolute paths will definitely solve this. However, is this the
only solution?
I am looking for a discussion of this.
------------------------------------------------------------------------
November 20, 2004 - 10:55 : Goba
No absolute paths please. Having the path start with '/' solves all the
mentioned problems, and is not absolute, it is relative to the domain.
Sadly some crawlers and even the Google Cache does not obey to the base
href. I have reported this cache problem in April to Google, and they
promised they will keep it in mind... Hehe...
What we need is to have the printed relative path values relative to
the domain name, and not relative to the Drupal installation path.
Note that this issue will appear on the drupal devel mailing list if
someone finally provides a patch we can talk about :)
------------------------------------------------------------------------
November 20, 2004 - 11:19 : Dries
Goba is right. We need paths relative to the domain name to fix this
'problem'.
------------------------------------------------------------------------
November 20, 2004 - 21:19 : kbahey
Sorry for not making my self clear.
When I said absolute, I meant that they start with just a /. I did NOT
mean that they start with http://host.example.com. That would be a very
bad idea.
In any case, what do people think about the proposed solution (breaking
down $base_url into two parts?)
Also, does this address the style sheets as well, or more is needed?
------------------------------------------------------------------------
November 21, 2004 - 17:24 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch.txt (471 bytes)
I have implemented what Goba suggested.
------------------------------------------------------------------------
November 21, 2004 - 17:43 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_0.txt (825 bytes)
Maybe this one is faster?
------------------------------------------------------------------------
November 21, 2004 - 18:34 : kbahey
Man! You are fast!
I tried the second version. It works fine for things that are not
inside the node body, I mean they have a / in front of them, as we
want it to be.
Two comments/issues:
- If there is a URL that is already "/" representing the home page, it
gets set to "//". Perhaps it should check for that case?
- URLs in nodes that do not start with / do not get changed to have a /
prepended to them. Do we need a filter for this?
- Do we need to do something for the style sheets in the page header? I
mean the "misc/drupal.css" and "themes/themename/style.css"?
Thanks
------------------------------------------------------------------------
November 22, 2004 - 01:28 : kbahey
Hi chx
Here is a fix for the case where you have a url that is just "/".
In your patch, instead of:
[?php $base = $parts['path'] . '/' ; ?]
Replace that by:
[?php $base = ( $path == '/' ? $base : $parts['path'] . '/' ); ?]
------------------------------------------------------------------------
November 28, 2004 - 05:08 : kbahey
Did this patch make it into CVS yet?
If there are any objections to it, can someone please explain what they
are?
Thanks
------------------------------------------------------------------------
November 28, 2004 - 10:33 : Dries
Shouldn't your changes be included in the patch?
Also, it's better to cache $base rather than $parts.
Lastly, it this patch makes it to HEAD, we should probably remove some
'base url' cruft from the themes.
------------------------------------------------------------------------
November 28, 2004 - 19:54 : kbahey
Attachment: http://drupal.org/files/issues/x.diff (1 KB)
Here is the patch including my fix.
I am asking chx to comment on caching $base instead of $parts.
Will this make it faster?
------------------------------------------------------------------------
November 28, 2004 - 20:26 : chx
Hm. $base = ( $path == '/' ? $base : $parts['path'] . '/' ); this
depends on path which is a parameter. Thus I fail to see how could we
cache $base. I'd correct this code however $base = ( $path == '/' ? ''
: $parts['path'] . '/' ); 'cos I think $base is not defined before, but
this is not a problem, PHP will be happy to replace NULL with NULL...
Maybe instead of all parts, only $parts['path'] is enough to be cached,
yes, but the performance and memory usage difference -- I guess -- would
not be noticable...
------------------------------------------------------------------------
November 29, 2004 - 00:06 : kbahey
Attachment: http://drupal.org/files/issues/common-inc-patch.txt (1 KB)
OK.
I put in chx suggested change.
This patch can go in CVS then, to rid us of the problems with paths not
beginning with slash.
This is not an ultimate solution still. We need to address the problem
with .css files. Although the header contains a:
it does not seem that major search engines and archiving sites obey it
anyway.
------------------------------------------------------------------------
December 2, 2004 - 21:31 : Dries
Your coding style needs work. Also, I'm not going to commit this unless
the themes get fixed up: we'd end up with invalid URLs all over the
place. Lastly, I wonder how portable the themes will be when Drupal is
run from within a subdirectory.
------------------------------------------------------------------------
December 2, 2004 - 22:15 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_1.txt (849 bytes)
Well, my patch worked from a subdirectory very well, as fact, I have not
tested it from the root dir. And I think that it adheres to coding
standards. So I resubmit it with the root path fix. However, my Drupal
work is focused on i18n these days, and I was never into themeing so it
won't be me who fixes those.
------------------------------------------------------------------------
December 2, 2004 - 23:07 : kbahey
I have tested the previous patch (including my fix) with drupal
installed in the DocumentRoot of the server.
So, in effect, it is tested with both Drupal in / and Drupal in a
subdirectory.
This change fixes the problem for the crawlers and other browsers from
getting confused.
While it is true that there is no fix for the .css files in the HTML
head section yet, this fix deals with a major part of the problem, and
rids us of a major pain. Check your web server's logs some time to see
what I mean.
Someone who is familiar with the themes can contribute a patch later.
This patch and the future fix for themes are not mutually exclusive, so
let it go in CVS.
------------------------------------------------------------------------
December 9, 2004 - 16:31 : Goba
Please commit this into Drupal core, this fix is badly needed.
------------------------------------------------------------------------
January 17, 2005 - 14:00 : chx
Attachment: http://drupal.org/files/issues/base_url_kill.patch (4.34 KB)
Well as noone have stepped in to fix this problem, I have tried to fix
the themes also. themes.inc , xtemplate.engine and the bluemarine
template is patched besides common.inc.
Of course, more templates could follow, but first I'd like to see your
opinions.
------------------------------------------------------------------------
January 17, 2005 - 14:09 : Goba
I don't think that removing from the themes is a good idea, using
$parts['path'] should be encouraged though before the files, which
would fix the google cache problem, and would still keep the HTML size
low. It would also help those, who save the file to find the
originating site easier, since clicking on a non-pagelocal link would
lead to the online version.
--
View: http://drupal.org/node/13148
Edit: http://drupal.org/project/comments/add/13148
1
0
Project: Drupal
Version: cvs
Component: base system
Category: bug reports
Priority: normal
Assigned to: Anonymous
Reported by: kbahey
Updated by: Goba
Status: patch
I don't think that removing from the themes is a good idea, using
$parts['path'] should be encouraged though before the files, which
would fix the google cache problem, and would still keep the HTML size
low. It would also help those, who save the file to find the
originating site easier, since clicking on a non-pagelocal link would
lead to the online version.
Goba
Previous comments:
------------------------------------------------------------------------
November 19, 2004 - 04:29 : kbahey
Looking at my site's logs, there seem to be several problems that are
caused by Drupal's use of relative path names.
If Drupal causes all the site's urls to be absolute, then none of this
would be an issue.
A. Search Engine Crawlers
Getting lots of 404s on things like: linux/index.html/robots.txt
Where 'linux' is an alias to a taxonomy, and 'index.html' is an alias
to a node within that taxonomy.
Another example, is recursing unnecessarily. I see 404s on things like:
/linux/index.html/linux/index.html
Where 'linux' is a path alias for a taxonomy term, and 'index.html' is
an alias to the main node within it.
This does not seem to happen when Google crawls my sites, but Yahoo's
Slurp suffers from this problem, and keeps recursing. MSNBot also
suffers from this.
Another crawler/harvester called Blinkx/DFS-Fetch keeps adding the .css
file to the relative path, getting a 404 on things like:
/linux/themes/xtemplate/pushbutton/logo.gif
And Fast Search Engine also attempts to access:
/linux/contact/tracker/tracker/user/password
The same goes for grub.org, another crawler.
B. Google Cache / Archive Way Back Machine
Pages in Google cache and archive.org Way Back Machine suffer form a
similar problem: the .css files cannot be found, and hence rendering of
the pages is not correct.
Examples:
Compare this: http://www.drupal.org/node/4647
To this:
http://www.google.ca/search?q=cache:www.drupal.org/node/view/4647
Notice the following:
How there is no formatting at all, because of the lack of a .css file
The httpd log on Drupal will show errors for:
linux/themes/pushbutton/style.css and linux/misc/drupal.css
Also see:
http://web.archive.org/web/20031016184902/http://www.drupal.org/
C. Proxy Caches:
When someone is browsing my site from behind a proxy cache, the web
site is hit with a rapid succession of requests, and many of it is just
for bogus pages.
Examples:
2004/11/17 - 17:47 404 error: linux/user/1 not found.
2004/11/17 - 17:47 404 error: linux/feedback not found.
2004/11/17 - 17:47 404 error: linux/tracker not found.
2004/11/17 - 17:47 404 error: linux/sitemap not found.
2004/11/17 - 17:47 404 error: linux/search not found.
2004/11/17 - 17:47 404 error: linux/misc not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/linux not found.
2004/11/17 - 17:47 404 error: linux/technology not found.
2004/11/17 - 17:47 404 error: linux/writings not found.
2004/11/17 - 17:47 404 error: linux/family not found.
And also:
2004/11/17 - 07:23 404 error: history/user/1 not found.
2004/11/17 - 07:23 404 error: history/tracker not found.
2004/11/17 - 07:23 404 error: history/feedback not found.
2004/11/17 - 07:23 404 error: history/sitemap not found.
2004/11/17 - 07:23 404 error: history/search not found.
2004/11/17 - 07:23 404 error: history/misc not found.
2004/11/17 - 07:23 404 error: history/technology not found.
2004/11/17 - 07:23 404 error: history/science not found.
2004/11/17 - 07:22 404 error: history/history not found.
2004/11/17 - 07:22 404 error: history/writings not found.
2004/11/17 - 07:22 404 error: history/family not found.
As you can tell, history and linux are aliases to taxonomy terms, and
so is misc, technology, writings, family, ...etc. The user agent is
appending the taxonomy term alias to the url and forming a new URL.
D. Regular Browsing:
There is even at least one extreme case where the following URL was
accessed (the result was 404 of course)
/book/view/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/logo.gif
It seems it was a normal user, because the user agent is: "Mozilla/4.0
(compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
Proposed Solution:
As a proposed solution, all URLs in Drupal can be made into absolute
path names. This can be done by the following:
The variable $base_url in the conf.php file is broken down into two
components:
$base_host (the 'http://whatever-host.example.com' part WITHOUT the
trailing slash)
$base_path (the '/path-to-drupal' part, WITH the leading slash. If this
is the DocumentRoot, then it is just a '/' character)
$base_url is now $base_host concatenated with $base_path
A simple filter can be written to preceed every href="path" with the
$base_path variable, so it becomes "/path"
This option can be turned on and off for a site. The default is to have
it off so current behavior is maintained.
A similar scheme applies for style sheets as well.
So, did I miss something obvious? Am I seriously off the mark?
Your thoughts!
------------------------------------------------------------------------
November 20, 2004 - 05:10 : chrisada
I am getting similar 404 errors, mainly from rss feed link that looks
like /blog/blog/feed and many manual links that are relative to drupal
root.
It was not a problem before Drupal 4.5, so I think there might not be a
need to change all URIs to absolute. I can't see where the problem is
coming from though.
------------------------------------------------------------------------
November 20, 2004 - 05:35 : kbahey
I am pretty sure that these problems were happening for at least the
past 10 months (ever since I moved to Drupal in January 2004).
The main issue here is that crawlers and other user agents get confused
by the relative path names.
Using absolute paths will definitely solve this. However, is this the
only solution?
I am looking for a discussion of this.
------------------------------------------------------------------------
November 20, 2004 - 10:55 : Goba
No absolute paths please. Having the path start with '/' solves all the
mentioned problems, and is not absolute, it is relative to the domain.
Sadly some crawlers and even the Google Cache does not obey to the base
href. I have reported this cache problem in April to Google, and they
promised they will keep it in mind... Hehe...
What we need is to have the printed relative path values relative to
the domain name, and not relative to the Drupal installation path.
Note that this issue will appear on the drupal devel mailing list if
someone finally provides a patch we can talk about :)
------------------------------------------------------------------------
November 20, 2004 - 11:19 : Dries
Goba is right. We need paths relative to the domain name to fix this
'problem'.
------------------------------------------------------------------------
November 20, 2004 - 21:19 : kbahey
Sorry for not making my self clear.
When I said absolute, I meant that they start with just a /. I did NOT
mean that they start with http://host.example.com. That would be a very
bad idea.
In any case, what do people think about the proposed solution (breaking
down $base_url into two parts?)
Also, does this address the style sheets as well, or more is needed?
------------------------------------------------------------------------
November 21, 2004 - 17:24 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch.txt (471 bytes)
I have implemented what Goba suggested.
------------------------------------------------------------------------
November 21, 2004 - 17:43 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_0.txt (825 bytes)
Maybe this one is faster?
------------------------------------------------------------------------
November 21, 2004 - 18:34 : kbahey
Man! You are fast!
I tried the second version. It works fine for things that are not
inside the node body, I mean they have a / in front of them, as we
want it to be.
Two comments/issues:
- If there is a URL that is already "/" representing the home page, it
gets set to "//". Perhaps it should check for that case?
- URLs in nodes that do not start with / do not get changed to have a /
prepended to them. Do we need a filter for this?
- Do we need to do something for the style sheets in the page header? I
mean the "misc/drupal.css" and "themes/themename/style.css"?
Thanks
------------------------------------------------------------------------
November 22, 2004 - 01:28 : kbahey
Hi chx
Here is a fix for the case where you have a url that is just "/".
In your patch, instead of:
[?php $base = $parts['path'] . '/' ; ?]
Replace that by:
[?php $base = ( $path == '/' ? $base : $parts['path'] . '/' ); ?]
------------------------------------------------------------------------
November 28, 2004 - 05:08 : kbahey
Did this patch make it into CVS yet?
If there are any objections to it, can someone please explain what they
are?
Thanks
------------------------------------------------------------------------
November 28, 2004 - 10:33 : Dries
Shouldn't your changes be included in the patch?
Also, it's better to cache $base rather than $parts.
Lastly, it this patch makes it to HEAD, we should probably remove some
'base url' cruft from the themes.
------------------------------------------------------------------------
November 28, 2004 - 19:54 : kbahey
Attachment: http://drupal.org/files/issues/x.diff (1 KB)
Here is the patch including my fix.
I am asking chx to comment on caching $base instead of $parts.
Will this make it faster?
------------------------------------------------------------------------
November 28, 2004 - 20:26 : chx
Hm. $base = ( $path == '/' ? $base : $parts['path'] . '/' ); this
depends on path which is a parameter. Thus I fail to see how could we
cache $base. I'd correct this code however $base = ( $path == '/' ? ''
: $parts['path'] . '/' ); 'cos I think $base is not defined before, but
this is not a problem, PHP will be happy to replace NULL with NULL...
Maybe instead of all parts, only $parts['path'] is enough to be cached,
yes, but the performance and memory usage difference -- I guess -- would
not be noticable...
------------------------------------------------------------------------
November 29, 2004 - 00:06 : kbahey
Attachment: http://drupal.org/files/issues/common-inc-patch.txt (1 KB)
OK.
I put in chx suggested change.
This patch can go in CVS then, to rid us of the problems with paths not
beginning with slash.
This is not an ultimate solution still. We need to address the problem
with .css files. Although the header contains a:
it does not seem that major search engines and archiving sites obey it
anyway.
------------------------------------------------------------------------
December 2, 2004 - 21:31 : Dries
Your coding style needs work. Also, I'm not going to commit this unless
the themes get fixed up: we'd end up with invalid URLs all over the
place. Lastly, I wonder how portable the themes will be when Drupal is
run from within a subdirectory.
------------------------------------------------------------------------
December 2, 2004 - 22:15 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_1.txt (849 bytes)
Well, my patch worked from a subdirectory very well, as fact, I have not
tested it from the root dir. And I think that it adheres to coding
standards. So I resubmit it with the root path fix. However, my Drupal
work is focused on i18n these days, and I was never into themeing so it
won't be me who fixes those.
------------------------------------------------------------------------
December 2, 2004 - 23:07 : kbahey
I have tested the previous patch (including my fix) with drupal
installed in the DocumentRoot of the server.
So, in effect, it is tested with both Drupal in / and Drupal in a
subdirectory.
This change fixes the problem for the crawlers and other browsers from
getting confused.
While it is true that there is no fix for the .css files in the HTML
head section yet, this fix deals with a major part of the problem, and
rids us of a major pain. Check your web server's logs some time to see
what I mean.
Someone who is familiar with the themes can contribute a patch later.
This patch and the future fix for themes are not mutually exclusive, so
let it go in CVS.
------------------------------------------------------------------------
December 9, 2004 - 16:31 : Goba
Please commit this into Drupal core, this fix is badly needed.
------------------------------------------------------------------------
January 17, 2005 - 14:00 : chx
Attachment: http://drupal.org/files/issues/base_url_kill.patch (4.34 KB)
Well as noone have stepped in to fix this problem, I have tried to fix
the themes also. themes.inc , xtemplate.engine and the bluemarine
template is patched besides common.inc.
Of course, more templates could follow, but first I'd like to see your
opinions.
--
View: http://drupal.org/node/13148
Edit: http://drupal.org/project/comments/add/13148
1
0
Project: Drupal
Version: cvs
Component: base system
Category: bug reports
Priority: normal
Assigned to: Anonymous
Reported by: kbahey
Updated by: chx
Status: patch
Attachment: http://drupal.org/files/issues/base_url_kill.patch (4.34 KB)
Well as noone have stepped in to fix this problem, I have tried to fix
the themes also. themes.inc , xtemplate.engine and the bluemarine
template is patched besides common.inc.
Of course, more templates could follow, but first I'd like to see your
opinions.
chx
Previous comments:
------------------------------------------------------------------------
November 19, 2004 - 04:29 : kbahey
Looking at my site's logs, there seem to be several problems that are
caused by Drupal's use of relative path names.
If Drupal causes all the site's urls to be absolute, then none of this
would be an issue.
A. Search Engine Crawlers
Getting lots of 404s on things like: linux/index.html/robots.txt
Where 'linux' is an alias to a taxonomy, and 'index.html' is an alias
to a node within that taxonomy.
Another example, is recursing unnecessarily. I see 404s on things like:
/linux/index.html/linux/index.html
Where 'linux' is a path alias for a taxonomy term, and 'index.html' is
an alias to the main node within it.
This does not seem to happen when Google crawls my sites, but Yahoo's
Slurp suffers from this problem, and keeps recursing. MSNBot also
suffers from this.
Another crawler/harvester called Blinkx/DFS-Fetch keeps adding the .css
file to the relative path, getting a 404 on things like:
/linux/themes/xtemplate/pushbutton/logo.gif
And Fast Search Engine also attempts to access:
/linux/contact/tracker/tracker/user/password
The same goes for grub.org, another crawler.
B. Google Cache / Archive Way Back Machine
Pages in Google cache and archive.org Way Back Machine suffer form a
similar problem: the .css files cannot be found, and hence rendering of
the pages is not correct.
Examples:
Compare this: http://www.drupal.org/node/4647
To this:
http://www.google.ca/search?q=cache:www.drupal.org/node/view/4647
Notice the following:
How there is no formatting at all, because of the lack of a .css file
The httpd log on Drupal will show errors for:
linux/themes/pushbutton/style.css and linux/misc/drupal.css
Also see:
http://web.archive.org/web/20031016184902/http://www.drupal.org/
C. Proxy Caches:
When someone is browsing my site from behind a proxy cache, the web
site is hit with a rapid succession of requests, and many of it is just
for bogus pages.
Examples:
2004/11/17 - 17:47 404 error: linux/user/1 not found.
2004/11/17 - 17:47 404 error: linux/feedback not found.
2004/11/17 - 17:47 404 error: linux/tracker not found.
2004/11/17 - 17:47 404 error: linux/sitemap not found.
2004/11/17 - 17:47 404 error: linux/search not found.
2004/11/17 - 17:47 404 error: linux/misc not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/linux not found.
2004/11/17 - 17:47 404 error: linux/technology not found.
2004/11/17 - 17:47 404 error: linux/writings not found.
2004/11/17 - 17:47 404 error: linux/family not found.
And also:
2004/11/17 - 07:23 404 error: history/user/1 not found.
2004/11/17 - 07:23 404 error: history/tracker not found.
2004/11/17 - 07:23 404 error: history/feedback not found.
2004/11/17 - 07:23 404 error: history/sitemap not found.
2004/11/17 - 07:23 404 error: history/search not found.
2004/11/17 - 07:23 404 error: history/misc not found.
2004/11/17 - 07:23 404 error: history/technology not found.
2004/11/17 - 07:23 404 error: history/science not found.
2004/11/17 - 07:22 404 error: history/history not found.
2004/11/17 - 07:22 404 error: history/writings not found.
2004/11/17 - 07:22 404 error: history/family not found.
As you can tell, history and linux are aliases to taxonomy terms, and
so is misc, technology, writings, family, ...etc. The user agent is
appending the taxonomy term alias to the url and forming a new URL.
D. Regular Browsing:
There is even at least one extreme case where the following URL was
accessed (the result was 404 of course)
/book/view/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/logo.gif
It seems it was a normal user, because the user agent is: "Mozilla/4.0
(compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
Proposed Solution:
As a proposed solution, all URLs in Drupal can be made into absolute
path names. This can be done by the following:
The variable $base_url in the conf.php file is broken down into two
components:
$base_host (the 'http://whatever-host.example.com' part WITHOUT the
trailing slash)
$base_path (the '/path-to-drupal' part, WITH the leading slash. If this
is the DocumentRoot, then it is just a '/' character)
$base_url is now $base_host concatenated with $base_path
A simple filter can be written to preceed every href="path" with the
$base_path variable, so it becomes "/path"
This option can be turned on and off for a site. The default is to have
it off so current behavior is maintained.
A similar scheme applies for style sheets as well.
So, did I miss something obvious? Am I seriously off the mark?
Your thoughts!
------------------------------------------------------------------------
November 20, 2004 - 05:10 : chrisada
I am getting similar 404 errors, mainly from rss feed link that looks
like /blog/blog/feed and many manual links that are relative to drupal
root.
It was not a problem before Drupal 4.5, so I think there might not be a
need to change all URIs to absolute. I can't see where the problem is
coming from though.
------------------------------------------------------------------------
November 20, 2004 - 05:35 : kbahey
I am pretty sure that these problems were happening for at least the
past 10 months (ever since I moved to Drupal in January 2004).
The main issue here is that crawlers and other user agents get confused
by the relative path names.
Using absolute paths will definitely solve this. However, is this the
only solution?
I am looking for a discussion of this.
------------------------------------------------------------------------
November 20, 2004 - 10:55 : Goba
No absolute paths please. Having the path start with '/' solves all the
mentioned problems, and is not absolute, it is relative to the domain.
Sadly some crawlers and even the Google Cache does not obey to the base
href. I have reported this cache problem in April to Google, and they
promised they will keep it in mind... Hehe...
What we need is to have the printed relative path values relative to
the domain name, and not relative to the Drupal installation path.
Note that this issue will appear on the drupal devel mailing list if
someone finally provides a patch we can talk about :)
------------------------------------------------------------------------
November 20, 2004 - 11:19 : Dries
Goba is right. We need paths relative to the domain name to fix this
'problem'.
------------------------------------------------------------------------
November 20, 2004 - 21:19 : kbahey
Sorry for not making my self clear.
When I said absolute, I meant that they start with just a /. I did NOT
mean that they start with http://host.example.com. That would be a very
bad idea.
In any case, what do people think about the proposed solution (breaking
down $base_url into two parts?)
Also, does this address the style sheets as well, or more is needed?
------------------------------------------------------------------------
November 21, 2004 - 17:24 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch.txt (471 bytes)
I have implemented what Goba suggested.
------------------------------------------------------------------------
November 21, 2004 - 17:43 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_0.txt (825 bytes)
Maybe this one is faster?
------------------------------------------------------------------------
November 21, 2004 - 18:34 : kbahey
Man! You are fast!
I tried the second version. It works fine for things that are not
inside the node body, I mean they have a / in front of them, as we
want it to be.
Two comments/issues:
- If there is a URL that is already "/" representing the home page, it
gets set to "//". Perhaps it should check for that case?
- URLs in nodes that do not start with / do not get changed to have a /
prepended to them. Do we need a filter for this?
- Do we need to do something for the style sheets in the page header? I
mean the "misc/drupal.css" and "themes/themename/style.css"?
Thanks
------------------------------------------------------------------------
November 22, 2004 - 01:28 : kbahey
Hi chx
Here is a fix for the case where you have a url that is just "/".
In your patch, instead of:
[?php $base = $parts['path'] . '/' ; ?]
Replace that by:
[?php $base = ( $path == '/' ? $base : $parts['path'] . '/' ); ?]
------------------------------------------------------------------------
November 28, 2004 - 05:08 : kbahey
Did this patch make it into CVS yet?
If there are any objections to it, can someone please explain what they
are?
Thanks
------------------------------------------------------------------------
November 28, 2004 - 10:33 : Dries
Shouldn't your changes be included in the patch?
Also, it's better to cache $base rather than $parts.
Lastly, it this patch makes it to HEAD, we should probably remove some
'base url' cruft from the themes.
------------------------------------------------------------------------
November 28, 2004 - 19:54 : kbahey
Attachment: http://drupal.org/files/issues/x.diff (1 KB)
Here is the patch including my fix.
I am asking chx to comment on caching $base instead of $parts.
Will this make it faster?
------------------------------------------------------------------------
November 28, 2004 - 20:26 : chx
Hm. $base = ( $path == '/' ? $base : $parts['path'] . '/' ); this
depends on path which is a parameter. Thus I fail to see how could we
cache $base. I'd correct this code however $base = ( $path == '/' ? ''
: $parts['path'] . '/' ); 'cos I think $base is not defined before, but
this is not a problem, PHP will be happy to replace NULL with NULL...
Maybe instead of all parts, only $parts['path'] is enough to be cached,
yes, but the performance and memory usage difference -- I guess -- would
not be noticable...
------------------------------------------------------------------------
November 29, 2004 - 00:06 : kbahey
Attachment: http://drupal.org/files/issues/common-inc-patch.txt (1 KB)
OK.
I put in chx suggested change.
This patch can go in CVS then, to rid us of the problems with paths not
beginning with slash.
This is not an ultimate solution still. We need to address the problem
with .css files. Although the header contains a:
it does not seem that major search engines and archiving sites obey it
anyway.
------------------------------------------------------------------------
December 2, 2004 - 21:31 : Dries
Your coding style needs work. Also, I'm not going to commit this unless
the themes get fixed up: we'd end up with invalid URLs all over the
place. Lastly, I wonder how portable the themes will be when Drupal is
run from within a subdirectory.
------------------------------------------------------------------------
December 2, 2004 - 22:15 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_1.txt (849 bytes)
Well, my patch worked from a subdirectory very well, as fact, I have not
tested it from the root dir. And I think that it adheres to coding
standards. So I resubmit it with the root path fix. However, my Drupal
work is focused on i18n these days, and I was never into themeing so it
won't be me who fixes those.
------------------------------------------------------------------------
December 2, 2004 - 23:07 : kbahey
I have tested the previous patch (including my fix) with drupal
installed in the DocumentRoot of the server.
So, in effect, it is tested with both Drupal in / and Drupal in a
subdirectory.
This change fixes the problem for the crawlers and other browsers from
getting confused.
While it is true that there is no fix for the .css files in the HTML
head section yet, this fix deals with a major part of the problem, and
rids us of a major pain. Check your web server's logs some time to see
what I mean.
Someone who is familiar with the themes can contribute a patch later.
This patch and the future fix for themes are not mutually exclusive, so
let it go in CVS.
------------------------------------------------------------------------
December 9, 2004 - 16:31 : Goba
Please commit this into Drupal core, this fix is badly needed.
--
View: http://drupal.org/node/13148
Edit: http://drupal.org/project/comments/add/13148
1
0
Hi,
What is the reason for node.module is not a required one? What can Drupal
be used without nodes?
Regards
NK
--
"In God we trust; All others must provide an X.509 certificate".
2
3
The Drupal project has released version 4.5.2 of its open-source content
management platform. Drupal 4.5.2 is a maintenance release that provides
corrections of problems reported using the bug tracking system. Drupal
4.5.2 also fixes a cross-site scripting (XSS) vulnerability so it is
recommended that you upgrade your existing Drupal sites.
For more information about the Drupal 4.5.2 release, consult the
announcement at:
http://drupal.org/drupal-4.5.2
--
Dries Buytaert :: http://www.buytaert.net/
5
6
Project: Drupal
-Version: 4.5.1
+Version: 4.5.2
Component: forum.module
Category: bug reports
Priority: normal
Assigned to: Anonymous
Reported by: michaelemeyers
Updated by: Zed Pobre
-Status: active
+Status: patch
Attachment: http://drupal.org/files/issues/forum_3.patch (3.17 KB)
I'll chime in noting that it works for me as well. I've attached the
above solution as an actual patch file to 4.5.2.
Zed Pobre
Previous comments:
------------------------------------------------------------------------
December 14, 2004 - 16:32 : michaelemeyers
postgresql 7.4.5 / drupal 4.5.1
forum module version information:
// $Id: forum.module,v 1.217 2004/11/24 22:56:21 dries Exp $
in the forum module, the IF statement in the SQL queries is not
postgresql compliant:
IF(l.last_comment_uid, cu.name, l.last_comment_name) as
last_comment_name
needs to be a case statement (or an IF function in postgresql could be
written...):
CASE WHEN l.last_comment_uid = 1 THEN cu.name ELSE l.last_comment_name
END as last_comment_name
case statements are supported by both mySQL and postgresql...
--------------------------
IF results in errors:
--------------------------
warning: pg_query(): Query failed: ERROR: invalid input syntax for
type boolean: "13" in
/usr/local/apache/htdocs/includes/database.pgsql.inc on line 45.
user error:
query: SELECT DISTINCT(n.nid), l.last_comment_timestamp,
IF(l.last_comment_uid, cu.name, l.last_comment_name) as
last_comment_name, l.last_comment_uid FROM node n ,
node_comment_statistics l, users cu, term_node r WHERE n.nid = r.nid
AND r.tid = 354 AND n.status = 1 AND n.type = 'forum' AND
l.last_comment_uid = cu.uid AND n.nid = l.nid AND '1' ORDER BY
l.last_comment_timestamp DESC LIMIT 1 OFFSET 0 in
/usr/local/apache/htdocs/includes/database.pgsql.inc on line 62.
warning: pg_query(): Query failed: ERROR: invalid input syntax for
type boolean: "2" in
/usr/local/apache/htdocs/includes/database.pgsql.inc on line 45.
user error:
query: SELECT DISTINCT(n.nid), l.last_comment_timestamp,
IF(l.last_comment_uid, cu.name, l.last_comment_name) as
last_comment_name, l.last_comment_uid FROM node n ,
node_comment_statistics l, users cu, term_node r WHERE n.nid = r.nid
AND r.tid = 353 AND n.status = 1 AND n.type = 'forum' AND
l.last_comment_uid = cu.uid AND n.nid = l.nid AND '1' ORDER BY
l.last_comment_timestamp DESC LIMIT 1 OFFSET 0 in
/usr/local/apache/htdocs/includes/database.pgsql.inc on line 62.
--------------------------
suggested patch:
--------------------------
function forum_get_forums($tid = 0) { // ~at around line 380:
// remove:
-- $topic = db_fetch_object(db_query_range('SELECT DISTINCT(n.nid),
l.last_comment_timestamp, IF(l.last_comment_uid, cu.name,
l.last_comment_name) as last_comment_name, l.last_comment_uid FROM
{node} n ' . node_access_join_sql() . ", {node_comment_statistics} l
/*! USE INDEX (node_comment_timestamp) */, {users} cu, {term_node} r
WHERE n.nid = r.nid AND r.tid = %d AND n.status = 1 AND n.type =
'forum' AND l.last_comment_uid = cu.uid AND n.nid = l.nid AND " .
node_access_where_sql() . ' ORDER BY l.last_comment_timestamp DESC',
$forum->tid, 0, 1));
// add:
++ $topic = db_fetch_object(db_query_range('SELECT DISTINCT(n.nid),
l.last_comment_timestamp, CASE WHEN l.last_comment_uid = 1 THEN cu.name
ELSE l.last_comment_name END as last_comment_name, l.last_comment_uid
FROM {node} n ' . node_access_join_sql() . ", {node_comment_statistics}
l, {users} cu, {term_node} r WHERE n.nid = r.nid AND r.tid = %d AND
n.status = 1 AND n.type = 'forum' AND l.last_comment_uid = cu.uid AND
n.nid = l.nid AND " . node_access_where_sql() . ' ORDER BY
l.last_comment_timestamp DESC', $forum->tid, 0, 1));
function _forum_topics_read($term, $uid) { // ~at around line 420:
// remove:
-- $sql = "SELECT DISTINCT(n.nid), f.tid, n.title, n.sticky, u.name,
u.uid, n.created AS timestamp, n.comment AS comment_mode,
l.last_comment_timestamp, IF(l.last_comment_uid, cu.name,
l.last_comment_name) as last_comment_name, l.last_comment_uid,
l.comment_count AS num_comments FROM {node} n ". node_access_join_sql()
.", {node_comment_statistics} l, {users} cu, {term_node} r, {users} u,
{forum} f WHERE n.status = 1 AND l.last_comment_uid = cu.uid AND n.nid
= l.nid AND n.nid = r.nid AND r.tid = $check_tid AND n.uid = u.uid AND
n.nid = f.nid AND ". node_access_where_sql();
$sql .= tablesort_sql($forum_topic_list_header, 'n.sticky DESC,');
// add:
++ $sql = "SELECT DISTINCT(n.nid), f.tid, n.title, n.sticky, u.name,
u.uid, n.created AS timestamp, n.comment AS comment_mode,
l.last_comment_timestamp, CASE WHEN l.last_comment_uid = 1 THEN cu.name
ELSE l.last_comment_name END as last_comment_name, l.last_comment_uid,
l.comment_count AS num_comments FROM {node} n ". node_access_join_sql()
.", {node_comment_statistics} l, {users} cu, {term_node} r, {users} u,
{forum} f WHERE n.status = 1 AND l.last_comment_uid = cu.uid AND n.nid
= l.nid AND n.nid = r.nid AND r.tid = $check_tid AND n.uid = u.uid AND
n.nid = f.nid AND ". node_access_where_sql();
$sql .= tablesort_sql($forum_topic_list_header, 'n.sticky DESC,');
------------------------------------------------------------------------
January 10, 2005 - 16:34 : Anonymous
just wanted to chime in... this patch works perfectly.
thanks,
aaron
--
View: http://drupal.org/node/14352
Edit: http://drupal.org/project/comments/add/14352
1
0