Development
Threads by month
- ----- 2026 -----
- June
- May
- April
- March
- February
- January
- ----- 2025 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2024 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2023 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2022 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2021 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2020 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2019 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2018 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2017 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2016 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2015 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2014 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2013 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2012 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2011 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2010 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2009 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2008 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2007 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2006 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2005 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
February 2005
- 66 participants
- 347 discussions
On Wed, 02 Feb 2005 02:58:51 +0100, Steven Wittens <steven(a)acko.net> wrote:
> A bit late, but here's the rationale. People didn't look for profiles
> under users -> configure as it was buried to deep. It is hard to notice
> that something has popped up that deep when you've just enabled the
> module. Making it a higher level item is handier to use.
> Also, IMO setting up profiles does not have a direct relation to the
> user objects. It's the filling in/editing of a single user's profile
> itself that I would classify under "users" rather than setting up the
> fields. I think many admins will think the same.
Yes, I agree -- profile configuration was much too deep. I actually
find myself doing way too much "URL hacking" rather than using the
admin menus to navigate. A big problem of this is depth, and why (for
the admin menu only perhaps) we need to consider some sort of drop
down/fold out menu. Actually, I'd love to split the "Admin" tree into
a block separate from the Navigation block.
--
Boris Mann
http://www.bryght.com
1
0
Project: Drupal
Version: cvs
Component: node system
Category: tasks
Priority: normal
Assigned to: killes(a)www.drop.org
Reported by: killes(a)www.drop.org
Updated by: killes(a)www.drop.org
Status: patch
Attachment: http://drupal.org/files/issues/revisions_11.patch (48.95 KB)
The update issue still needs investigating. This patch is updated for
cvs.
killes(a)www.drop.org
Previous comments:
------------------------------------------------------------------------
May 5, 2004 - 17:25 : killes(a)www.drop.org
Currently all node revisions are stored in a serialized field in
node.table and retrieved for _each_ page view although they are rarely
needed. However, we have agreed that serializing data is bad and that
we should try to keep the memory foot print pf Drupal small.
Therefore I propose to create a separate revisions table which would be
in principle identical to the node table, only that it could have
several old copies of the same node. Extra data added by other modules
could be added in a serialized field unless we find a better solution.
------------------------------------------------------------------------
May 5, 2004 - 18:06 : jhriggs
I too think the serialized approach is less than desirable, but here's
an alternative. This would likely take some considerable rework in
core and contrib, but the following is how we handle similar types of
situations in our databases at work. It is more elegant that a
separate table, and avoids the (almost exact) duplication of a table.
Instead of separate tables, keep all revisions of nodes in the node
table as follows:
* add field: active (0/1 or Y/N)
* add field: revision
* every revision of a node is stored in the node table; however, only
one revision can be active at any given time
* nid can no longer be unique -- primary/unique key becomes (nid,
active)
* any time a node is loaded, updated (without revision), etc., the
active version is used.
Thoughts?
------------------------------------------------------------------------
May 5, 2004 - 18:57 : killes(a)www.drop.org
I am not opposed to your scheme, but I want to stress the following:
* Duplicating a table's structure is not bad (IMHO) as long as the
content is different.
* having two tables will allow us to have a rather small node table.
This is (maybe) a performance gain.
------------------------------------------------------------------------
May 5, 2004 - 19:37 : jhriggs
I don't necessarily think that duplicating a table's structure is _bad_.
It just seems to be wasteful and a pain to maintain. (Every change to
the node table is made twice...easy to do, but also easy to miss
perhaps.)
As for performance, as long as nid and the active indicator are
indexed, there shouldn't be any performance loss. Also, archiving an
old version when making a new revision will be much simpler: just
change the active indicator rather than copying an entire node to
another table (and ensuring everything gets copied...again a potential
maintainance issue).
To be honest, I would just like to see the serialized data go away,
regarless of what approach is taken.
------------------------------------------------------------------------
July 30, 2004 - 20:49 : Nick Nassar
Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_07-30-2004.p… (10.47 KB)
I'm interested in using Drupal for a large scale wiki-type project. In
order to do this, I need revisions to be in their own table.
Attached is a patch to do just that. Most of the changes are pretty
self explanatory. Spreading out node data across two tables meant that
I had to add database functions to do locking/transactions. Without
this, race conditions in which the database becomes corrupted are
possible.
------------------------------------------------------------------------
July 30, 2004 - 20:54 : Nick Nassar
Oh yeah... The patch is a diff against Drupal CVS
------------------------------------------------------------------------
July 31, 2004 - 01:00 : Anonymous
Gerhard speaking.
Nick, thanks a lot for your nice patch! It saves me a great deal of
labour. I looked through it and immediately liked it. You not only put
the old revisions into a new table but also the current one. Do you
have an estimate how much more expensive the additional join is?
Besides a few minor coding style issues I found a major one: Just a few
hours before you uploaded your patch JonBob's node access patch hit
core. That means your patch won't apply anymore as all the queries you
change have been changed. Can I bug you to update your patch?
------------------------------------------------------------------------
July 31, 2004 - 02:11 : Anonymous
Also I think that your upgrade path loses existing revisions.
------------------------------------------------------------------------
July 31, 2004 - 03:39 : drumm
I think this is the proper way to do things. No columns are duplicated,
there is no serialized data, and only the fields that are logically
revised are stored. Nothing jumped out at me as a way to have my node
modules be able to keep a table of revisions of additional fields. I'm
guessing this could be done within the confines of _insert and _update.
Assuming the upgrade path works and modules can extend it I give it a
+1.
------------------------------------------------------------------------
July 31, 2004 - 15:40 : Nick Nassar
It figures that just as I finish a big patch, another patch comes along
and breaks it. Oh well, it should be a pretty easy to fix. I'll work on
it.
Fixing the upgrade path to keep revisions should be fairly painless.
I found another issue that needs to be fixed before this patch gets
merged. There format of a node needs to be stored for each revision.
Otherwise, for modules that store a format for the nodes, such as page
and book, if you write one revision in PHP and the next in HTML, the
PHP revision will be displayed as HTML. This is part of a larger issue
of how node modules should store revisions of additional fields. I
think each module that wants to do this should create another table
with (nid, revid) as the primary key. Just as when they want to add
fields to a node they create another table with nid as the primary key.
As far as performance goes, for sites that make heavy use of revisions,
an extra join on primary keys is going to be a lot faster than grabbing
all of the revisions from that database everytime. We would need to run
benchmarks to determine is the overall difference in speed is for an
average site is a gain or a loss. I'm guessing it's very minor either
way.
------------------------------------------------------------------------
August 23, 2004 - 14:55 : Nick Nassar
Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_08-23-2004.p… (10.92 KB)
Here's an updated patch against CVS that puts revisions in their own
table, provides an upgrade path, and fixes the format related bugs in
the last patch.
Hopefully, this can make it into CVS as soon as the freeze is over.
------------------------------------------------------------------------
August 23, 2004 - 15:10 : moshe weitzman
Interesting patch ... drumm's question is still outstanding. how do
modules store revisions of their fields? Are they expected to manage
this on their own? Thats not how it works today.
As an aside, i am seeing profile_ fields in my node.revisions column.
One could argue that those need not be saved. They pertain to the node
author, not to the node itself.
------------------------------------------------------------------------
August 23, 2004 - 17:14 : Nick Nassar
Having modules be responsible for storing revisions of their own fields
is a side-effect of storing revision data in tables. There's really no
way around it. However, revisions generally don't make sense for node
types that don't have PHP/HTML content, such as polls. I think it's
going to be a pretty rare scenario for a new node type to want another
field to change per-revision, so it's a pretty good trade-off.
Storing fields that shouldn't be part of revisions, such as the
profile_ fields, is a side-effect of storing revisions as serialized
objects. Applying this patch will free up that wasted space. :)
------------------------------------------------------------------------
August 23, 2004 - 18:20 : Anonymous
There should be a hook that let's the module choose whether it supports
history. This way a module author can prevent the user from doing
something that may break his module or just cause undefined behavior.
If the module doesn't support history then don't let the user/admin
choose to add history to nodes of that type.
Craig
------------------------------------------------------------------------
August 23, 2004 - 20:23 : Nick Nassar
I agree, there should be an API change to make specifying support for
revisions easier. In the interests of keeping patches small and keeping
to one change per patch, I think the API change should be a separate
issue.
A sort of ad-hoc API to decide whether or not a module supports
revisions by default already exists. Instead of having a hook, modules
set the default value of the "Create new revision" field in the edit
form. The admin can change this option in
admin/node/configure/defaults. This patch doesn't change that.
Revisions are broken for node types that have their own database
structure, like polls, even when storing them as serialized objects.
This patch doesn't change that, either.
------------------------------------------------------------------------
October 26, 2004 - 03:35 : moshe weitzman
I'm guessing that someone is going to have to demonstrate that this
patch performs as well as current drupal before it gets comitted. i
think this patch is a few benchmarks from being comitted.
------------------------------------------------------------------------
October 27, 2004 - 02:04 : Nick Nassar
Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_10-26-2004.p… (11 KB)
I ran some really unscientific benchmarks, and it looks like this patch
has a negligible affect on performance.
I used apache bench and the database from theregular.org, which doesn't
contain any revisions (worst case scenario for this patch) and contains
several hundred nodes. Both the patched and unpatched versions hovered
between 2.36 and 2.38 requests per second.
The command I used was:
ab -n50 -C 'PHPSESSID=b01a9f92880ef215b0ed6f1314a5eba2'
http://192.168.0.100/
An updated patch that should apply to CVS is attached.
------------------------------------------------------------------------
October 27, 2004 - 02:05 : Nick Nassar
I ran some really unscientific benchmarks, and it looks like this patch
has a negligible affect on performance.
I used apache bench and the database from theregular.org, which doesn't
contain any revisions (worst case scenario for this patch) and contains
several hundred nodes. Both the patched and unpatched versions hovered
between 2.36 and 2.38 requests per second.
The command I used was:
ab -n50 -C 'PHPSESSID=b01a9f92880ef215b0ed6f1314a5eba2'
http://192.168.0.100/
An updated patch that should apply to CVS is attached.
------------------------------------------------------------------------
October 27, 2004 - 02:05 : Nick Nassar
Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_10-26-2004.p… (11 KB)
I ran some really unscientific benchmarks, and it looks like this patch
has a negligible affect on performance.
I used apache bench and the database from theregular.org, which doesn't
contain any revisions (worst case scenario for this patch) and contains
several hundred nodes. Both the patched and unpatched versions hovered
between 2.36 and 2.38 requests per second.
The command I used was:
ab -n50 -C 'PHPSESSID=b01a9f92880ef215b0ed6f1314a5eba2'
http://192.168.0.100/
An updated patch that should apply to CVS is attached.
------------------------------------------------------------------------
November 15, 2004 - 06:05 : elias1884
please overthink the revision system default workflow as well. don't
look at the revision system as an isolated system but as a part of the
whole workflow system!
if you combine revisions with the moderation queue the most logic
default workflow would be like that:
auth user creates node (revision #0)
admin approves the node (status = 1, moderation = 0)
=> node publicly available
auth user finds typo and changes node (revision #1, status = 0,
moderation = 1)
-------
what happens at that point at the moment is, that the node is not
accessible anymore at all until the new revision is approved by admin.
of course the new revision should not go online until reviewed and
approved, this is absolutely correct, but there is no reason to not
take the old revision offline, since it was already approved and should
therefore be online until the new revision is approved. it is not
practical if a node disappears only because the author corrected a
typo.
-------
admin approves the node (status = 1, moderation = 0)
eventhough I first thought a plain boolean active field would not be
capable of providing that functionality if finally came to the
conclusion, that it can. The only thing to do is to not set that bit,
when a new revision is created, but when it is approved (in case
moderation is activated under default workflow). Every revision should
have its own moderation, status and active field and on approval they
are set like this (status=1, moderation=0, active=Y).
When you wanna rollback to an old revision, you can chose between all
revisions that already have the moderation bit set back to 0 again and
the published to 1. There should be an extra permission for rollback!
another concern that I have about the default workflow is, that users
can't see the content, they have just created, when moderation is
enabled. Eventhough, there is a big fat "submission accepted" presented
after submissions, unexperienced users tend to question the information
those stupid tincans give them, if they can't find their content
afterwards. Many users are really lazy bastards and they don't even
read the status messages. The best feedback about whether his story was
submitted successfully or not of course is, if he can find the story
somewhere on the site, maybe with a status message on top of it,
mentioning, that the content is currently not publicly available since
it has not been approved yet. there should be a my content section
under my account, like somebody is trying to do with the workspace
module I guess.
so my suggestion is to make (status=0, moderation=1) still available
for the creator under a my content section somewhere!
------------------------------------------------------------------------
November 24, 2004 - 05:21 : Nick Nassar
I agree. The current workflow for moderation queues and revisions needs
to change, but this patch isn't the place for it. The patch is already
too big, and it only does the backend stuff.
Instead of adding more to this patch and making it take even longer to
get into core, would you mind creating a new issue for your UI
suggestions, so the those changes can be added as a separate patch?
Thanks,
Nick
------------------------------------------------------------------------
December 11, 2004 - 13:26 : Dries
This patch is _much_ needed so I'd love to see someone revive it. In
order for this patch to be accepted, the following needs to be done:
Update this patch to CVS HEAD.
Rename revid to vid.
Rename node_rev to node_revisions.
Rename node_rev.changed to node_revisions.timestamp.
Rename $rnode to $revision.
Fix the coding style to match Drupal's: proper spacing, single quotes
where possible, proper variable names.
Benchmark this patch with a large database with enough revisions. I'd
be happy to benchmark this on my local copy of the drupal.org database.
The book.log field should probably move to the node_revisions table.
This can be done in a separate patch.
Investigate whether transactions are well-supported.
------------------------------------------------------------------------
December 13, 2004 - 01:25 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_10-26-2004-r… (11.02 KB)
I've worked a bit on the patch (coding style issues as mentioned by
Dries). One thing I noticed is that the patch uses REPLACE. IIRC this
needs to be chagned to "UPDATE, if fail INSERT" for pgsql
compatibility.
Nick, are you still interested in working on that patch? I'd like to
know how it works on your site and work on getting it into core.
------------------------------------------------------------------------
December 13, 2004 - 13:33 : Dries
Gerhard: your patch does not apply.
------------------------------------------------------------------------
December 13, 2004 - 18:10 : killes(a)www.drop.org
Yes, I know, that was the same version as I mailed to you earlier.
------------------------------------------------------------------------
December 13, 2004 - 22:02 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions.patch (52.96 KB)
Ok, upüdated the patch to cvs.
------------------------------------------------------------------------
December 14, 2004 - 09:58 : Dries
Some more comments:
db_begin_transaction() and db_end_transaction() do not belong in
database.inc, but in database.mysql.inc and database.pgsql.inc
respectively.
The node module calls node_revisionsision_list() which is not defined.
(Fxed that on my local copy.)
Do db_begin_transaction() and db_end_transaction() deprecate Jeremy's
table locking patch?
The upgrade path assigns the wrong user ID to each revision.
The upgrade path assigns the wrong date to each revision (that or a
node's revision page shows the wrong usernames/dates).
The coding style needs a bit of work, but we can worry about that
later.
------------------------------------------------------------------------
December 14, 2004 - 18:34 : Nick Nassar
If you need any help getting those things fixed, just let me know.
------------------------------------------------------------------------
December 14, 2004 - 18:50 : Nick Nassar
How this relates to Jeremy's node locking patch:
There was lots of discussion, and node locking was decided against
because from an end user point of view you never want a node to be
locked. He's now advocating for a much simpler patch that warns users
if their changes will overwrite someone elses. That patch still has a
race condition, which might be fixed using db_begin_transaction().
http://drupal.org/node/6025
------------------------------------------------------------------------
December 14, 2004 - 23:26 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_0.patch (55.96 KB)
Here is an updated patch that tries to address Dries concerns.
------------------------------------------------------------------------
December 15, 2004 - 09:32 : Dries
Attachment: http://drupal.org/files/issues/revisions-bug.png (76.06 KB)
It didn't fix the aforementioned bugs. See attached screenshot.
------------------------------------------------------------------------
January 6, 2005 - 21:15 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_1.patch (51.77 KB)
Ok, here is a new version. Dries and myself worked hart at it, so please
have a look.
what is still missing
- database upgrades for the core modules with an own table
- contrib modules need an upgrade too.
- do we need nid and vid in both the node and the node_revisions table?
- the amount of sql queries means a good stress testing for large
databases.
------------------------------------------------------------------------
January 19, 2005 - 22:43 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_2.patch (49.49 KB)
Here is an updated patch. We discussed to keep the current title in node
module and also in the revisiosn table. This is content duplication but
will save many joins as many queries only need the title of a node.
Discussion is welcome.
------------------------------------------------------------------------
January 20, 2005 - 00:33 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_3.patch (29.93 KB)
I've implemented the aforementioned solution. This makes the patch much
smaller. The patch now also removes taxonomy_node_has_term() which
wasn't used anywhere. I'd really apprciate if some people could test
drive the patch. It will be another huge improvement for 4.6.
------------------------------------------------------------------------
January 20, 2005 - 01:05 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_4.patch (30 KB)
Another revision. Steven didn't like my literal $node->vid in queries.
------------------------------------------------------------------------
January 20, 2005 - 02:10 : killes(a)www.drop.org
- database upgrades for the core modules with an own table
- contrib modules need an upgrade too.
- do we need nid and vid in both the node and the node_revisions table?
- the amount of sql queries means a good stress testing for large
databases.
These issues are still open, btw. Especially the first one needs to be
tackled.
------------------------------------------------------------------------
January 25, 2005 - 21:11 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_5.patch (51.13 KB)
Here is a patch that has the database tables updated for forum, book,
and page module.
------------------------------------------------------------------------
January 29, 2005 - 23:55 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_6.patch (49.18 KB)
Yet another update to keep it working with head. The patch now also
removes the table definitons for the page table.
------------------------------------------------------------------------
January 29, 2005 - 23:57 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_7.patch (55.69 KB)
Sorry, that was the old version, this is the right one.
------------------------------------------------------------------------
January 31, 2005 - 20:55 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_8.patch (55.71 KB)
Updated once more.
------------------------------------------------------------------------
January 31, 2005 - 21:52 : Dries
Anyone to help review/test this?
------------------------------------------------------------------------
January 31, 2005 - 22:22 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_9.patch (49.29 KB)
Updated again, the update functions occurred twice. Thanks Bart.
------------------------------------------------------------------------
February 2, 2005 - 01:27 : killes(a)www.drop.org
Don't know if the db I am using is corrupted or what. I still do have
some didficulties.
The latest patch is attached.
------------------------------------------------------------------------
February 2, 2005 - 01:27 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_10.patch (49.67 KB)
I am probably slowly going mad ...
--
View: http://drupal.org/node/7582
Edit: http://drupal.org/project/comments/add/7582
1
0
Project: Drupal
Version: cvs
Component: node system
Category: tasks
Priority: normal
Assigned to: killes(a)www.drop.org
Reported by: killes(a)www.drop.org
Updated by: killes(a)www.drop.org
Status: patch
Attachment: http://drupal.org/files/issues/revisions_10.patch (49.67 KB)
I am probably slowly going mad ...
killes(a)www.drop.org
Previous comments:
------------------------------------------------------------------------
May 5, 2004 - 17:25 : killes(a)www.drop.org
Currently all node revisions are stored in a serialized field in
node.table and retrieved for _each_ page view although they are rarely
needed. However, we have agreed that serializing data is bad and that
we should try to keep the memory foot print pf Drupal small.
Therefore I propose to create a separate revisions table which would be
in principle identical to the node table, only that it could have
several old copies of the same node. Extra data added by other modules
could be added in a serialized field unless we find a better solution.
------------------------------------------------------------------------
May 5, 2004 - 18:06 : jhriggs
I too think the serialized approach is less than desirable, but here's
an alternative. This would likely take some considerable rework in
core and contrib, but the following is how we handle similar types of
situations in our databases at work. It is more elegant that a
separate table, and avoids the (almost exact) duplication of a table.
Instead of separate tables, keep all revisions of nodes in the node
table as follows:
* add field: active (0/1 or Y/N)
* add field: revision
* every revision of a node is stored in the node table; however, only
one revision can be active at any given time
* nid can no longer be unique -- primary/unique key becomes (nid,
active)
* any time a node is loaded, updated (without revision), etc., the
active version is used.
Thoughts?
------------------------------------------------------------------------
May 5, 2004 - 18:57 : killes(a)www.drop.org
I am not opposed to your scheme, but I want to stress the following:
* Duplicating a table's structure is not bad (IMHO) as long as the
content is different.
* having two tables will allow us to have a rather small node table.
This is (maybe) a performance gain.
------------------------------------------------------------------------
May 5, 2004 - 19:37 : jhriggs
I don't necessarily think that duplicating a table's structure is _bad_.
It just seems to be wasteful and a pain to maintain. (Every change to
the node table is made twice...easy to do, but also easy to miss
perhaps.)
As for performance, as long as nid and the active indicator are
indexed, there shouldn't be any performance loss. Also, archiving an
old version when making a new revision will be much simpler: just
change the active indicator rather than copying an entire node to
another table (and ensuring everything gets copied...again a potential
maintainance issue).
To be honest, I would just like to see the serialized data go away,
regarless of what approach is taken.
------------------------------------------------------------------------
July 30, 2004 - 20:49 : Nick Nassar
Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_07-30-2004.p… (10.47 KB)
I'm interested in using Drupal for a large scale wiki-type project. In
order to do this, I need revisions to be in their own table.
Attached is a patch to do just that. Most of the changes are pretty
self explanatory. Spreading out node data across two tables meant that
I had to add database functions to do locking/transactions. Without
this, race conditions in which the database becomes corrupted are
possible.
------------------------------------------------------------------------
July 30, 2004 - 20:54 : Nick Nassar
Oh yeah... The patch is a diff against Drupal CVS
------------------------------------------------------------------------
July 31, 2004 - 01:00 : Anonymous
Gerhard speaking.
Nick, thanks a lot for your nice patch! It saves me a great deal of
labour. I looked through it and immediately liked it. You not only put
the old revisions into a new table but also the current one. Do you
have an estimate how much more expensive the additional join is?
Besides a few minor coding style issues I found a major one: Just a few
hours before you uploaded your patch JonBob's node access patch hit
core. That means your patch won't apply anymore as all the queries you
change have been changed. Can I bug you to update your patch?
------------------------------------------------------------------------
July 31, 2004 - 02:11 : Anonymous
Also I think that your upgrade path loses existing revisions.
------------------------------------------------------------------------
July 31, 2004 - 03:39 : drumm
I think this is the proper way to do things. No columns are duplicated,
there is no serialized data, and only the fields that are logically
revised are stored. Nothing jumped out at me as a way to have my node
modules be able to keep a table of revisions of additional fields. I'm
guessing this could be done within the confines of _insert and _update.
Assuming the upgrade path works and modules can extend it I give it a
+1.
------------------------------------------------------------------------
July 31, 2004 - 15:40 : Nick Nassar
It figures that just as I finish a big patch, another patch comes along
and breaks it. Oh well, it should be a pretty easy to fix. I'll work on
it.
Fixing the upgrade path to keep revisions should be fairly painless.
I found another issue that needs to be fixed before this patch gets
merged. There format of a node needs to be stored for each revision.
Otherwise, for modules that store a format for the nodes, such as page
and book, if you write one revision in PHP and the next in HTML, the
PHP revision will be displayed as HTML. This is part of a larger issue
of how node modules should store revisions of additional fields. I
think each module that wants to do this should create another table
with (nid, revid) as the primary key. Just as when they want to add
fields to a node they create another table with nid as the primary key.
As far as performance goes, for sites that make heavy use of revisions,
an extra join on primary keys is going to be a lot faster than grabbing
all of the revisions from that database everytime. We would need to run
benchmarks to determine is the overall difference in speed is for an
average site is a gain or a loss. I'm guessing it's very minor either
way.
------------------------------------------------------------------------
August 23, 2004 - 14:55 : Nick Nassar
Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_08-23-2004.p… (10.92 KB)
Here's an updated patch against CVS that puts revisions in their own
table, provides an upgrade path, and fixes the format related bugs in
the last patch.
Hopefully, this can make it into CVS as soon as the freeze is over.
------------------------------------------------------------------------
August 23, 2004 - 15:10 : moshe weitzman
Interesting patch ... drumm's question is still outstanding. how do
modules store revisions of their fields? Are they expected to manage
this on their own? Thats not how it works today.
As an aside, i am seeing profile_ fields in my node.revisions column.
One could argue that those need not be saved. They pertain to the node
author, not to the node itself.
------------------------------------------------------------------------
August 23, 2004 - 17:14 : Nick Nassar
Having modules be responsible for storing revisions of their own fields
is a side-effect of storing revision data in tables. There's really no
way around it. However, revisions generally don't make sense for node
types that don't have PHP/HTML content, such as polls. I think it's
going to be a pretty rare scenario for a new node type to want another
field to change per-revision, so it's a pretty good trade-off.
Storing fields that shouldn't be part of revisions, such as the
profile_ fields, is a side-effect of storing revisions as serialized
objects. Applying this patch will free up that wasted space. :)
------------------------------------------------------------------------
August 23, 2004 - 18:20 : Anonymous
There should be a hook that let's the module choose whether it supports
history. This way a module author can prevent the user from doing
something that may break his module or just cause undefined behavior.
If the module doesn't support history then don't let the user/admin
choose to add history to nodes of that type.
Craig
------------------------------------------------------------------------
August 23, 2004 - 20:23 : Nick Nassar
I agree, there should be an API change to make specifying support for
revisions easier. In the interests of keeping patches small and keeping
to one change per patch, I think the API change should be a separate
issue.
A sort of ad-hoc API to decide whether or not a module supports
revisions by default already exists. Instead of having a hook, modules
set the default value of the "Create new revision" field in the edit
form. The admin can change this option in
admin/node/configure/defaults. This patch doesn't change that.
Revisions are broken for node types that have their own database
structure, like polls, even when storing them as serialized objects.
This patch doesn't change that, either.
------------------------------------------------------------------------
October 26, 2004 - 03:35 : moshe weitzman
I'm guessing that someone is going to have to demonstrate that this
patch performs as well as current drupal before it gets comitted. i
think this patch is a few benchmarks from being comitted.
------------------------------------------------------------------------
October 27, 2004 - 02:04 : Nick Nassar
Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_10-26-2004.p… (11 KB)
I ran some really unscientific benchmarks, and it looks like this patch
has a negligible affect on performance.
I used apache bench and the database from theregular.org, which doesn't
contain any revisions (worst case scenario for this patch) and contains
several hundred nodes. Both the patched and unpatched versions hovered
between 2.36 and 2.38 requests per second.
The command I used was:
ab -n50 -C 'PHPSESSID=b01a9f92880ef215b0ed6f1314a5eba2'
http://192.168.0.100/
An updated patch that should apply to CVS is attached.
------------------------------------------------------------------------
October 27, 2004 - 02:05 : Nick Nassar
I ran some really unscientific benchmarks, and it looks like this patch
has a negligible affect on performance.
I used apache bench and the database from theregular.org, which doesn't
contain any revisions (worst case scenario for this patch) and contains
several hundred nodes. Both the patched and unpatched versions hovered
between 2.36 and 2.38 requests per second.
The command I used was:
ab -n50 -C 'PHPSESSID=b01a9f92880ef215b0ed6f1314a5eba2'
http://192.168.0.100/
An updated patch that should apply to CVS is attached.
------------------------------------------------------------------------
October 27, 2004 - 02:05 : Nick Nassar
Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_10-26-2004.p… (11 KB)
I ran some really unscientific benchmarks, and it looks like this patch
has a negligible affect on performance.
I used apache bench and the database from theregular.org, which doesn't
contain any revisions (worst case scenario for this patch) and contains
several hundred nodes. Both the patched and unpatched versions hovered
between 2.36 and 2.38 requests per second.
The command I used was:
ab -n50 -C 'PHPSESSID=b01a9f92880ef215b0ed6f1314a5eba2'
http://192.168.0.100/
An updated patch that should apply to CVS is attached.
------------------------------------------------------------------------
November 15, 2004 - 06:05 : elias1884
please overthink the revision system default workflow as well. don't
look at the revision system as an isolated system but as a part of the
whole workflow system!
if you combine revisions with the moderation queue the most logic
default workflow would be like that:
auth user creates node (revision #0)
admin approves the node (status = 1, moderation = 0)
=> node publicly available
auth user finds typo and changes node (revision #1, status = 0,
moderation = 1)
-------
what happens at that point at the moment is, that the node is not
accessible anymore at all until the new revision is approved by admin.
of course the new revision should not go online until reviewed and
approved, this is absolutely correct, but there is no reason to not
take the old revision offline, since it was already approved and should
therefore be online until the new revision is approved. it is not
practical if a node disappears only because the author corrected a
typo.
-------
admin approves the node (status = 1, moderation = 0)
eventhough I first thought a plain boolean active field would not be
capable of providing that functionality if finally came to the
conclusion, that it can. The only thing to do is to not set that bit,
when a new revision is created, but when it is approved (in case
moderation is activated under default workflow). Every revision should
have its own moderation, status and active field and on approval they
are set like this (status=1, moderation=0, active=Y).
When you wanna rollback to an old revision, you can chose between all
revisions that already have the moderation bit set back to 0 again and
the published to 1. There should be an extra permission for rollback!
another concern that I have about the default workflow is, that users
can't see the content, they have just created, when moderation is
enabled. Eventhough, there is a big fat "submission accepted" presented
after submissions, unexperienced users tend to question the information
those stupid tincans give them, if they can't find their content
afterwards. Many users are really lazy bastards and they don't even
read the status messages. The best feedback about whether his story was
submitted successfully or not of course is, if he can find the story
somewhere on the site, maybe with a status message on top of it,
mentioning, that the content is currently not publicly available since
it has not been approved yet. there should be a my content section
under my account, like somebody is trying to do with the workspace
module I guess.
so my suggestion is to make (status=0, moderation=1) still available
for the creator under a my content section somewhere!
------------------------------------------------------------------------
November 24, 2004 - 05:21 : Nick Nassar
I agree. The current workflow for moderation queues and revisions needs
to change, but this patch isn't the place for it. The patch is already
too big, and it only does the backend stuff.
Instead of adding more to this patch and making it take even longer to
get into core, would you mind creating a new issue for your UI
suggestions, so the those changes can be added as a separate patch?
Thanks,
Nick
------------------------------------------------------------------------
December 11, 2004 - 13:26 : Dries
This patch is _much_ needed so I'd love to see someone revive it. In
order for this patch to be accepted, the following needs to be done:
Update this patch to CVS HEAD.
Rename revid to vid.
Rename node_rev to node_revisions.
Rename node_rev.changed to node_revisions.timestamp.
Rename $rnode to $revision.
Fix the coding style to match Drupal's: proper spacing, single quotes
where possible, proper variable names.
Benchmark this patch with a large database with enough revisions. I'd
be happy to benchmark this on my local copy of the drupal.org database.
The book.log field should probably move to the node_revisions table.
This can be done in a separate patch.
Investigate whether transactions are well-supported.
------------------------------------------------------------------------
December 13, 2004 - 01:25 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_10-26-2004-r… (11.02 KB)
I've worked a bit on the patch (coding style issues as mentioned by
Dries). One thing I noticed is that the patch uses REPLACE. IIRC this
needs to be chagned to "UPDATE, if fail INSERT" for pgsql
compatibility.
Nick, are you still interested in working on that patch? I'd like to
know how it works on your site and work on getting it into core.
------------------------------------------------------------------------
December 13, 2004 - 13:33 : Dries
Gerhard: your patch does not apply.
------------------------------------------------------------------------
December 13, 2004 - 18:10 : killes(a)www.drop.org
Yes, I know, that was the same version as I mailed to you earlier.
------------------------------------------------------------------------
December 13, 2004 - 22:02 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions.patch (52.96 KB)
Ok, upüdated the patch to cvs.
------------------------------------------------------------------------
December 14, 2004 - 09:58 : Dries
Some more comments:
db_begin_transaction() and db_end_transaction() do not belong in
database.inc, but in database.mysql.inc and database.pgsql.inc
respectively.
The node module calls node_revisionsision_list() which is not defined.
(Fxed that on my local copy.)
Do db_begin_transaction() and db_end_transaction() deprecate Jeremy's
table locking patch?
The upgrade path assigns the wrong user ID to each revision.
The upgrade path assigns the wrong date to each revision (that or a
node's revision page shows the wrong usernames/dates).
The coding style needs a bit of work, but we can worry about that
later.
------------------------------------------------------------------------
December 14, 2004 - 18:34 : Nick Nassar
If you need any help getting those things fixed, just let me know.
------------------------------------------------------------------------
December 14, 2004 - 18:50 : Nick Nassar
How this relates to Jeremy's node locking patch:
There was lots of discussion, and node locking was decided against
because from an end user point of view you never want a node to be
locked. He's now advocating for a much simpler patch that warns users
if their changes will overwrite someone elses. That patch still has a
race condition, which might be fixed using db_begin_transaction().
http://drupal.org/node/6025
------------------------------------------------------------------------
December 14, 2004 - 23:26 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_0.patch (55.96 KB)
Here is an updated patch that tries to address Dries concerns.
------------------------------------------------------------------------
December 15, 2004 - 09:32 : Dries
Attachment: http://drupal.org/files/issues/revisions-bug.png (76.06 KB)
It didn't fix the aforementioned bugs. See attached screenshot.
------------------------------------------------------------------------
January 6, 2005 - 21:15 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_1.patch (51.77 KB)
Ok, here is a new version. Dries and myself worked hart at it, so please
have a look.
what is still missing
- database upgrades for the core modules with an own table
- contrib modules need an upgrade too.
- do we need nid and vid in both the node and the node_revisions table?
- the amount of sql queries means a good stress testing for large
databases.
------------------------------------------------------------------------
January 19, 2005 - 22:43 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_2.patch (49.49 KB)
Here is an updated patch. We discussed to keep the current title in node
module and also in the revisiosn table. This is content duplication but
will save many joins as many queries only need the title of a node.
Discussion is welcome.
------------------------------------------------------------------------
January 20, 2005 - 00:33 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_3.patch (29.93 KB)
I've implemented the aforementioned solution. This makes the patch much
smaller. The patch now also removes taxonomy_node_has_term() which
wasn't used anywhere. I'd really apprciate if some people could test
drive the patch. It will be another huge improvement for 4.6.
------------------------------------------------------------------------
January 20, 2005 - 01:05 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_4.patch (30 KB)
Another revision. Steven didn't like my literal $node->vid in queries.
------------------------------------------------------------------------
January 20, 2005 - 02:10 : killes(a)www.drop.org
- database upgrades for the core modules with an own table
- contrib modules need an upgrade too.
- do we need nid and vid in both the node and the node_revisions table?
- the amount of sql queries means a good stress testing for large
databases.
These issues are still open, btw. Especially the first one needs to be
tackled.
------------------------------------------------------------------------
January 25, 2005 - 21:11 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_5.patch (51.13 KB)
Here is a patch that has the database tables updated for forum, book,
and page module.
------------------------------------------------------------------------
January 29, 2005 - 23:55 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_6.patch (49.18 KB)
Yet another update to keep it working with head. The patch now also
removes the table definitons for the page table.
------------------------------------------------------------------------
January 29, 2005 - 23:57 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_7.patch (55.69 KB)
Sorry, that was the old version, this is the right one.
------------------------------------------------------------------------
January 31, 2005 - 20:55 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_8.patch (55.71 KB)
Updated once more.
------------------------------------------------------------------------
January 31, 2005 - 21:52 : Dries
Anyone to help review/test this?
------------------------------------------------------------------------
January 31, 2005 - 22:22 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_9.patch (49.29 KB)
Updated again, the update functions occurred twice. Thanks Bart.
------------------------------------------------------------------------
February 2, 2005 - 01:27 : killes(a)www.drop.org
Don't know if the db I am using is corrupted or what. I still do have
some didficulties.
The latest patch is attached.
--
View: http://drupal.org/node/7582
Edit: http://drupal.org/project/comments/add/7582
1
0
Project: Drupal
Version: cvs
Component: node system
Category: tasks
Priority: normal
Assigned to: killes(a)www.drop.org
Reported by: killes(a)www.drop.org
Updated by: killes(a)www.drop.org
Status: patch
Don't know if the db I am using is corrupted or what. I still do have
some didficulties.
The latest patch is attached.
killes(a)www.drop.org
Previous comments:
------------------------------------------------------------------------
May 5, 2004 - 17:25 : killes(a)www.drop.org
Currently all node revisions are stored in a serialized field in
node.table and retrieved for _each_ page view although they are rarely
needed. However, we have agreed that serializing data is bad and that
we should try to keep the memory foot print pf Drupal small.
Therefore I propose to create a separate revisions table which would be
in principle identical to the node table, only that it could have
several old copies of the same node. Extra data added by other modules
could be added in a serialized field unless we find a better solution.
------------------------------------------------------------------------
May 5, 2004 - 18:06 : jhriggs
I too think the serialized approach is less than desirable, but here's
an alternative. This would likely take some considerable rework in
core and contrib, but the following is how we handle similar types of
situations in our databases at work. It is more elegant that a
separate table, and avoids the (almost exact) duplication of a table.
Instead of separate tables, keep all revisions of nodes in the node
table as follows:
* add field: active (0/1 or Y/N)
* add field: revision
* every revision of a node is stored in the node table; however, only
one revision can be active at any given time
* nid can no longer be unique -- primary/unique key becomes (nid,
active)
* any time a node is loaded, updated (without revision), etc., the
active version is used.
Thoughts?
------------------------------------------------------------------------
May 5, 2004 - 18:57 : killes(a)www.drop.org
I am not opposed to your scheme, but I want to stress the following:
* Duplicating a table's structure is not bad (IMHO) as long as the
content is different.
* having two tables will allow us to have a rather small node table.
This is (maybe) a performance gain.
------------------------------------------------------------------------
May 5, 2004 - 19:37 : jhriggs
I don't necessarily think that duplicating a table's structure is _bad_.
It just seems to be wasteful and a pain to maintain. (Every change to
the node table is made twice...easy to do, but also easy to miss
perhaps.)
As for performance, as long as nid and the active indicator are
indexed, there shouldn't be any performance loss. Also, archiving an
old version when making a new revision will be much simpler: just
change the active indicator rather than copying an entire node to
another table (and ensuring everything gets copied...again a potential
maintainance issue).
To be honest, I would just like to see the serialized data go away,
regarless of what approach is taken.
------------------------------------------------------------------------
July 30, 2004 - 20:49 : Nick Nassar
Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_07-30-2004.p… (10.47 KB)
I'm interested in using Drupal for a large scale wiki-type project. In
order to do this, I need revisions to be in their own table.
Attached is a patch to do just that. Most of the changes are pretty
self explanatory. Spreading out node data across two tables meant that
I had to add database functions to do locking/transactions. Without
this, race conditions in which the database becomes corrupted are
possible.
------------------------------------------------------------------------
July 30, 2004 - 20:54 : Nick Nassar
Oh yeah... The patch is a diff against Drupal CVS
------------------------------------------------------------------------
July 31, 2004 - 01:00 : Anonymous
Gerhard speaking.
Nick, thanks a lot for your nice patch! It saves me a great deal of
labour. I looked through it and immediately liked it. You not only put
the old revisions into a new table but also the current one. Do you
have an estimate how much more expensive the additional join is?
Besides a few minor coding style issues I found a major one: Just a few
hours before you uploaded your patch JonBob's node access patch hit
core. That means your patch won't apply anymore as all the queries you
change have been changed. Can I bug you to update your patch?
------------------------------------------------------------------------
July 31, 2004 - 02:11 : Anonymous
Also I think that your upgrade path loses existing revisions.
------------------------------------------------------------------------
July 31, 2004 - 03:39 : drumm
I think this is the proper way to do things. No columns are duplicated,
there is no serialized data, and only the fields that are logically
revised are stored. Nothing jumped out at me as a way to have my node
modules be able to keep a table of revisions of additional fields. I'm
guessing this could be done within the confines of _insert and _update.
Assuming the upgrade path works and modules can extend it I give it a
+1.
------------------------------------------------------------------------
July 31, 2004 - 15:40 : Nick Nassar
It figures that just as I finish a big patch, another patch comes along
and breaks it. Oh well, it should be a pretty easy to fix. I'll work on
it.
Fixing the upgrade path to keep revisions should be fairly painless.
I found another issue that needs to be fixed before this patch gets
merged. There format of a node needs to be stored for each revision.
Otherwise, for modules that store a format for the nodes, such as page
and book, if you write one revision in PHP and the next in HTML, the
PHP revision will be displayed as HTML. This is part of a larger issue
of how node modules should store revisions of additional fields. I
think each module that wants to do this should create another table
with (nid, revid) as the primary key. Just as when they want to add
fields to a node they create another table with nid as the primary key.
As far as performance goes, for sites that make heavy use of revisions,
an extra join on primary keys is going to be a lot faster than grabbing
all of the revisions from that database everytime. We would need to run
benchmarks to determine is the overall difference in speed is for an
average site is a gain or a loss. I'm guessing it's very minor either
way.
------------------------------------------------------------------------
August 23, 2004 - 14:55 : Nick Nassar
Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_08-23-2004.p… (10.92 KB)
Here's an updated patch against CVS that puts revisions in their own
table, provides an upgrade path, and fixes the format related bugs in
the last patch.
Hopefully, this can make it into CVS as soon as the freeze is over.
------------------------------------------------------------------------
August 23, 2004 - 15:10 : moshe weitzman
Interesting patch ... drumm's question is still outstanding. how do
modules store revisions of their fields? Are they expected to manage
this on their own? Thats not how it works today.
As an aside, i am seeing profile_ fields in my node.revisions column.
One could argue that those need not be saved. They pertain to the node
author, not to the node itself.
------------------------------------------------------------------------
August 23, 2004 - 17:14 : Nick Nassar
Having modules be responsible for storing revisions of their own fields
is a side-effect of storing revision data in tables. There's really no
way around it. However, revisions generally don't make sense for node
types that don't have PHP/HTML content, such as polls. I think it's
going to be a pretty rare scenario for a new node type to want another
field to change per-revision, so it's a pretty good trade-off.
Storing fields that shouldn't be part of revisions, such as the
profile_ fields, is a side-effect of storing revisions as serialized
objects. Applying this patch will free up that wasted space. :)
------------------------------------------------------------------------
August 23, 2004 - 18:20 : Anonymous
There should be a hook that let's the module choose whether it supports
history. This way a module author can prevent the user from doing
something that may break his module or just cause undefined behavior.
If the module doesn't support history then don't let the user/admin
choose to add history to nodes of that type.
Craig
------------------------------------------------------------------------
August 23, 2004 - 20:23 : Nick Nassar
I agree, there should be an API change to make specifying support for
revisions easier. In the interests of keeping patches small and keeping
to one change per patch, I think the API change should be a separate
issue.
A sort of ad-hoc API to decide whether or not a module supports
revisions by default already exists. Instead of having a hook, modules
set the default value of the "Create new revision" field in the edit
form. The admin can change this option in
admin/node/configure/defaults. This patch doesn't change that.
Revisions are broken for node types that have their own database
structure, like polls, even when storing them as serialized objects.
This patch doesn't change that, either.
------------------------------------------------------------------------
October 26, 2004 - 03:35 : moshe weitzman
I'm guessing that someone is going to have to demonstrate that this
patch performs as well as current drupal before it gets comitted. i
think this patch is a few benchmarks from being comitted.
------------------------------------------------------------------------
October 27, 2004 - 02:04 : Nick Nassar
Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_10-26-2004.p… (11 KB)
I ran some really unscientific benchmarks, and it looks like this patch
has a negligible affect on performance.
I used apache bench and the database from theregular.org, which doesn't
contain any revisions (worst case scenario for this patch) and contains
several hundred nodes. Both the patched and unpatched versions hovered
between 2.36 and 2.38 requests per second.
The command I used was:
ab -n50 -C 'PHPSESSID=b01a9f92880ef215b0ed6f1314a5eba2'
http://192.168.0.100/
An updated patch that should apply to CVS is attached.
------------------------------------------------------------------------
October 27, 2004 - 02:05 : Nick Nassar
I ran some really unscientific benchmarks, and it looks like this patch
has a negligible affect on performance.
I used apache bench and the database from theregular.org, which doesn't
contain any revisions (worst case scenario for this patch) and contains
several hundred nodes. Both the patched and unpatched versions hovered
between 2.36 and 2.38 requests per second.
The command I used was:
ab -n50 -C 'PHPSESSID=b01a9f92880ef215b0ed6f1314a5eba2'
http://192.168.0.100/
An updated patch that should apply to CVS is attached.
------------------------------------------------------------------------
October 27, 2004 - 02:05 : Nick Nassar
Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_10-26-2004.p… (11 KB)
I ran some really unscientific benchmarks, and it looks like this patch
has a negligible affect on performance.
I used apache bench and the database from theregular.org, which doesn't
contain any revisions (worst case scenario for this patch) and contains
several hundred nodes. Both the patched and unpatched versions hovered
between 2.36 and 2.38 requests per second.
The command I used was:
ab -n50 -C 'PHPSESSID=b01a9f92880ef215b0ed6f1314a5eba2'
http://192.168.0.100/
An updated patch that should apply to CVS is attached.
------------------------------------------------------------------------
November 15, 2004 - 06:05 : elias1884
please overthink the revision system default workflow as well. don't
look at the revision system as an isolated system but as a part of the
whole workflow system!
if you combine revisions with the moderation queue the most logic
default workflow would be like that:
auth user creates node (revision #0)
admin approves the node (status = 1, moderation = 0)
=> node publicly available
auth user finds typo and changes node (revision #1, status = 0,
moderation = 1)
-------
what happens at that point at the moment is, that the node is not
accessible anymore at all until the new revision is approved by admin.
of course the new revision should not go online until reviewed and
approved, this is absolutely correct, but there is no reason to not
take the old revision offline, since it was already approved and should
therefore be online until the new revision is approved. it is not
practical if a node disappears only because the author corrected a
typo.
-------
admin approves the node (status = 1, moderation = 0)
eventhough I first thought a plain boolean active field would not be
capable of providing that functionality if finally came to the
conclusion, that it can. The only thing to do is to not set that bit,
when a new revision is created, but when it is approved (in case
moderation is activated under default workflow). Every revision should
have its own moderation, status and active field and on approval they
are set like this (status=1, moderation=0, active=Y).
When you wanna rollback to an old revision, you can chose between all
revisions that already have the moderation bit set back to 0 again and
the published to 1. There should be an extra permission for rollback!
another concern that I have about the default workflow is, that users
can't see the content, they have just created, when moderation is
enabled. Eventhough, there is a big fat "submission accepted" presented
after submissions, unexperienced users tend to question the information
those stupid tincans give them, if they can't find their content
afterwards. Many users are really lazy bastards and they don't even
read the status messages. The best feedback about whether his story was
submitted successfully or not of course is, if he can find the story
somewhere on the site, maybe with a status message on top of it,
mentioning, that the content is currently not publicly available since
it has not been approved yet. there should be a my content section
under my account, like somebody is trying to do with the workspace
module I guess.
so my suggestion is to make (status=0, moderation=1) still available
for the creator under a my content section somewhere!
------------------------------------------------------------------------
November 24, 2004 - 05:21 : Nick Nassar
I agree. The current workflow for moderation queues and revisions needs
to change, but this patch isn't the place for it. The patch is already
too big, and it only does the backend stuff.
Instead of adding more to this patch and making it take even longer to
get into core, would you mind creating a new issue for your UI
suggestions, so the those changes can be added as a separate patch?
Thanks,
Nick
------------------------------------------------------------------------
December 11, 2004 - 13:26 : Dries
This patch is _much_ needed so I'd love to see someone revive it. In
order for this patch to be accepted, the following needs to be done:
Update this patch to CVS HEAD.
Rename revid to vid.
Rename node_rev to node_revisions.
Rename node_rev.changed to node_revisions.timestamp.
Rename $rnode to $revision.
Fix the coding style to match Drupal's: proper spacing, single quotes
where possible, proper variable names.
Benchmark this patch with a large database with enough revisions. I'd
be happy to benchmark this on my local copy of the drupal.org database.
The book.log field should probably move to the node_revisions table.
This can be done in a separate patch.
Investigate whether transactions are well-supported.
------------------------------------------------------------------------
December 13, 2004 - 01:25 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_10-26-2004-r… (11.02 KB)
I've worked a bit on the patch (coding style issues as mentioned by
Dries). One thing I noticed is that the patch uses REPLACE. IIRC this
needs to be chagned to "UPDATE, if fail INSERT" for pgsql
compatibility.
Nick, are you still interested in working on that patch? I'd like to
know how it works on your site and work on getting it into core.
------------------------------------------------------------------------
December 13, 2004 - 13:33 : Dries
Gerhard: your patch does not apply.
------------------------------------------------------------------------
December 13, 2004 - 18:10 : killes(a)www.drop.org
Yes, I know, that was the same version as I mailed to you earlier.
------------------------------------------------------------------------
December 13, 2004 - 22:02 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions.patch (52.96 KB)
Ok, upüdated the patch to cvs.
------------------------------------------------------------------------
December 14, 2004 - 09:58 : Dries
Some more comments:
db_begin_transaction() and db_end_transaction() do not belong in
database.inc, but in database.mysql.inc and database.pgsql.inc
respectively.
The node module calls node_revisionsision_list() which is not defined.
(Fxed that on my local copy.)
Do db_begin_transaction() and db_end_transaction() deprecate Jeremy's
table locking patch?
The upgrade path assigns the wrong user ID to each revision.
The upgrade path assigns the wrong date to each revision (that or a
node's revision page shows the wrong usernames/dates).
The coding style needs a bit of work, but we can worry about that
later.
------------------------------------------------------------------------
December 14, 2004 - 18:34 : Nick Nassar
If you need any help getting those things fixed, just let me know.
------------------------------------------------------------------------
December 14, 2004 - 18:50 : Nick Nassar
How this relates to Jeremy's node locking patch:
There was lots of discussion, and node locking was decided against
because from an end user point of view you never want a node to be
locked. He's now advocating for a much simpler patch that warns users
if their changes will overwrite someone elses. That patch still has a
race condition, which might be fixed using db_begin_transaction().
http://drupal.org/node/6025
------------------------------------------------------------------------
December 14, 2004 - 23:26 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_0.patch (55.96 KB)
Here is an updated patch that tries to address Dries concerns.
------------------------------------------------------------------------
December 15, 2004 - 09:32 : Dries
Attachment: http://drupal.org/files/issues/revisions-bug.png (76.06 KB)
It didn't fix the aforementioned bugs. See attached screenshot.
------------------------------------------------------------------------
January 6, 2005 - 21:15 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_1.patch (51.77 KB)
Ok, here is a new version. Dries and myself worked hart at it, so please
have a look.
what is still missing
- database upgrades for the core modules with an own table
- contrib modules need an upgrade too.
- do we need nid and vid in both the node and the node_revisions table?
- the amount of sql queries means a good stress testing for large
databases.
------------------------------------------------------------------------
January 19, 2005 - 22:43 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_2.patch (49.49 KB)
Here is an updated patch. We discussed to keep the current title in node
module and also in the revisiosn table. This is content duplication but
will save many joins as many queries only need the title of a node.
Discussion is welcome.
------------------------------------------------------------------------
January 20, 2005 - 00:33 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_3.patch (29.93 KB)
I've implemented the aforementioned solution. This makes the patch much
smaller. The patch now also removes taxonomy_node_has_term() which
wasn't used anywhere. I'd really apprciate if some people could test
drive the patch. It will be another huge improvement for 4.6.
------------------------------------------------------------------------
January 20, 2005 - 01:05 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_4.patch (30 KB)
Another revision. Steven didn't like my literal $node->vid in queries.
------------------------------------------------------------------------
January 20, 2005 - 02:10 : killes(a)www.drop.org
- database upgrades for the core modules with an own table
- contrib modules need an upgrade too.
- do we need nid and vid in both the node and the node_revisions table?
- the amount of sql queries means a good stress testing for large
databases.
These issues are still open, btw. Especially the first one needs to be
tackled.
------------------------------------------------------------------------
January 25, 2005 - 21:11 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_5.patch (51.13 KB)
Here is a patch that has the database tables updated for forum, book,
and page module.
------------------------------------------------------------------------
January 29, 2005 - 23:55 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_6.patch (49.18 KB)
Yet another update to keep it working with head. The patch now also
removes the table definitons for the page table.
------------------------------------------------------------------------
January 29, 2005 - 23:57 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_7.patch (55.69 KB)
Sorry, that was the old version, this is the right one.
------------------------------------------------------------------------
January 31, 2005 - 20:55 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_8.patch (55.71 KB)
Updated once more.
------------------------------------------------------------------------
January 31, 2005 - 21:52 : Dries
Anyone to help review/test this?
------------------------------------------------------------------------
January 31, 2005 - 22:22 : killes(a)www.drop.org
Attachment: http://drupal.org/files/issues/revisions_9.patch (49.29 KB)
Updated again, the update functions occurred twice. Thanks Bart.
--
View: http://drupal.org/node/7582
Edit: http://drupal.org/project/comments/add/7582
1
0
Project: Drupal
Version: cvs
Component: base system
Category: bug reports
Priority: normal
Assigned to: Anonymous
Reported by: kbahey
Updated by: Goba
Status: patch
Well, speaking of skip navigation links, phptemplate and xtemplate
should expose the REQUEST_URI to the templates, so when a link to an
anchor on the same page is needed, the link can be formatted with the
complete request URI in mind.
Goba
Previous comments:
------------------------------------------------------------------------
November 19, 2004 - 04:29 : kbahey
Looking at my site's logs, there seem to be several problems that are
caused by Drupal's use of relative path names.
If Drupal causes all the site's urls to be absolute, then none of this
would be an issue.
A. Search Engine Crawlers
Getting lots of 404s on things like: linux/index.html/robots.txt
Where 'linux' is an alias to a taxonomy, and 'index.html' is an alias
to a node within that taxonomy.
Another example, is recursing unnecessarily. I see 404s on things like:
/linux/index.html/linux/index.html
Where 'linux' is a path alias for a taxonomy term, and 'index.html' is
an alias to the main node within it.
This does not seem to happen when Google crawls my sites, but Yahoo's
Slurp suffers from this problem, and keeps recursing. MSNBot also
suffers from this.
Another crawler/harvester called Blinkx/DFS-Fetch keeps adding the .css
file to the relative path, getting a 404 on things like:
/linux/themes/xtemplate/pushbutton/logo.gif
And Fast Search Engine also attempts to access:
/linux/contact/tracker/tracker/user/password
The same goes for grub.org, another crawler.
B. Google Cache / Archive Way Back Machine
Pages in Google cache and archive.org Way Back Machine suffer form a
similar problem: the .css files cannot be found, and hence rendering of
the pages is not correct.
Examples:
Compare this: http://www.drupal.org/node/4647
To this:
http://www.google.ca/search?q=cache:www.drupal.org/node/view/4647
Notice the following:
How there is no formatting at all, because of the lack of a .css file
The httpd log on Drupal will show errors for:
linux/themes/pushbutton/style.css and linux/misc/drupal.css
Also see:
http://web.archive.org/web/20031016184902/http://www.drupal.org/
C. Proxy Caches:
When someone is browsing my site from behind a proxy cache, the web
site is hit with a rapid succession of requests, and many of it is just
for bogus pages.
Examples:
2004/11/17 - 17:47 404 error: linux/user/1 not found.
2004/11/17 - 17:47 404 error: linux/feedback not found.
2004/11/17 - 17:47 404 error: linux/tracker not found.
2004/11/17 - 17:47 404 error: linux/sitemap not found.
2004/11/17 - 17:47 404 error: linux/search not found.
2004/11/17 - 17:47 404 error: linux/misc not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/linux not found.
2004/11/17 - 17:47 404 error: linux/technology not found.
2004/11/17 - 17:47 404 error: linux/writings not found.
2004/11/17 - 17:47 404 error: linux/family not found.
And also:
2004/11/17 - 07:23 404 error: history/user/1 not found.
2004/11/17 - 07:23 404 error: history/tracker not found.
2004/11/17 - 07:23 404 error: history/feedback not found.
2004/11/17 - 07:23 404 error: history/sitemap not found.
2004/11/17 - 07:23 404 error: history/search not found.
2004/11/17 - 07:23 404 error: history/misc not found.
2004/11/17 - 07:23 404 error: history/technology not found.
2004/11/17 - 07:23 404 error: history/science not found.
2004/11/17 - 07:22 404 error: history/history not found.
2004/11/17 - 07:22 404 error: history/writings not found.
2004/11/17 - 07:22 404 error: history/family not found.
As you can tell, history and linux are aliases to taxonomy terms, and
so is misc, technology, writings, family, ...etc. The user agent is
appending the taxonomy term alias to the url and forming a new URL.
D. Regular Browsing:
There is even at least one extreme case where the following URL was
accessed (the result was 404 of course)
/book/view/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/logo.gif
It seems it was a normal user, because the user agent is: "Mozilla/4.0
(compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
Proposed Solution:
As a proposed solution, all URLs in Drupal can be made into absolute
path names. This can be done by the following:
The variable $base_url in the conf.php file is broken down into two
components:
$base_host (the 'http://whatever-host.example.com' part WITHOUT the
trailing slash)
$base_path (the '/path-to-drupal' part, WITH the leading slash. If this
is the DocumentRoot, then it is just a '/' character)
$base_url is now $base_host concatenated with $base_path
A simple filter can be written to preceed every href="path" with the
$base_path variable, so it becomes "/path"
This option can be turned on and off for a site. The default is to have
it off so current behavior is maintained.
A similar scheme applies for style sheets as well.
So, did I miss something obvious? Am I seriously off the mark?
Your thoughts!
------------------------------------------------------------------------
November 20, 2004 - 05:10 : chrisada
I am getting similar 404 errors, mainly from rss feed link that looks
like /blog/blog/feed and many manual links that are relative to drupal
root.
It was not a problem before Drupal 4.5, so I think there might not be a
need to change all URIs to absolute. I can't see where the problem is
coming from though.
------------------------------------------------------------------------
November 20, 2004 - 05:35 : kbahey
I am pretty sure that these problems were happening for at least the
past 10 months (ever since I moved to Drupal in January 2004).
The main issue here is that crawlers and other user agents get confused
by the relative path names.
Using absolute paths will definitely solve this. However, is this the
only solution?
I am looking for a discussion of this.
------------------------------------------------------------------------
November 20, 2004 - 10:55 : Goba
No absolute paths please. Having the path start with '/' solves all the
mentioned problems, and is not absolute, it is relative to the domain.
Sadly some crawlers and even the Google Cache does not obey to the base
href. I have reported this cache problem in April to Google, and they
promised they will keep it in mind... Hehe...
What we need is to have the printed relative path values relative to
the domain name, and not relative to the Drupal installation path.
Note that this issue will appear on the drupal devel mailing list if
someone finally provides a patch we can talk about :)
------------------------------------------------------------------------
November 20, 2004 - 11:19 : Dries
Goba is right. We need paths relative to the domain name to fix this
'problem'.
------------------------------------------------------------------------
November 20, 2004 - 21:19 : kbahey
Sorry for not making my self clear.
When I said absolute, I meant that they start with just a /. I did NOT
mean that they start with http://host.example.com. That would be a very
bad idea.
In any case, what do people think about the proposed solution (breaking
down $base_url into two parts?)
Also, does this address the style sheets as well, or more is needed?
------------------------------------------------------------------------
November 21, 2004 - 17:24 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch.txt (471 bytes)
I have implemented what Goba suggested.
------------------------------------------------------------------------
November 21, 2004 - 17:43 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_0.txt (825 bytes)
Maybe this one is faster?
------------------------------------------------------------------------
November 21, 2004 - 18:34 : kbahey
Man! You are fast!
I tried the second version. It works fine for things that are not
inside the node body, I mean they have a / in front of them, as we
want it to be.
Two comments/issues:
- If there is a URL that is already "/" representing the home page, it
gets set to "//". Perhaps it should check for that case?
- URLs in nodes that do not start with / do not get changed to have a /
prepended to them. Do we need a filter for this?
- Do we need to do something for the style sheets in the page header? I
mean the "misc/drupal.css" and "themes/themename/style.css"?
Thanks
------------------------------------------------------------------------
November 22, 2004 - 01:28 : kbahey
Hi chx
Here is a fix for the case where you have a url that is just "/".
In your patch, instead of:
<?php
$base = $parts['path'] . '/' ;
?>
Replace that by:
<?php
$base = ( $path == '/' ? $base : $parts['path'] . '/' );
?>
------------------------------------------------------------------------
November 28, 2004 - 05:08 : kbahey
Did this patch make it into CVS yet?
If there are any objections to it, can someone please explain what they
are?
Thanks
------------------------------------------------------------------------
November 28, 2004 - 10:33 : Dries
Shouldn't your changes be included in the patch?
Also, it's better to cache $base rather than $parts.
Lastly, it this patch makes it to HEAD, we should probably remove some
'base url' cruft from the themes.
------------------------------------------------------------------------
November 28, 2004 - 19:54 : kbahey
Attachment: http://drupal.org/files/issues/x.diff (1 KB)
Here is the patch including my fix.
I am asking chx to comment on caching $base instead of $parts.
Will this make it faster?
------------------------------------------------------------------------
November 28, 2004 - 20:26 : chx
Hm. $base = ( $path == '/' ? $base : $parts['path'] . '/' ); this
depends on path which is a parameter. Thus I fail to see how could we
cache $base. I'd correct this code however $base = ( $path == '/' ? ''
: $parts['path'] . '/' ); 'cos I think $base is not defined before, but
this is not a problem, PHP will be happy to replace NULL with NULL...
Maybe instead of all parts, only $parts['path'] is enough to be cached,
yes, but the performance and memory usage difference -- I guess -- would
not be noticable...
------------------------------------------------------------------------
November 29, 2004 - 00:06 : kbahey
Attachment: http://drupal.org/files/issues/common-inc-patch.txt (1 KB)
OK.
I put in chx suggested change.
This patch can go in CVS then, to rid us of the problems with paths not
beginning with slash.
This is not an ultimate solution still. We need to address the problem
with .css files. Although the header contains a:
<base href="http://example.com" />
it does not seem that major search engines and archiving sites obey it
anyway.
------------------------------------------------------------------------
December 2, 2004 - 21:31 : Dries
Your coding style needs work. Also, I'm not going to commit this unless
the themes get fixed up: we'd end up with invalid URLs all over the
place. Lastly, I wonder how portable the themes will be when Drupal is
run from within a subdirectory.
------------------------------------------------------------------------
December 2, 2004 - 22:15 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_1.txt (849 bytes)
Well, my patch worked from a subdirectory very well, as fact, I have not
tested it from the root dir. And I think that it adheres to coding
standards. So I resubmit it with the root path fix. However, my Drupal
work is focused on i18n these days, and I was never into themeing so it
won't be me who fixes those.
------------------------------------------------------------------------
December 2, 2004 - 23:07 : kbahey
I have tested the previous patch (including my fix) with drupal
installed in the DocumentRoot of the server.
So, in effect, it is tested with both Drupal in / and Drupal in a
subdirectory.
This change fixes the problem for the crawlers and other browsers from
getting confused.
While it is true that there is no fix for the .css files in the HTML
head section yet, this fix deals with a major part of the problem, and
rids us of a major pain. Check your web server's logs some time to see
what I mean.
Someone who is familiar with the themes can contribute a patch later.
This patch and the future fix for themes are not mutually exclusive, so
let it go in CVS.
------------------------------------------------------------------------
December 9, 2004 - 16:31 : Goba
Please commit this into Drupal core, this fix is badly needed.
------------------------------------------------------------------------
January 17, 2005 - 14:00 : chx
Attachment: http://drupal.org/files/issues/base_url_kill.patch (4.34 KB)
Well as noone have stepped in to fix this problem, I have tried to fix
the themes also. themes.inc , xtemplate.engine and the bluemarine
template is patched besides common.inc.
Of course, more templates could follow, but first I'd like to see your
opinions.
------------------------------------------------------------------------
January 17, 2005 - 14:09 : Goba
I don't think that removing <base> from the themes is a good idea, using
$parts['path'] should be encouraged though before the files, which would
fix the google cache problem, and would still keep the HTML size low. It
would also help those, who save the file to find the originating site
easier, since clicking on a non-pagelocal link would lead to the online
version.
------------------------------------------------------------------------
January 17, 2005 - 17:03 : Steven
Definitely -1 on removing the <base> tag or using absolute or
root-relative URLs. This tag has been around for ages, and it is the
only way to make clean URLs work without bloating in the code. FYI,
"base" is (first?) mentioned in Berners-Lee's HTML 1.0 draft [1].
That's June 1993.
As the amount of clean URL-using sites grows, the crawlers will have to
be updated. Perhaps we could prevent crawlers from going too insane by
404ing for URLs with more than say 10 components? That would prevent
the really crappy ones from hammering your site.
I'm all for making the <base> tag easier to handle for the user (say,
by including a filter to allow simple anchor links to work as most
people expect them to), but we should keep Drupal-generated URLs clean
and completely relative.
[1] http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt
------------------------------------------------------------------------
January 17, 2005 - 17:57 : kbahey
The problem with css is this: The @import argument does not start with a
/.
This is simple to fix.
We keep the "base" as it is today, but add the new variable: $base
before it.
So for a site where Drupal is installed in the DocumentRoot, all that
will change is that /misc/drupal.css and /themes/themename/style.css
will be preceded by a slash. For sites that use another path, that path
will be prepended to the css file name.
How about that?
------------------------------------------------------------------------
January 17, 2005 - 18:38 : Steven
What exactly is the problem with the @import? As far as I know:
- url() in stylesheets is interpreted relative to the base of the
stylesheet, not the source document.
- However, if the styles are inside an HTML document, through a style
tag or style attribute, then the stylesheet's location is the same as
the HTML document.
- Thus, the stylesheet's base is the same as the base of the HTML
document (which can be altered through the <base> tag).
I just don't see why it is necessary. As far as I know, the only
browser that has had problems resolving CSS urls properly was Netscape
4, which does not support @import at all, and which Drupal does not
support either, because of its CSS usage.
------------------------------------------------------------------------
January 17, 2005 - 19:35 : kbahey
The problem for stylesheets is as follows. I think it mainly affect
crawlers and Google's cache.
Say you have an installtion of Drupal in DocumentRoot. You then use url
aliases, and put slashes in them.
For example, you use news/general/2004-12-15.html for a node.
That node still has misc/drupal.css and themes/pushbutton/style.css in
the head section if the document. Crawlers get fooled by that and try
to look for /news/general/misc/drupal.css and
/news/general/themes/pushbutton/style.css, which don't exist.
So, just prepending the new $base variable (in chx's patch) before the
stylesheet @import argument would fix this issue. Assuming you are in
DocumentRoot, then /misc and /themes would be used instead of just misc
and themes.
It would still be compliant with standards, be relative to the web
site, and no ambiguous to anyone, be they crawler or browser.
I hope it is clearer now.
I think chx can change the patch to use the $base instead of $base_url
everywhere, so as to avoid the host/domain name in the urls.
------------------------------------------------------------------------
January 17, 2005 - 22:36 : Steven
But typical crawlers don't even pay attention to stylesheets, hence it
wouldn't have much use for them. I just don't see why we should adjust
to rare cases of buggy software. Reading out a base URL from an HTML
document is dead easy, and on top of that it doesn't add more
complexity as without the base tag, the document's URL is already an
implicit base which has to be parsed anyway.
I did not like it when we altered the <link> tag to accomodate buggy
RSS readers and I certainly don't like it now, as this is even rarer.
In both cases, it is not Drupal which is at fault.
------------------------------------------------------------------------
January 17, 2005 - 22:48 : kbahey
Steven
While I agree with most of what you said, the 404s show up in the logs
enough to be a bother.
Perhaps the original design of Drupal did not forsee that people will
use url aliases to mimic directory/file hierarchies. Whether this was
intended or not, it is the way many use Drupal today.
It does not matter where the bug is (Drupal or the external world), as
long as we can stop it ourselves, by adjusting our end of it.
The fix is simple enough and does not break standards (if implemented
as described with a leading / before the css).
------------------------------------------------------------------------
January 17, 2005 - 22:54 : Steven
It does not break standards, but it does bloat the code in an ugly way.
Why not send an e-mail to the owners of the crawlers and tell them to
implement a standard that is nearly 10 years old [2] (RFC 1808)?
Note that Google Cache now seems to correctly interpret base URLs [3]
and even adds a <base> tag of its own.
By the way, this problem has nothing to do with people using URL
aliases or not, as for a browser the regular nested paths that Drupal
uses (e.g. "node/1" is no different from aliases mimicking files
"foo/bar.html").
[2] http://www.faqs.org/rfcs/rfc1808.html
[3]
http://www.google.be/search?q=cache%3Awww.drupal.org&sourceid=mozilla-searc…
------------------------------------------------------------------------
January 17, 2005 - 23:34 : Goba
Steven, part of the problem is that Google cache does add a base href
even if there is a base href in the document. Eg adds a <BASE
HREF="http://drupal.org/node/13733"> on the plone comparision page
cached. Now that since HTML does not allow more than one base tag [4]
to be present, it is up to the browsers, to use the first or the last
base value, or any of the base values on the page for that matter as
the used base. So even pages displayed from the google cache will be
buggy if a full relative path to the domain root is not specified, due
to this problem.
[4]
http://www.w3.org/TR/1999/REC-html401-19991224/sgml/dtd.html#head.content
------------------------------------------------------------------------
January 18, 2005 - 08:14 : chx
Attachment: http://drupal.org/files/issues/base_url_kill_0.patch (4.75 KB)
This one does not use the whole base_url only the path part of it. HTML
bloat is kept at minimal.
------------------------------------------------------------------------
February 2, 2005 - 00:33 : clairem
Please please can this be done?
It's a good idea in itself, but if using fully-qualified paths means we
can get rid of the BASE HREF, then page anchors will work without having
the overhead of a filter. That's be a huge bonus for those creating
larger nodes, or who just want to be able to put a "skip navigation"
link in their theme without having to abandon Xtemplate or PHPtemplate
--
View: http://drupal.org/node/13148
Edit: http://drupal.org/project/comments/add/13148
1
0
Project: Drupal
Version: cvs
Component: base system
Category: bug reports
Priority: normal
Assigned to: Anonymous
Reported by: kbahey
Updated by: clairem
Status: patch
Please please can this be done?
It's a good idea in itself, but if using fully-qualified paths means we
can get rid of the BASE HREF, then page anchors will work without having
the overhead of a filter. That's be a huge bonus for those creating
larger nodes, or who just want to be able to put a "skip navigation"
link in their theme without having to abandon Xtemplate or PHPtemplate
clairem
Previous comments:
------------------------------------------------------------------------
November 19, 2004 - 03:29 : kbahey
Looking at my site's logs, there seem to be several problems that are
caused by Drupal's use of relative path names.
If Drupal causes all the site's urls to be absolute, then none of this
would be an issue.
A. Search Engine Crawlers
Getting lots of 404s on things like: linux/index.html/robots.txt
Where 'linux' is an alias to a taxonomy, and 'index.html' is an alias
to a node within that taxonomy.
Another example, is recursing unnecessarily. I see 404s on things like:
/linux/index.html/linux/index.html
Where 'linux' is a path alias for a taxonomy term, and 'index.html' is
an alias to the main node within it.
This does not seem to happen when Google crawls my sites, but Yahoo's
Slurp suffers from this problem, and keeps recursing. MSNBot also
suffers from this.
Another crawler/harvester called Blinkx/DFS-Fetch keeps adding the .css
file to the relative path, getting a 404 on things like:
/linux/themes/xtemplate/pushbutton/logo.gif
And Fast Search Engine also attempts to access:
/linux/contact/tracker/tracker/user/password
The same goes for grub.org, another crawler.
B. Google Cache / Archive Way Back Machine
Pages in Google cache and archive.org Way Back Machine suffer form a
similar problem: the .css files cannot be found, and hence rendering of
the pages is not correct.
Examples:
Compare this: http://www.drupal.org/node/4647
To this:
http://www.google.ca/search?q=cache:www.drupal.org/node/view/4647
Notice the following:
How there is no formatting at all, because of the lack of a .css file
The httpd log on Drupal will show errors for:
linux/themes/pushbutton/style.css and linux/misc/drupal.css
Also see:
http://web.archive.org/web/20031016184902/http://www.drupal.org/
C. Proxy Caches:
When someone is browsing my site from behind a proxy cache, the web
site is hit with a rapid succession of requests, and many of it is just
for bogus pages.
Examples:
2004/11/17 - 17:47 404 error: linux/user/1 not found.
2004/11/17 - 17:47 404 error: linux/feedback not found.
2004/11/17 - 17:47 404 error: linux/tracker not found.
2004/11/17 - 17:47 404 error: linux/sitemap not found.
2004/11/17 - 17:47 404 error: linux/search not found.
2004/11/17 - 17:47 404 error: linux/misc not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/programming not found.
2004/11/17 - 17:47 404 error: linux/linux not found.
2004/11/17 - 17:47 404 error: linux/technology not found.
2004/11/17 - 17:47 404 error: linux/writings not found.
2004/11/17 - 17:47 404 error: linux/family not found.
And also:
2004/11/17 - 07:23 404 error: history/user/1 not found.
2004/11/17 - 07:23 404 error: history/tracker not found.
2004/11/17 - 07:23 404 error: history/feedback not found.
2004/11/17 - 07:23 404 error: history/sitemap not found.
2004/11/17 - 07:23 404 error: history/search not found.
2004/11/17 - 07:23 404 error: history/misc not found.
2004/11/17 - 07:23 404 error: history/technology not found.
2004/11/17 - 07:23 404 error: history/science not found.
2004/11/17 - 07:22 404 error: history/history not found.
2004/11/17 - 07:22 404 error: history/writings not found.
2004/11/17 - 07:22 404 error: history/family not found.
As you can tell, history and linux are aliases to taxonomy terms, and
so is misc, technology, writings, family, ...etc. The user agent is
appending the taxonomy term alias to the url and forming a new URL.
D. Regular Browsing:
There is even at least one extreme case where the following URL was
accessed (the result was 404 of course)
/book/view/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/themes/xtemplate/pushbutton/logo.gif
It seems it was a normal user, because the user agent is: "Mozilla/4.0
(compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
Proposed Solution:
As a proposed solution, all URLs in Drupal can be made into absolute
path names. This can be done by the following:
The variable $base_url in the conf.php file is broken down into two
components:
$base_host (the 'http://whatever-host.example.com' part WITHOUT the
trailing slash)
$base_path (the '/path-to-drupal' part, WITH the leading slash. If this
is the DocumentRoot, then it is just a '/' character)
$base_url is now $base_host concatenated with $base_path
A simple filter can be written to preceed every href="path" with the
$base_path variable, so it becomes "/path"
This option can be turned on and off for a site. The default is to have
it off so current behavior is maintained.
A similar scheme applies for style sheets as well.
So, did I miss something obvious? Am I seriously off the mark?
Your thoughts!
------------------------------------------------------------------------
November 20, 2004 - 04:10 : chrisada
I am getting similar 404 errors, mainly from rss feed link that looks
like /blog/blog/feed and many manual links that are relative to drupal
root.
It was not a problem before Drupal 4.5, so I think there might not be a
need to change all URIs to absolute. I can't see where the problem is
coming from though.
------------------------------------------------------------------------
November 20, 2004 - 04:35 : kbahey
I am pretty sure that these problems were happening for at least the
past 10 months (ever since I moved to Drupal in January 2004).
The main issue here is that crawlers and other user agents get confused
by the relative path names.
Using absolute paths will definitely solve this. However, is this the
only solution?
I am looking for a discussion of this.
------------------------------------------------------------------------
November 20, 2004 - 09:55 : Goba
No absolute paths please. Having the path start with '/' solves all the
mentioned problems, and is not absolute, it is relative to the domain.
Sadly some crawlers and even the Google Cache does not obey to the base
href. I have reported this cache problem in April to Google, and they
promised they will keep it in mind... Hehe...
What we need is to have the printed relative path values relative to
the domain name, and not relative to the Drupal installation path.
Note that this issue will appear on the drupal devel mailing list if
someone finally provides a patch we can talk about :)
------------------------------------------------------------------------
November 20, 2004 - 10:19 : Dries
Goba is right. We need paths relative to the domain name to fix this
'problem'.
------------------------------------------------------------------------
November 20, 2004 - 20:19 : kbahey
Sorry for not making my self clear.
When I said absolute, I meant that they start with just a /. I did NOT
mean that they start with http://host.example.com. That would be a very
bad idea.
In any case, what do people think about the proposed solution (breaking
down $base_url into two parts?)
Also, does this address the style sheets as well, or more is needed?
------------------------------------------------------------------------
November 21, 2004 - 16:24 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch.txt (471 bytes)
I have implemented what Goba suggested.
------------------------------------------------------------------------
November 21, 2004 - 16:43 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_0.txt (825 bytes)
Maybe this one is faster?
------------------------------------------------------------------------
November 21, 2004 - 17:34 : kbahey
Man! You are fast!
I tried the second version. It works fine for things that are not
inside the node body, I mean they have a / in front of them, as we
want it to be.
Two comments/issues:
- If there is a URL that is already "/" representing the home page, it
gets set to "//". Perhaps it should check for that case?
- URLs in nodes that do not start with / do not get changed to have a /
prepended to them. Do we need a filter for this?
- Do we need to do something for the style sheets in the page header? I
mean the "misc/drupal.css" and "themes/themename/style.css"?
Thanks
------------------------------------------------------------------------
November 22, 2004 - 00:28 : kbahey
Hi chx
Here is a fix for the case where you have a url that is just "/".
In your patch, instead of:
<?php
$base = $parts['path'] . '/' ;
?>
Replace that by:
<?php
$base = ( $path == '/' ? $base : $parts['path'] . '/' );
?>
------------------------------------------------------------------------
November 28, 2004 - 04:08 : kbahey
Did this patch make it into CVS yet?
If there are any objections to it, can someone please explain what they
are?
Thanks
------------------------------------------------------------------------
November 28, 2004 - 09:33 : Dries
Shouldn't your changes be included in the patch?
Also, it's better to cache $base rather than $parts.
Lastly, it this patch makes it to HEAD, we should probably remove some
'base url' cruft from the themes.
------------------------------------------------------------------------
November 28, 2004 - 18:54 : kbahey
Attachment: http://drupal.org/files/issues/x.diff (1 KB)
Here is the patch including my fix.
I am asking chx to comment on caching $base instead of $parts.
Will this make it faster?
------------------------------------------------------------------------
November 28, 2004 - 19:26 : chx
Hm. $base = ( $path == '/' ? $base : $parts['path'] . '/' ); this
depends on path which is a parameter. Thus I fail to see how could we
cache $base. I'd correct this code however $base = ( $path == '/' ? ''
: $parts['path'] . '/' ); 'cos I think $base is not defined before, but
this is not a problem, PHP will be happy to replace NULL with NULL...
Maybe instead of all parts, only $parts['path'] is enough to be cached,
yes, but the performance and memory usage difference -- I guess -- would
not be noticable...
------------------------------------------------------------------------
November 28, 2004 - 23:06 : kbahey
Attachment: http://drupal.org/files/issues/common-inc-patch.txt (1 KB)
OK.
I put in chx suggested change.
This patch can go in CVS then, to rid us of the problems with paths not
beginning with slash.
This is not an ultimate solution still. We need to address the problem
with .css files. Although the header contains a:
<base href="http://example.com" />
it does not seem that major search engines and archiving sites obey it
anyway.
------------------------------------------------------------------------
December 2, 2004 - 20:31 : Dries
Your coding style needs work. Also, I'm not going to commit this unless
the themes get fixed up: we'd end up with invalid URLs all over the
place. Lastly, I wonder how portable the themes will be when Drupal is
run from within a subdirectory.
------------------------------------------------------------------------
December 2, 2004 - 21:15 : chx
Attachment: http://drupal.org/files/issues/common_inc_patch_1.txt (849 bytes)
Well, my patch worked from a subdirectory very well, as fact, I have not
tested it from the root dir. And I think that it adheres to coding
standards. So I resubmit it with the root path fix. However, my Drupal
work is focused on i18n these days, and I was never into themeing so it
won't be me who fixes those.
------------------------------------------------------------------------
December 2, 2004 - 22:07 : kbahey
I have tested the previous patch (including my fix) with drupal
installed in the DocumentRoot of the server.
So, in effect, it is tested with both Drupal in / and Drupal in a
subdirectory.
This change fixes the problem for the crawlers and other browsers from
getting confused.
While it is true that there is no fix for the .css files in the HTML
head section yet, this fix deals with a major part of the problem, and
rids us of a major pain. Check your web server's logs some time to see
what I mean.
Someone who is familiar with the themes can contribute a patch later.
This patch and the future fix for themes are not mutually exclusive, so
let it go in CVS.
------------------------------------------------------------------------
December 9, 2004 - 15:31 : Goba
Please commit this into Drupal core, this fix is badly needed.
------------------------------------------------------------------------
January 17, 2005 - 13:00 : chx
Attachment: http://drupal.org/files/issues/base_url_kill.patch (4.34 KB)
Well as noone have stepped in to fix this problem, I have tried to fix
the themes also. themes.inc , xtemplate.engine and the bluemarine
template is patched besides common.inc.
Of course, more templates could follow, but first I'd like to see your
opinions.
------------------------------------------------------------------------
January 17, 2005 - 13:09 : Goba
I don't think that removing <base> from the themes is a good idea, using
$parts['path'] should be encouraged though before the files, which would
fix the google cache problem, and would still keep the HTML size low. It
would also help those, who save the file to find the originating site
easier, since clicking on a non-pagelocal link would lead to the online
version.
------------------------------------------------------------------------
January 17, 2005 - 16:03 : Steven
Definitely -1 on removing the <base> tag or using absolute or
root-relative URLs. This tag has been around for ages, and it is the
only way to make clean URLs work without bloating in the code. FYI,
"base" is (first?) mentioned in Berners-Lee's HTML 1.0 draft [1].
That's June 1993.
As the amount of clean URL-using sites grows, the crawlers will have to
be updated. Perhaps we could prevent crawlers from going too insane by
404ing for URLs with more than say 10 components? That would prevent
the really crappy ones from hammering your site.
I'm all for making the <base> tag easier to handle for the user (say,
by including a filter to allow simple anchor links to work as most
people expect them to), but we should keep Drupal-generated URLs clean
and completely relative.
[1] http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt
------------------------------------------------------------------------
January 17, 2005 - 16:57 : kbahey
The problem with css is this: The @import argument does not start with a
/.
This is simple to fix.
We keep the "base" as it is today, but add the new variable: $base
before it.
So for a site where Drupal is installed in the DocumentRoot, all that
will change is that /misc/drupal.css and /themes/themename/style.css
will be preceded by a slash. For sites that use another path, that path
will be prepended to the css file name.
How about that?
------------------------------------------------------------------------
January 17, 2005 - 17:38 : Steven
What exactly is the problem with the @import? As far as I know:
- url() in stylesheets is interpreted relative to the base of the
stylesheet, not the source document.
- However, if the styles are inside an HTML document, through a style
tag or style attribute, then the stylesheet's location is the same as
the HTML document.
- Thus, the stylesheet's base is the same as the base of the HTML
document (which can be altered through the <base> tag).
I just don't see why it is necessary. As far as I know, the only
browser that has had problems resolving CSS urls properly was Netscape
4, which does not support @import at all, and which Drupal does not
support either, because of its CSS usage.
------------------------------------------------------------------------
January 17, 2005 - 18:35 : kbahey
The problem for stylesheets is as follows. I think it mainly affect
crawlers and Google's cache.
Say you have an installtion of Drupal in DocumentRoot. You then use url
aliases, and put slashes in them.
For example, you use news/general/2004-12-15.html for a node.
That node still has misc/drupal.css and themes/pushbutton/style.css in
the head section if the document. Crawlers get fooled by that and try
to look for /news/general/misc/drupal.css and
/news/general/themes/pushbutton/style.css, which don't exist.
So, just prepending the new $base variable (in chx's patch) before the
stylesheet @import argument would fix this issue. Assuming you are in
DocumentRoot, then /misc and /themes would be used instead of just misc
and themes.
It would still be compliant with standards, be relative to the web
site, and no ambiguous to anyone, be they crawler or browser.
I hope it is clearer now.
I think chx can change the patch to use the $base instead of $base_url
everywhere, so as to avoid the host/domain name in the urls.
------------------------------------------------------------------------
January 17, 2005 - 21:36 : Steven
But typical crawlers don't even pay attention to stylesheets, hence it
wouldn't have much use for them. I just don't see why we should adjust
to rare cases of buggy software. Reading out a base URL from an HTML
document is dead easy, and on top of that it doesn't add more
complexity as without the base tag, the document's URL is already an
implicit base which has to be parsed anyway.
I did not like it when we altered the <link> tag to accomodate buggy
RSS readers and I certainly don't like it now, as this is even rarer.
In both cases, it is not Drupal which is at fault.
------------------------------------------------------------------------
January 17, 2005 - 21:48 : kbahey
Steven
While I agree with most of what you said, the 404s show up in the logs
enough to be a bother.
Perhaps the original design of Drupal did not forsee that people will
use url aliases to mimic directory/file hierarchies. Whether this was
intended or not, it is the way many use Drupal today.
It does not matter where the bug is (Drupal or the external world), as
long as we can stop it ourselves, by adjusting our end of it.
The fix is simple enough and does not break standards (if implemented
as described with a leading / before the css).
------------------------------------------------------------------------
January 17, 2005 - 21:54 : Steven
It does not break standards, but it does bloat the code in an ugly way.
Why not send an e-mail to the owners of the crawlers and tell them to
implement a standard that is nearly 10 years old [2] (RFC 1808)?
Note that Google Cache now seems to correctly interpret base URLs [3]
and even adds a <base> tag of its own.
By the way, this problem has nothing to do with people using URL
aliases or not, as for a browser the regular nested paths that Drupal
uses (e.g. "node/1" is no different from aliases mimicking files
"foo/bar.html").
[2] http://www.faqs.org/rfcs/rfc1808.html
[3]
http://www.google.be/search?q=cache%3Awww.drupal.org&sourceid=mozilla-searc…
------------------------------------------------------------------------
January 17, 2005 - 22:34 : Goba
Steven, part of the problem is that Google cache does add a base href
even if there is a base href in the document. Eg adds a <BASE
HREF="http://drupal.org/node/13733"> on the plone comparision page
cached. Now that since HTML does not allow more than one base tag [4]
to be present, it is up to the browsers, to use the first or the last
base value, or any of the base values on the page for that matter as
the used base. So even pages displayed from the google cache will be
buggy if a full relative path to the domain root is not specified, due
to this problem.
[4]
http://www.w3.org/TR/1999/REC-html401-19991224/sgml/dtd.html#head.content
------------------------------------------------------------------------
January 18, 2005 - 07:14 : chx
Attachment: http://drupal.org/files/issues/base_url_kill_0.patch (4.75 KB)
This one does not use the whole base_url only the path part of it. HTML
bloat is kept at minimal.
--
View: http://drupal.org/node/13148
Edit: http://drupal.org/project/comments/add/13148
1
0
[drupal-devel] [bug] File attachments are sometimes saved with an incorrect mimetype
by tangent 01 Feb '05
by tangent 01 Feb '05
01 Feb '05
Project: Drupal
Version: 4.5.2
Component: file system
Category: bug reports
Priority: normal
Assigned to: Anonymous
Reported by: tangent
Updated by: tangent
Status: patch
Has anyone tested this patch?
I think this is an issue that needs to be addressed. I'm just not sure
we want all those extension tests in there.
tangent
Previous comments:
------------------------------------------------------------------------
January 24, 2005 - 23:29 : tangent
Files attached to issues (like patch files) which have "inc" in the
filename are incorrectly served with a text/html mime type and their
contents are marked up with html. This is clearly inappropriate.
See the issue below for some examples of this occurance.
http://drupal.org/node/16021
------------------------------------------------------------------------
January 24, 2005 - 23:51 : tangent
A filename with "module" in the name is also served the same way.
Perhaps we should also test for "mysql, pgsql, php, css, xtmpl, sh, pl,
txt" and any other text file extensions. If this is not desirable, at
least change the filter to look for these extensions *at the end* of
the filename and allow .patch (or possibly .diff) files to be served as
plain text.
------------------------------------------------------------------------
January 30, 2005 - 12:32 : Dries
The project module saves the mime-type sent by the browser when
uploading the patch, and reuses that when serving the patch for
download. It looks like some people upload patches with the wrong
mime-type set.
------------------------------------------------------------------------
January 31, 2005 - 13:01 : tangent
Attachment: http://drupal.org/files/issues/file_mimetype.patch (3.81 KB)
This issue is a symptom caused by faulty mimetype handling in file.inc
so I'm updating the issue to the correct component. The old title was
"File attachments with '.inc' in filename are served as HTML".
Apparently, the real issue is that file.inc currently stores the
content-type presented by the user-agent when file is uploaded. This
is not the best thing to do because a user-agent can present an
incorrect content-type either inadvertantly or, when used by a
malicious person, purposefully.
There are a couple of methods to determine the actual mimetype of a
file but they all have drawbacks.
The Fileinfo (http://pecl.php.net/package/fileinfo) PHP extension is
the recommended tool for the job but is not typically available on a
typical host. The mime_content_type() function requires the "file"
program to be available which is not always the case either. Then there
is the fallback method of testing the filename extension against a list
of internally known extensions.
I've created a patch which attempts the first 2 options and then falls
back to the third. There may be a better option which I have not
thought of though so feel free to offer suggestions.
--
View: http://drupal.org/node/16142
Edit: http://drupal.org/project/comments/add/16142
1
0
drupal-cvs(a)drupal.org wrote:
> User: goba Branch: HEAD Date: Tue, 1 Feb 2005 17:29:19 +0000
>
> Added files:
> /modules/filebrowser README.txt filebrowser.module
>
> Log message:
> Add filebrowser module with some initial documentation in the README.
> We use this module to present the subversion browser on drupal.hu,
> with all the special information about translators of files, and
> file states.
I tried looking for the 'subversion browser on drupal.hu' but could not
find anything.
--
Dries Buytaert :: http://www.buytaert.net/
4
5
Project: Drupal
Version: 4.5.0
Component: profile.module
Category: feature requests
Priority: normal
Assigned to: Alison(a)deanspace.org
Reported by: Alison(a)deanspace.org
Updated by: chx
Status: patch
Features can be added only to CVS HEAD not to 4.5.0 and please submit a
patch not the whole module http://drupal.org/patch
chx
Previous comments:
------------------------------------------------------------------------
February 1, 2005 - 15:57 : Alison(a)deanspace.org
Add the ability to list the options of a profile field if the field is
given but a value isn't to the profile module.
For example, right now /profile shows all public user information,
/profile/profile_school shows a blank page and
profile/profile_school/Miami University shows all public info on users
for that school. The purpose of this feature would be to replace the
blank page with a listing of the options.
------------------------------------------------------------------------
February 1, 2005 - 15:58 : Alison(a)deanspace.org
Attachment: http://drupal.org/files/issues/profile_1.module (22.61 KB)
I have written the code that allows a list of all possible schools to
show up.
--
View: http://drupal.org/node/16554
Edit: http://drupal.org/project/comments/add/16554
1
0
Project: Drupal
Version: 4.5.0
Component: profile.module
Category: feature requests
Priority: normal
Assigned to: Alison(a)deanspace.org
Reported by: Alison(a)deanspace.org
Updated by: Alison(a)deanspace.org
-Status: active
+Status: patch
Attachment: http://drupal.org/files/issues/profile_1.module (22.61 KB)
I have written the code that allows a list of all possible schools to
show up.
Alison(a)deanspace.org
Previous comments:
------------------------------------------------------------------------
February 1, 2005 - 14:57 : Alison(a)deanspace.org
Add the ability to list the options of a profile field if the field is
given but a value isn't to the profile module.
For example, right now /profile shows all public user information,
/profile/profile_school shows a blank page and
profile/profile_school/Miami University shows all public info on users
for that school. The purpose of this feature would be to replace the
blank page with a listing of the options.
--
View: http://drupal.org/node/16554
Edit: http://drupal.org/project/comments/add/16554
1
0