[drupal-devel] [feature] Add Folksonomy, or "Free Tagging", to Taxonomy

Morbus Iff drupal-devel at drupal.org
Fri Apr 1 18:21:38 UTC 2005


Issue status update for http://drupal.org/node/19697

 Project:      Drupal
 Version:      cvs
 Component:    taxonomy.module
 Category:     feature requests
 Priority:     normal
 Assigned to:  Morbus Iff
 Reported by:  Morbus Iff
 Updated by:   Morbus Iff
 Status:       patch

I think his points were:
* hierarchy in folksonomy = bad; don't work on it.
* minimal patch to start = good; let's see what people do.
* related terms = good; (my screenshot [1]).
[1] http://disobey.com/detergent/2005/similar_keywords.jpg


Morbus Iff



Previous comments:
------------------------------------------------------------------------

March 30, 2005 - 10:16 : Morbus Iff

Attachment: http://drupal.org/files/issues/taxonomy_all.patch (19.53 KB)

This patch adds folksonomy support to Drupal (named internally as "Free
tagging"). In a nutshell, the core difference is the input method:
unlike normal taxonomies which are administratively controlled, a "free
tagging" vocabulary allows tag creation when the node is submitted. It
does this through a text input box, as opposed to a dropdown or
selectbox. This patch:

Removes the useless "Preview form" of a vocabulary.
Alters the vocabulary table to include a new "tags" column.
Adds a new "Free tagging" preference on vocabulary creation/editing.
Modifies the vocabulary overview to support pagers for free tagging
vocabs.

The new code integrates tightly with the existing taxonomy code. The
only additional processing occurs on node save and edit, where we parse
through the tags associated with a node. All other display (and thus,
code) remains the same. The following screenshots illustrate the
changes, integration, and workflow:

Create/edit vocabulary screen. [2]
Create/edit a node. [3]
Result of previous screen. [4]
The new admin/taxonomy. [5]
Clicking on "view terms". [6]
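
The tag handling on node save described above can be sketched roughly as
follows. This is a minimal illustration in Python, not the PHP in the
patch: the comma delimiter, the case-insensitive matching, and the names
parse_tags, save_tags and vocabulary are all assumptions made for the
example.

```python
def parse_tags(raw):
    """Split a free-text tag field into clean, unique tag names."""
    seen = set()
    tags = []
    for piece in raw.split(","):
        name = " ".join(piece.split())  # trim and collapse inner whitespace
        key = name.lower()
        if name and key not in seen:
            seen.add(key)
            tags.append(name)
    return tags


def save_tags(raw, vocabulary):
    """Return term ids for a node, creating terms that do not exist yet.

    `vocabulary` is a hypothetical dict of lower-cased term name -> term id,
    standing in for the ordinary term table.
    """
    term_ids = []
    for name in parse_tags(raw):
        key = name.lower()
        if key not in vocabulary:
            # New tag: it becomes an ordinary term with the next free id.
            vocabulary[key] = max(vocabulary.values(), default=0) + 1
        term_ids.append(vocabulary[key])
    return term_ids
```

Newly typed tags end up in the same structure as existing terms, which
mirrors the point that free-tagged terms are stored as ordinary terms.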

These patches were made during the exploration and customization of
Drupal by http://www.NHPR.org. In loving support of open source
software, http://www.NHPR.org will continue to contribute patches they
feel the community will benefit from. Questions about this patch should
be directed to morbus at disobey.com.
[2] http://disobey.com/detergent/2005/drupal_folkdef.jpg
[3] http://disobey.com/detergent/2005/drupal_folknodeedit.jpg
[4] http://disobey.com/detergent/2005/drupal_folknodesubmit.jpg
[5] http://disobey.com/detergent/2005/drupal_folkpager1.jpg
[6] http://disobey.com/detergent/2005/drupal_folkpager2.jpg


------------------------------------------------------------------------

March 30, 2005 - 10:38 : Morbus Iff

Attachment: http://drupal.org/files/issues/taxonomy_all_0.patch (19.52 KB)

Updated patch to fix some errors in the update.inc change.


------------------------------------------------------------------------

March 31, 2005 - 10:27 : Morbus Iff

Attachment: http://drupal.org/files/issues/taxonomy_all_1.patch (19.54 KB)

New patch for check_plain and HEAD. Also removed the term indent under
vocabularies - there was an extra-space issue in regards to
_taxonomy_depth, and I felt it was better to just remove the
(non-standard, non-semantic) indent I originally added during the move
to tabular display.


------------------------------------------------------------------------

March 31, 2005 - 13:19 : Anonymous

A big +1 from me! This patch is going to be *extremely* useful for
modules like "image" that involve frequent creation of new taxonomy
terms.

I've been testing this patch extensively for a couple of days. I love
the fact that it is totally non-intrusive onto existing sites if the
admin doesn't want to use it when creating vocabularies, and that the
free-tagged terms become "ordinary terms" in the database structure,
with no special-case table.

Morbus Iff has done a great job of adding a powerful feature without
breaking anything, as far as I can tell.

I can see lots of ways in which this can evolve in the future, such as
more fine-grained security so that some users can add a given node type
without being able to add new free-tagged terms (i.e., that class of
users would have to pick from existing terms only, even if the
vocabulary allows higher-privileged users to add free terms). I can
also see a place for user-owned free-tag vocabularies that are
dedicated to their personal image albums. But these features could be
added in a future release and still be backward-compatible with what
Morbus has done now. That being the case, I suggest that this patch be
accepted into core.


------------------------------------------------------------------------

March 31, 2005 - 13:20 : syscrusher

Comment #3 was from me (syscrusher). Sorry I forgot to login first.


------------------------------------------------------------------------

March 31, 2005 - 15:12 : Dries

I'll commit this patch to core as soon as CVS HEAD is opened up for
development.  For now, I'm awaiting feedback from the usability folks. 
I'm also left wondering how this would affect taxonomy-based permissions
-- I don't think that should be a problem but it is somewhat
mind-boggling.
I haven't tested the patch yet, but I glanced at the code quickly:
1. Don't use the word 'node' in user output.  Use 'post'.
2. The words 'term' and 'tags' are both used in user output.  This
might be confusing, but I don't see an easy way around it. The way it
is used makes sense, so it might be a non-issue. 
3. Some extra documentation might be in order.  The explanation of
'free tagging' is quite technical.  For example, I don't understand the
following bit: "Allows the creation of a vocabulary during content
creation, as well as through the normal administrative means.".  I'd
rather see it explain the difference/advantage/drawbacks to help me
decide whether to enable 'free tagging' or not.  Make the documentation
more task-minded.
4. I don't like the way you manipulate the pager's global variables. 
The paging code is a bit of a hack, it seems.
5. Spacing: we write 'foreach (' not 'foreach('.
6. We usually write code comments above the code, not after the code on
the same line.  This is really minor as I'm sure we don't do this
consistently.  Some code comments are rather cryptic and didn't help me
much.  Maybe give your code comments some love.
That's all for now.


------------------------------------------------------------------------

March 31, 2005 - 15:19 : FactoryJoe at civicspacelabs.org

Attachment: http://drupal.org/files/issues/drupal_folkdef.png (21.7 KB)

In setting up the folksonomy, you present far too many options to the
user. I tried to cut these extraneous options out but then decided to
redo the whole Vocabulary creation workflow. :)
Go figure.


------------------------------------------------------------------------

March 31, 2005 - 15:27 : Morbus Iff

I'll address what I can. A new patch will be forthcoming.
#2: I agree - I previously wanted to keep everything as "term", and
spent a good bit of time on #drupal getting jbond (who has since
disappeared from all discussion) to agree that "term == tag" and the
only difference between the two was their method of input. Eventually,
I felt that "tag" was not only a "term" (and term), but also an action.
I'm not "terming a node", but I'm "tagging it", which is a common sorta
phrase in other folksonomy implementations. Thus, the mixing of the
two.
#4: Yeah, I know it's a hack. I had a comment in there (since removed
due to moshe's suggestion) that I knew it was a hack, and that
integrating with the existing pager code would be uber-difficult based
on hierarchies and the recursive nature of the taxonomy_get_tree.
#6: Heh, heh. Boy oh boy. Wrong thing to say. I often find myself
overdoing it on comments (for example [7]), and the patch as is was a
concentrated effort to /reduce/ the amount of comments I had originally
put in there (see also [8]). I'll take another look at 'em.
[7]
http://cvs.sourceforge.net/viewcvs.py/amphetadesk/AmphetaDesk/lib/AmphetaDesk.pm?rev=1.47&view=auto
[8] http://lists.drupal.org/archives/drupal-devel/2005-03/msg01010.html


------------------------------------------------------------------------

March 31, 2005 - 15:36 : Morbus Iff

Regarding #7, it appears factoryjoe wants a far grander rewrite of the
taxonomy UI than this patch purports to do.


------------------------------------------------------------------------

March 31, 2005 - 15:59 : Dries

If you can take on such a rewrite (or parts thereof) based on Chris'
suggestions, by all means.
Depending on the required UI changes, such an overhaul might automagically
deal with the pager implementation issues (if the pagers get nuked that
is).


------------------------------------------------------------------------

March 31, 2005 - 16:19 : Morbus Iff

Not in this patch. His plans are for 4.7, and include wizards, removing
most all of the checkboxes on that page, and so on and so forth.
Likewise, he's only about 30% of the way there (per #drupal) in his UI
mockups, so not to be considered with this folksonomy patch at all.


------------------------------------------------------------------------

March 31, 2005 - 16:22 : Morbus Iff

(Er.. which isn't to say that I think this folksonomy patch is for 4.6 -
I know 4.7 will be its earliest release. But, he's just not ready with
the workflow in his head yet for me to address any of his issues. And,
based on discussion in #drupal, I'm not sure /I/ want to be the one
implementing the changes, much less agreeing with them, he has planned
[g]). No offense to him, of course - he's (admittedly) too early in his
thinking to take on all affronts.)


------------------------------------------------------------------------

March 31, 2005 - 16:23 : Bèr Kessels

Chris's ideas are great, but should IMO not be confused with this
folksonomy issue.
Can we not focus on getting folksonomy in, be it with a
less-than-perfect UI, and then open a new issue to improve the UI of
taxonomy?
Bèr


------------------------------------------------------------------------

March 31, 2005 - 16:44 : Uwe Hermann

+1 from me. While I haven't tested the patch, yet, it looks very good
and I'd really love to see this in 4.6. I hope it's not too late to get
it in...


------------------------------------------------------------------------

March 31, 2005 - 17:31 : grohk

I will add my +1 to the pile.  This patch is working well for me.  I
like Chris' mockup, but I also agree with Morbus that his taxonomy UI
ideas are probably beyond the scope of this patch.  I actually like
that Morbus has added this functionality without drastically altering
the taxonomy admin interface.  Bravo.


------------------------------------------------------------------------

March 31, 2005 - 17:58 : FactoryJoe at civicspacelabs.org

As far as my workflow changes, yeah, they won't be ready for 4.6. I do
think that getting this into the next release is important though, so
that we have the general functionality.
I have concerns about "multi-select" and related terms though. I mean,
those shouldn't be user options -- those should be allowed by default.
Also, I talked to Drumm at lunch about flat-lists vs tree-hierarchy and
we seem to agree that it's a needless distinction. It's better to have a
"tree-like hierarchy" (or controlled vocabulary/outline list) and "free
tagging" as the main distinctions. Because it might make sense to make
your flat-list a hierarchy later on, but not so much your tags. (even
though Morbus tends to see a use for making a hierarchy out of free
tags.)
But this later discussion probably belongs somewhere else...
So I'm pretty much okay with this moving forward with the understanding
that the categories UI needs a wizard-like overhaul for 4.7.


------------------------------------------------------------------------

March 31, 2005 - 22:34 : Jaza

+1 from me for this patch. The interface is so simple - just one text
box - and yet so powerful. Being able to add terms at node creation is
a critical feature for the next release, IMO, but I never envisioned it
being implemented so cleanly and intuitively. Great work, Morbus!
However, three problems/shortcomings that I found with the patch:
1. You cannot create new sub-terms using free tagging (yet). Say I have
an existing term called 'sporting news', and I am writing an article
about soccer and baseball. The term 'soccer' already exists as a
subterm of 'sporting news'. I have no existing term for baseball. I
would like to be able to enter into the text box:
"sporting news->soccer, sporting news->baseball"
And it would create 'baseball' as a new subterm (and assign the
existing term 'soccer'). With the current patch, the only way to do
this is to create the terms with free tagging, and then to go into the
admin interface and make them subterms.
2. I don't like the new "view terms" link. I prefer having my vocabs
and terms all listed together. In fact, when I made my first term using
free tagging, I went into the admin interface, and couldn't work out
where my term was, until I saw the "view terms" link. So this is a
usability issue - Drupal admins are accustomed to the current layout of
the categories page, and may find the change cumbersome. Perhaps make
this a setting?
3. The free tagging text box should IMO be displayed AS WELL AS, not
instead of, the regular 'select term(s)' list box. Surely users will
want to select an existing term, rather than typing it? And if they do
type it, they should be able to check (as they type) that they're
spelling it correctly. Also, users should be able to see what existing
terms there are, so that they don't create new terms that are virtually
duplicates.
I also agree (with the person that already said it) that new
permissions are needed as part of this patch.


------------------------------------------------------------------------

March 31, 2005 - 23:12 : Morbus Iff

Addressing Jaza:
#1: There was some discussion about this, and it was eventually put on
the backburner [9].
#2: The goal of the new "view terms" was to address the immense size
that a tag vocabulary can grow to. It is quite easy to have these
vocabularies become 100+, continuing to grow ever bigger. Having the
vocabulary admin screen contain 1000+ terms all at once is a problem.
This is less of an issue with a controlled vocabulary, which is why
those terms are still displayed inline (while it IS possible to create
a controlled vocabulary of 1000+ terms, it is a bit more unlikely than
with a folksonomy). There was /some/ talk of "if vocabulary term count
is less than 50, always display inline", but I couldn't readily solve
the UI issue of what happened between term #49 and term #50 (49 terms
are displayed inline, you add a 50th one, go back to the overview
screen and "wtf?! all my tags are deleted! arrrRRgh!"). The nearest fix
to that was a "There are more than 50 terms in this category. View all
terms." message, similar to the "No terms" one [10]. If people feel
this is a proper way of handling it, then I'll roll it into this patch.
#3: For the same reason as #2: with 1000+ terms in a vocab, a dropdown
is prohibitive. As for similar keywords, I plan on making a third-party
module that provides this sort of "similar keywords" GUI [11] for
tag-based vocabularies, using the "Related terms" feature of a taxonomy
term.
[9] http://lists.drupal.org/archives/drupal-devel/2005-03/msg00733.html
[10] http://disobey.com/detergent/2005/drupal_folkpager1.jpg
[11] http://disobey.com/detergent/2005/similar_keywords.jpg
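
The threshold idea in #2 can be sketched as below. This is only an
illustration in Python under stated assumptions: the function name
overview_rows is hypothetical, the cutoff of 50 comes from the discussion
above, and the message wording is adapted from the one quoted there so
that it stays accurate at exactly 50 terms.

```python
INLINE_LIMIT = 50  # assumed cutoff, from the "less than 50" idea above


def overview_rows(terms, limit=INLINE_LIMIT):
    """Return the lines to show under one vocabulary on the overview screen."""
    if not terms:
        return ["No terms."]
    if len(terms) < limit:
        # Small vocabulary: list every term inline.
        return list(terms)
    # Large vocabulary: show a pointer to the full list instead of the terms.
    return [f"There are {len(terms)} terms in this category. View all terms."]
```

With this rule the terms never silently vanish when the 50th one is
added; the inline list is replaced by a message that says where they went.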


------------------------------------------------------------------------

April 1, 2005 - 10:19 : Dries

Postponing the UI changes is OK, though I'd still like to see if we can
make the pager stuff less of a hack.
If you have 1000+ terms you probably don't want to manage them as
regular terms.  You'll want to do things like searching terms (eg.
"search for term '*John*'"), merging terms (eg. "merge term 'Governer
John Lynch' into term 'John Lynch'"), sort terms by popularity (eg.
"what terms are used only once?") and act upon them in batch mode. 
Eventually, that might also impact the UI.
I also wonder how the folksonomy module affects the various taxonomy_*
modules as well as the content filter on the 'admin/content' page.
This is going to be interesting. ;-)
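
The three management operations mentioned above (searching, merging,
and finding rarely used terms) could look roughly like this. It is a
sketch in Python over an in-memory structure, not anything in the patch:
node_terms is a hypothetical dict of node id -> set of term names, and
all function names are made up for the example.

```python
from collections import Counter
from fnmatch import fnmatchcase


def search_terms(terms, pattern):
    """Case-insensitive wildcard search, e.g. pattern '*John*'."""
    return [t for t in terms if fnmatchcase(t.lower(), pattern.lower())]


def merge_term(node_terms, source, target):
    """Replace `source` with `target` on every node that carries `source`."""
    for tags in node_terms.values():
        if source in tags:
            tags.discard(source)
            tags.add(target)  # sets ignore the add if target is already there


def terms_used_once(node_terms):
    """Return the terms attached to exactly one node, sorted by name."""
    usage = Counter(tag for tags in node_terms.values() for tag in tags)
    return sorted(t for t, n in usage.items() if n == 1)
```

For example, merging 'Governer John Lynch' into 'John Lynch' leaves each
affected node with a single 'John Lynch' term.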


------------------------------------------------------------------------

April 1, 2005 - 10:33 : Morbus Iff

Dries: the big problem with the pager() stuff is vocabulary hierarchies,
which is what _get_tree gives us (along with additional depth and parent
attributes). I could probably reduce the hack's size by using
pager_query() with a throwaway SQL statement, but I'd also have to
throw away the returned database $result, since it wouldn't be useful
(no hierarchy information). On the order of hacktitude, though, this is
probably as equal a hack, only smaller (and possibly more expensive,
since it'd be another db pull). Thoughts?
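
To make the problem concrete: paging a hierarchy means paging the
flattened tree, not a SQL result set. The sketch below shows that idea
in Python. It does not model Drupal's pager code or the real return value
of taxonomy_get_tree; flatten_tree, page_of and the children mapping are
hypothetical names, and term names are assumed unique for simplicity.

```python
def flatten_tree(children, parent=None, depth=0):
    """Depth-first walk returning (name, depth) rows.

    `children` maps a parent name (None for the root) to an ordered
    list of child names.
    """
    rows = []
    for name in children.get(parent, []):
        rows.append((name, depth))
        rows.extend(flatten_tree(children, name, depth + 1))
    return rows


def page_of(rows, page, per_page):
    """Return one zero-based page of rows plus the total page count."""
    total_pages = max(1, -(-len(rows) // per_page))  # ceiling division
    start = page * per_page
    return rows[start:start + per_page], total_pages
```

The whole tree still has to be built before a page can be cut from it,
which is the cost being weighed against a throwaway pager_query() call.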


------------------------------------------------------------------------

April 1, 2005 - 10:38 : moshe weitzman

personally, i think a pager rewrite is out of scope for this patch. it
is also a very minor part of the patch. during the next release cycle
someone can go in and improve pager's API for handling collections that
are not SQL query result sets.


------------------------------------------------------------------------

April 1, 2005 - 11:02 : syscrusher

Some comments on the tree hierarchy issue and also release schedules...

Comprehensive tree hierarchy support is nice if feasible, but I don't
think it's essential for the patch to begin being very useful. As a
simple, interim workaround, I would suggest using a slash or backslash
(accept either of them, for maximum user-friendliness) as a delimiter
between levels, and handle the hierarchy internally. For example, if
the tree now looks like this:
one
-one A
-one B
-one C
two
-two A
--two A one
--two A two
-two B
then I could put "one/one A/newterm ABC, two/two A/two A one/newterm
IJK,newterm XYZ" into the tags field to make it look like this
afterwards:
one
-one A
--newterm ABC
-one B
-one C
two
-two A
--two A one
---newterm IJK
--two A two
-two B
newterm XYZ
(In my examples, I didn't use any backslashes because I wasn't sure how
they would render in the issue, but the idea is that the two punctuation
marks are treated as equivalent. Also, as a Linux maven I found myself
instinctively putting a trailing backslash on things that "felt" like a
directory path, so it would be wise to trivially trim trailing
punctuation from each term when validating, because I'll bet I'm not
the only one who would do that.)
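
A rough sketch of the delimiter handling described above, in Python
rather than Drupal's PHP. The names split_path and add_tags and the
nested-dict tree are assumptions for illustration only. Missing levels
are simply created here, which is just one possible policy.

```python
import re


def split_path(tag):
    """Split one tag into its levels.

    Slash and backslash are treated as equivalent, and empty pieces left
    by stray leading or trailing delimiters are dropped.
    """
    return [" ".join(p.split()) for p in re.split(r"[/\\]", tag) if p.strip()]


def add_tags(tree, raw):
    """Insert each comma-separated path into `tree`.

    `tree` is a nested dict of term name -> subtree; levels that do not
    exist yet are created on the way down.
    """
    for tag in raw.split(","):
        node = tree
        for level in split_path(tag):
            node = node.setdefault(level, {})
    return tree
```

Feeding it the example string from above adds "newterm ABC" under
"one A", "newterm IJK" under "two A one", and "newterm XYZ" at the top
level, while a tag with no delimiter behaves exactly like a flat tag.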

Advantages:

Novice users can just ignore this admittedly-advanced feature, and
still use Morbus' most elegantly simple UI with no changes.
Syntactically, looks like disk directories, which makes it a little
more intuitive to users than other delimiters like "->" or "::" that I
have seen similarly used in other software (and which are "intuitive"
only to programmers).
Relatively modest code addition to the existing patch.
Adds, but doesn't /change/ anything in the existing UI, nor require
database schema changes, so it can be added /after/ initial release of
the module without requiring user-retraining or upgrade scripts.


Disadvantages:

What to do if the user mistypes an ancestor path component?

Add it as-is, as if doing "mkdir -p /pathname/" in Linux /et al/?
Issue an error message and ignore that term? (Node gets created with
one free-tag term missing.)
Issue an error message and force correction? (Annoying to novices, but
probably only advanced users would use this hierarchy feature
anyway...and the message could helpfully list all of the /existing/
terms at the failing tree level. The user would still manually type
what they want when correcting the entry, but now they'd have a guide
in the error message text to help them know what they mistyped.)
Try to intelligently match to a near-miss? (Complex code! SOUNDEX might
help here.)
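
The last two options can be combined in a small sketch. Note that this
swaps in a different technique from SOUNDEX: it uses the similarity
ratio from Python's standard difflib module, and falls back to an error
message listing the existing terms at that level. The function name
resolve_level and the 0.8 cutoff are assumptions, not tested values.

```python
from difflib import get_close_matches


def resolve_level(typed, existing, cutoff=0.8):
    """Map a typed path component to an existing sibling term.

    Returns (term, None) on an exact or near match, or (None, message)
    listing the existing terms when nothing is close enough.
    """
    by_key = {name.lower(): name for name in existing}
    key = typed.strip().lower()
    if key in by_key:
        return by_key[key], None
    close = get_close_matches(key, list(by_key), n=1, cutoff=cutoff)
    if close:
        return by_key[close[0]], None
    options = ", ".join(sorted(existing))
    return None, f'No term "{typed}" at this level. Existing terms: {options}'
```

Short names that differ by one character score as near matches under
this measure, so silent matching can pick the wrong sibling; that is an
argument for showing the user the match rather than applying it blindly.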


I still feel that the patch deserves consideration as-is, with only bug
fixes before it is available as a contrib. Morbus has done a great job
of laying a foundation to which we can add all this nifty stuff later
without ripping out his existing UI. Let's leverage that elegant
thinking and get this tool into the hands of site owners. My suggestion
would be a published contrib patch available for 4.6, then let Dries do
what he sees fit with regard to core for 4.7.

—Scott


------------------------------------------------------------------------

April 1, 2005 - 12:24 : FactoryJoe at civicspacelabs.org

At the risk of sounding obtuse, I really think that this module should
be as simple as possible, and then let other modules add functionality
to the new free tagging vocabulary type. And by that I mean that
tree-hierarchies seem to me to be beyond the scope of this patch, even
though you get it "for free".
I know that a good number of developers are going to come out against
this idea, but in all the popular folksonomic systems that I've seen
gain popular following (delicious, flickr, gmail and so on), they only
offer flat tagging and that seems to be sufficient. If you've used a
tree-enabled folksonomy and can point me to a demo, please do -- I
simply have never seen such a thing attempted before in a popular
application. 
As a matter of fact, Jaza makes the perfect case for me when he
comments that "You cannot create new sub-terms using free tagging
(yet). Say I have an existing term called 'sporting news', and I am
writing an article about soccer and baseball.... I would like to be
able to enter into the text box: "sporting news->soccer, sporting
news->baseball" And it would create 'baseball' as a new subterm (and
assign the existing term 'soccer')."
This is what a tree-like hierarchy is for and the limitation here is
not going to, nor should it, be fixed with free tagging. Rather, this
is a UI issue with the current implementation of tree-like hierarchies
in Drupal. What Jaza wants is a tree -- so instead of free-tagging, he
should be able to use his existing vocabulary "sporting news" and then
*be able to add a term to the tree inline* instead of having to go all
the way through the round-trip of adding the term in the admin UI. This
is *not* free tagging; this is adding a new leaf to a tree -- free
tagging, as Jaza suggests, would /simply make it more convenient/
without actually solving the real problem.
Honestly, I think putting too much specific implementation stuff into
this patch before we've had time to see how it's going to be used in
the wild is a bad idea. I do, however, think that it's important to
focus on the synonomic, spelling and merging issues that are inherent
problems in folksonomic systems. Related terms are intrinsic in
folksonomies, so I would rather see more work put into that UI issue
than trying to recreate tree-hierarchies.
And I know that, just as I say, "we don't know how this is going to be
used in the wild" I'll get the response that "well then we shouldn't
limit its functionality until we know how it will be used" but I
fervently disagree. That's one of Drupal's out-of-the-box problems. You
get /too much/ stuff! Give people things that are manageable -- things
that I can fit in my pea-brain. And if you need more functionality --
by all means build it! And stick it in a module that I can download
later! But folksonomy is important enough -- and Morbus has done a
really great job keeping it fairly minimal so far -- that I think we
can possibly go just a little further, pull it back some, and make free
tagging work the way it should work instead of being just a more
convenient version of the other types of vocabulary.


------------------------------------------------------------------------

April 1, 2005 - 12:36 : moshe weitzman

factoryjoe - all that text and i can't figure out what you are
proposing. are you proposing we don't use the vocabulary/taxonomy system
for tags? if so, what is the better option? or maybe you are just
emphasizing the importance of getting the 'relations' stuff right. we
will get there, and taxonomy provides excellent tools for doing so (see
'related terms' and 'synonyms')
folks, lets use *action* statements when we comment on patches. if you
don't like something, make a counter proposal. patch reviews are not
places for chatter.
i apologize for my grumpiness these days. i am seeing way more chatter
on the devel list than focused collaboration on core functionality.




