[drupal-devel] Taxonomy Similar: Very Early First Draft
Hey all. Anyone interested in my tagging patch has probably seen:

  http://disobey.com/detergent/2005/similar_keywords.jpg

which is/was a homebrew UI to select similar tags to the ones entered during content creation. I've just finished an early draft for Drupal:

  http://disobey.com/detergent/2005/drupal_taxonomy_similar.jpg
  cvs.drupal.org/viewcvs/drupal/contributions/modules/taxonomy_similar/

Some notes:

 * the low match count is because I've only imported 8000 terms, and not all the node content that exists at NHPR.org. I'll be populating term_node soon so I can get an idea of the hit on tag->node counting.

 * the module works best when you already have a large tagging vocabulary. It'll probably be pretty useless if you have fewer than, say, 500 terms.

 * there's no help, README, blah - hell, there's not much of anything in the way of documentation besides that screenshot. Still, the code DOES work (and the next version will redirect you to node/$nid instead of itself after submission).

 * it will ONLY kick in on free-tagging vocabularies. there is no configuration necessary. install it, enable it, insert/update a node that uses free tagging.

I got into some small discussion about the UI today, and I wanted to rationalize why I've done it the way I've done it. The current UI supports ALL these possibilities, and I've yet to find a compact or intuitive alternative based on other folks' suggestions.

 * don't change a thing (close the window, or "submit").
 * re-assign tags (uncheck your original, check new one).
 * add tags (keep your original checked, check new one).
 * delete tags (uncheck your original, don't check new one).

I have plans on the following explorations:

 * Explore "merging" of tags, such that two similar tags can be merged into one combined tag. this would be an 'administer taxonomy' access.

 * Explore playing with term_relation so that similar tags are related to one another when a user makes that explicit choice.

 * Explore a 'view nodes' sorta screen or block, which would just link off to similar keywords. this seems evilly dangerous, since the similarity code is prohibitive to run on a large vocab (8000+ tested).

 * Currently, there's a bit of confusion when it comes to "your tags" that are also "similar tags" for another one. So, say you choose "Death Penalty". A similar tag is "Juvenile Death Penalty", so you also choose that one. The NEXT time you go back to that screen, should "Juvenile Death Penalty" (now one of "your tags") still be shown as a similar tag under "Death Penalty"? From a clutter standpoint, no. From a comprehension/expectation standpoint, I'm not sure.

Anyways.

-- 
Morbus Iff ( you are nothing without your robot car, NOTHING! )
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/disobeycom
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus
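
The four checkbox outcomes listed in the announcement above (keep, re-assign, add, delete) can be illustrated with a small sketch. This is Python rather than the module's PHP, and the function name `apply_checkboxes` and its arguments are hypothetical; it only models the form logic as described in the message, not the module's actual code.

```python
def apply_checkboxes(original, similar, checked):
    """Return the tags to save: every offered tag whose box is checked.

    original -- tags the author typed in
    similar  -- similar tags offered by the form
    checked  -- set of tags whose checkbox is ticked on submit
    """
    # Offer the author's own tags first, then any similar tags not already listed.
    offered = list(original) + [t for t in similar if t not in original]
    return [t for t in offered if t in checked]


original = ["Death Penalty"]
similar = ["Juvenile Death Penalty"]

# Don't change a thing: the original stays checked, nothing else is ticked.
unchanged = apply_checkboxes(original, similar, {"Death Penalty"})
# Re-assign: uncheck the original, check the new one.
reassigned = apply_checkboxes(original, similar, {"Juvenile Death Penalty"})
# Add: keep the original checked, check the new one.
added = apply_checkboxes(original, similar, {"Death Penalty", "Juvenile Death Penalty"})
# Delete: uncheck the original, check nothing.
deleted = apply_checkboxes(original, similar, set())
```

One rule ("save whatever is checked") covers all four cases, which is presumably why plain checkboxes can express every outcome on a single screen.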
On Thursday 07 April 2005 15:07, Morbus Iff wrote:
* add tags (keep your original checked, check new one).
Hi, Morbus!

First off, let me say "great design work" on the UI. It's blindingly obvious how it works, and I think even novices will find it intuitive.

One small suggestion, though. The particular line quoted above scares me very much. I think that novice users who agree "yes, I should switch over to the suggested tag" will instinctively check that but will forget to *uncheck* the existing tag they hand-entered.

May I suggest radio buttons (choose one only) instead of checkboxes for this selection process? This would have the side-effect of also preventing the user from choosing more than one of the suggested similar tags, which IMO is a *benefit*. If the point of the similar tags feature is to rationalize tags, then you really *don't* want a person choosing "Governor Joe Schmoe", "Joe Schmoe", *and* "Mr. Joe Schmoe" anyway, do you?

Nice work, though! If this pans out, I think it will very much broaden the use-cases where free-tag taxonomy is useful.

My commendations on all of your work in this area. The versatility of Drupal's existing controlled taxonomy is one of Drupal's very best architectural features, in my opinion. What you are doing is to seamlessly extend the range of things that taxonomy can do, without breaking anything that's already there. Kudos.

Scott

-- 
-----------------------+------------------------------------------------------
Scott Courtney         | "I don't mind Microsoft making money. I mind them
scott@4th.com          | having a bad operating system." -- Linus Torvalds
http://4th.com/        | ("The Rebel Code," NY Times, 21 February 1999)
                       | PGP Public Key at http://4th.com/keys/scott.pubkey
May I suggest radio buttons (choose one only) instead of checkboxes for this selection process? This would have the side-effect of also preventing the user from choosing more than one of the suggested similar tags, which IMO is a *benefit*. If the point of the similar tags feature is to rationalize tags, then you really *don't* want a person choosing "Governor Joe Schmoe", "Joe Schmoe", *and* "Mr. Joe Schmoe" anyway, do you?
Someone else suggested this to me as well. I'm still leaning toward checkboxes, and the biggest problem may be my screenshot, which clearly demonstrates, really, only /equivalent/ tags. But that's not the only aspect we should be considering. Take, for example, horses.

  node #1: Mommy, I'm taking horseriding lessons. I luv'd it so much!
  node #2: Mommy, I like riding on Palomino horses the best! Whee!
  node #3: Midget horses are ugly. I topple them like cows!

Now, the tags:

  node #1: horses, horseriding, love, hobby
  node #2: horses, palomino, wishlist
  node #3: midget, horses, cows

You update #2. It will see "horses", and realize that it is a similar term for "horseriding" (defined in node #1). With radio buttons, this is a problem. You can either keep "horses", which is still accurate, or you can choose ONLY "horseriding", which is also accurate, but on a .. "tinier" level ("horseriding" is a mental child of "horses" - you can't ride a horse without a "horse").

"horses" and "horse riding" could, and probably should, both be applied to node #2 (such that it would show up in a specific search of "horseriding", but also in the more general search of "horses"). With radio buttons, the only way to associate this new, relevant-but-not-equivalent tag, would be to modify the tags manually on the node edit screen. Lots of clicks, and lots of forgetting.

Saying that "horseriding" should REPLACE "horses", however, is inaccurate. Node #3 has nothing to do with "horseriding", even though that tag would appear on the "Similar Tags" list.

Another possibility (off the top of my head and not fully worked out) is "terrorism" and "terrorist". "terrorism" is the act, "terrorist" is the person. One can not substitute for both (as you can talk about the social aspects of "terrorism" without mentioning a single "terrorist").

-- 
Morbus Iff ( you are nothing without your robot car, NOTHING! )
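
The horses example above can be sketched in a few lines. This is only an illustration: Python's `difflib.SequenceMatcher` stands in for the string comparison the module performs in PHP, and the 0.5 threshold and the `similar_tags` helper are assumptions made for the sketch, not values taken from the module.

```python
from difflib import SequenceMatcher

# Tags per node, copied from the example in the message above.
NODE_TAGS = {
    1: ["horses", "horseriding", "love", "hobby"],
    2: ["horses", "palomino", "wishlist"],
    3: ["midget", "horses", "cows"],
}


def similar_tags(tag, vocabulary, threshold=0.5):
    """Return vocabulary terms whose similarity to `tag` meets the threshold.

    The threshold is an assumed value chosen for this example only.
    """
    scored = []
    for other in vocabulary:
        if other == tag:
            continue
        score = SequenceMatcher(None, tag, other).ratio()
        if score >= threshold:
            scored.append((score, other))
    # Strongest match first.
    return [other for score, other in sorted(scored, reverse=True)]


vocabulary = sorted({t for tags in NODE_TAGS.values() for t in tags})
suggestions = similar_tags("horses", vocabulary)

# With checkboxes, node #2 can keep "horses" AND gain the suggested tag.
node2 = NODE_TAGS[2] + [t for t in suggestions if t not in NODE_TAGS[2]]
```

Under these assumptions only "horseriding" is suggested for "horses", and node #2 ends up carrying both the general and the more specific tag.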
On Thursday 07 April 2005 16:01, Morbus Iff wrote:
Saying that "horseriding" should REPLACE "horses", however, is inaccurate. Node #3 has nothing to do with "horseriding", even though that tag would appear on the "Similar Tags" list.
Hmmmm.... Yes, I can see your point. I may have been a little myopic in my view, since on my sites our editorial policy is "always use the most-specific tag". In my situation, all tags are created by trained staff and not by end users, and even when I deploy free tagging, the tags will be reviewed by editors just as we currently review and fine-tune what controlled tags the submitter chose.

I agree with you, though, that not every site owner will follow that "most specific tag" philosophy as we do.

What about this as an alternative, using your "horses" example? In this "UI pseudocode" notation, let [R] imply a radio one-of-N button, and [ ] imply a checkbox:

 > [R] The tag "horseriding" is just perfect, so keep it.
   [R] Use similar tag "horses" instead (3 matches).
   [R] Use similar tag "horse farming" instead (2 matches).

   [ ] Keep my "horseriding" tag *and* the similar tag I have selected.

The ">" next to the horseriding line indicates that this radio button is the default choice. Since the checkbox (bottom line) is off by default, and the original tag is the default radio button selection, the action if the user just blindly hits SUBMIT is, as you have wisely defined, to change nothing at all.

Note that the last line customizes for each instance, to make its intent very clear to the user. If a novice user picks "horses" instead of "horseriding", the simple action then becomes to do the most common task of choosing that instead (as the prompt text indicates). But if they *really* want both tags, they can check the last line's checkbox to make that happen.

If they check the "Keep ... and ..." line but don't select one of the similar tags, no big deal -- just ignore the checkbox if they didn't change the tag. No harm done.

The intent of what I'm proposing is to keep Morbus' design principle of failing gracefully if the user submits the form without a lot of careful thinking, but to prevent such a reflex from creating excessive tag counts.

Comments?

Scott
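
The radio-plus-checkbox proposal above boils down to three rules, sketched here in Python. The function name `resolve` and its parameters are hypothetical; the behaviour follows the description in the message (the default changes nothing, and the checkbox is ignored when no similar tag was picked).

```python
def resolve(original, chosen, keep_both=False):
    """Return the tags to store for one typed-in tag.

    original  -- the tag the author typed (the default radio selection)
    chosen    -- the radio button selected on submit
    keep_both -- state of the "Keep my ... tag *and* ..." checkbox
    """
    if chosen == original:
        # Nothing was switched, so the checkbox is simply ignored.
        return [original]
    if keep_both:
        return [original, chosen]
    return [chosen]


blind_submit = resolve("horseriding", "horseriding")
stray_checkbox = resolve("horseriding", "horseriding", keep_both=True)
switched = resolve("horseriding", "horses")
both = resolve("horseriding", "horses", keep_both=True)
```

Compared with plain checkboxes, this makes "replace" the one-click action and "keep both" the deliberate one, which is the trade-off the proposal argues for.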
Morbus Iff <morbus@disobey.com> Thu, 7 Apr 2005 16:01:26
Now, the tags:
node #1: horses, horseriding, love, hobby
node #2: horses, palomino, wishlist
node #3: midget, horses, cows
I should look at the code, but are you saying that horses is similar to horseriding, love, hobby, palomino, wishlist, midget, cows? Perhaps ordered by how often the related terms appear on the same node. This looks like del.icio.us/flickr related terms to me.

But somewhere in this thread you said "Explore playing with term_relation so that similar tags are related to one another". In the above example, I don't see how you can draw any conclusions about horses being equivalent to horseriding.

I guess I'm puzzled by the algorithm you're using to define what is an alternate term (synonym?) as opposed to being a related term.

-- 
Julian Bond  E&MSN: julian_bond at voidstar.com  M: +44 (0)77 5907 2173
Webmaster:          http://www.ecademy.com/      T: +44 (0)192 0412 433
Personal WebLog:    http://www.voidstar.com/     S: callto://julian.bond
                   *** Just Say No To DRM ***
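
The co-occurrence reading raised in this message ("ordered by how often the related terms appear on the same node") would look roughly like the sketch below. It is shown only to contrast with string similarity; the `cooccurring` helper is hypothetical and is not something the thread says the module does.

```python
from collections import Counter

# Tags per node, as quoted in the message above.
NODE_TAGS = {
    1: ["horses", "horseriding", "love", "hobby"],
    2: ["horses", "palomino", "wishlist"],
    3: ["midget", "horses", "cows"],
}


def cooccurring(tag, node_tags):
    """Count how often every other tag shares a node with `tag`."""
    counts = Counter()
    for tags in node_tags.values():
        if tag in tags:
            counts.update(t for t in tags if t != tag)
    return counts


related_to_horses = cooccurring("horses", NODE_TAGS)
related_to_love = cooccurring("love", NODE_TAGS)
```

On this tiny data set every other tag co-occurs with "horses" exactly once, so co-occurrence alone says nothing about "horseriding" being closer to "horses" than "cows" is.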
said "Explore playing with term_relation so that similar tags are related to one another". In the above example, I don't see how you can draw any conclusions about horses being equivalent to horseriding.
I guess I'm puzzled by the algorithm you're using to define what is an alternate term (synonym?) as opposed to being a related term.
There is none. My relations only occur at the behest of humans. There is no automated relation creation.

-- 
Morbus Iff ( whooooooo's hoooouuuuuse? )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus
Morbus Iff <morbus@disobey.com> Fri, 8 Apr 2005 07:52:07
said "Explore playing with term_relation so that similar tags are related to one another". In the above example, I don't see how you can draw any conclusions about horses being equivalent to horseriding.
I guess I'm puzzled by the algorithm you're using to define what is an alternate term (synonym?) as opposed to being a related term.
There is none. My relations only occur at the behest of humans. There is no automated relation creation.
ooohh-kaaay. So just so I understand, the user puts in xxx, yyy, zzz as terms. You do a similar_text and metaphone analysis on each and say xxx is 81% like xxxA that somebody else used on another node. Would you like to use that instead? And you're doing this on the fly rather than storing the results anywhere, such as {term_synonym}. Is that right?

I have to confess that I find related terms more interesting than trying to get people to tag better. There's been quite a lot of discussion on the Flickr and del.icio.us developer forums about using things like porter-stemming and soundex to converge synonym use and make tagging "better". Consensus so far seems to be that it's not really worth the effort and doesn't necessarily help.

But at the same time, the UI at data entry is getting better as we find ways of suggesting appropriate tag/terms. This one is impressive.

  http://ejohn.org/projects/autodelicious/

Firefox + greasemonkey + Javascript + del.icio.us beta bookmarklet = auto-suggest terms in the edit field using AJAX. Wow!

We're all still feeling our way with this stuff.

-- 
Julian Bond
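
To make the phonetic side of this discussion concrete, here is a small sketch of classic American Soundex, one of the techniques mentioned in the message above. It is a stand-in only: the module is described as using PHP's metaphone, which is a different and more involved algorithm, and the `soundex` function below is written for this example.

```python
# Letter-to-digit table for classic American Soundex.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex key for `word` ("" if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            # h and w are skipped and do not separate equal codes.
            continue
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        # A vowel resets prev, so a repeated code after it is kept.
        prev = code
    return (result + "000")[:4]


same_key = soundex("terrorism") == soundex("terrorist")
different_key = soundex("horses") != soundex("horseriding")
```

Note that "terrorism" and "terrorist" collapse to the same key even though the thread argues they are not interchangeable, which is the kind of result that still needs a human to review it.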
Julian Bond <julian_bond@voidstar.com> Fri, 8 Apr 2005 14:11:10
I have to confess that I find related terms more interesting than trying to get people to tag better.
Here's a taster.

 - Related Terms block
 - Text term urls
 - Terms sorted on node displays by vid and then by text

The second gif shows what I'm trying to do with data input. One click selection of likely terms.

I should stress this is all just interim playing while I let free form tagging in taxonomy settle down.

-- 
Julian Bond
Here's a taster.
Some good shots there.
- Related Terms block
I want to do something like this, but as you've addressed, my stuff is currently all on-the-fly. I wonder if there's some way I could use your auto-relation code but ALSO weight human relations more (such that "CAT-5 Cable" would be automatically related to "Cats", but since no human has made the relation, it'd be less powerful than a "tiger"/"cat" relation.)
I should stress this is all just interim playing while I let free form tagging in taxonomy settle down.
I think I heard some furtive whispering that it was supposed to go into core relatively soon, now that HEAD is reopened for development.

-- 
Morbus Iff ( you are nothing without your robot car, NOTHING! )
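
The weighting idea floated in this message (automatic relations count, but human-made ones count more) could be sketched like this. The weights, the scores, and the helper names are all made up for illustration; nothing here comes from either module.

```python
# Assumed weights: one human-confirmed relation outweighs any automatic score.
HUMAN_WEIGHT = 3.0
AUTO_WEIGHT = 1.0


def relation_strength(auto_score, human_votes):
    """Combine an automatic similarity score (0..1) with a count of human choices."""
    return AUTO_WEIGHT * auto_score + HUMAN_WEIGHT * human_votes


def rank_related(candidates):
    """candidates: {term: (auto_score, human_votes)} -> terms, strongest first."""
    return sorted(candidates,
                  key=lambda term: relation_strength(*candidates[term]),
                  reverse=True)


# Hypothetical numbers: "CAT-5 Cable" merely looks like "Cats",
# while a person explicitly related "tiger" to it.
related_to_cats = {"CAT-5 Cable": (0.6, 0), "tiger": (0.1, 1)}
ranking = rank_related(related_to_cats)
```

With these assumed weights the human-made "tiger" relation ranks above the purely automatic "CAT-5 Cable" one, while the automatic relation is still kept.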
ooohh-kaaay. So just so I understand, the user puts in xxx, yyy, zzz as terms. You do a similar_text and metaphone analysis on each and say xxx is 81% like xxxA that somebody else used on another node. Would you like to use that instead? And you're doing this on the fly rather than storing the results anywhere, such as {term_synonym}.
Is that right?
Close. "instead" is a misnomer, at least in my current UI. They'd be able to use it "instead", but also "in addition to". Scott has been discussing alternatives to this, but I'm waiting on more feedback.

This first draft doesn't make any changes to the term_relation or term_synonym tables. But, a future version WILL affect term_relation (mainly because I can ALWAYS assume it's a relation, but I can't ALWAYS assume it's a synonym), but ONLY based on user choice (i.e. "user typed in xxx, oh! he chose xxxA in addition/instead! I'll add that! Tally ho!") as opposed to automated.
I have to confess that I find related terms more interesting than trying to get people to tag better. There's been quite a lot of discussion on the Flickr and del.icio.us developer forums about using things like
Well, from "my" needs, the differences between Flickr and NHPR.org (the people funding my work) are size and audience. Only reporters/admins make tags, and there's a concerted need/want to have strong tags for the NHPR READERS to navigate the site. Delicious and Flickr don't really have "readers" per se - most of their clientele are active users cataloging their /own/ content, not passive readers that have /no/ content. The integration of the De/Fl tag system was "for you, the user", whereas NHPR's implementation is really "for us, the reader".
"better". Consensus so far seems to be that it's not really worth the effort and doesn't necessarily help. But at the same time, the UI at
I would agree for an automated based system, yeah. The current homebrew tagging/similar system NHPR is using has been in place for two years now, and there are definitely cases where similar_text and metaphone make rather comical "mistakes" (from a human's perspective). The mistakes have the same percentages and the same similarities as any correct bit of text, but need that human touch to say "no, no, I /know/ that's not right."

I guess my approach is the million monkeys, and you (seem?) to want an automatic typewriter.

-- 
Morbus Iff ( you are nothing without your robot car, NOTHING! )
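
The plan described earlier in this message, touching term_relation only when a user explicitly picks a similar tag, might look like the sketch below. An in-memory set of pairs stands in for the term_relation table, and the `record_choices` helper is hypothetical.

```python
def record_choices(relations, typed, offered, chosen):
    """Relate `typed` to each offered similar tag the user actually chose.

    relations -- set of (term, term) pairs, a stand-in for term_relation
    typed     -- the tag the user entered
    offered   -- similar tags shown on the form
    chosen    -- set of tags checked on submit
    """
    for tag in offered:
        if tag in chosen and tag != typed:
            # Store each pair in sorted order so (a, b) and (b, a) are one relation.
            relations.add(tuple(sorted((typed, tag))))
    return relations


relations = set()
# The user typed "xxx", was offered two similar tags, and picked only "xxxA".
record_choices(relations, "xxx", ["xxxA", "xxxB"], {"xxx", "xxxA"})
# A submit where no similar tag is chosen records nothing.
record_choices(relations, "yyy", ["yyyA"], {"yyy"})
```

Nothing is written unless a person made the choice, which matches the "million monkeys" approach: the similarity code only proposes, and people decide.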
participants (3):
 - Julian Bond
 - Morbus Iff
 - Scott Courtney