[drupal-devel] Taxonomy Similar: Very Early First Draft

Morbus Iff morbus at disobey.com
Fri Apr 8 13:40:40 UTC 2005

> ooohh-kaaay. So just so I understand, the user puts in xxx, yyy, zzz as 
> terms. You do a similar_text and metaphone analysis on each and say xxx 
> is 81% like xxxA that somebody else used on another node. Would you like 
> to use that instead? And you're doing this on the fly rather than 
> storing the results anywhere, such as {term_synonym}.
> Is that right?

Close. "instead" is a misnomer, at least in my current UI. They'd be able 
to use it "instead", but also "in addition to". Scott has been discussing 
alternatives to this, but I'm waiting on more feedback. This first draft 
doesn't make any changes to the term_relation or term_synonym tables.
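
A minimal sketch of the on-the-fly comparison described above, written in Python rather than the module's PHP. It is an illustration under stated assumptions, not the draft's actual code: difflib.SequenceMatcher stands in for PHP's similar_text(), the metaphone pass is left out, and the function names and the 75% threshold are hypothetical. Nothing is stored; suggestions are recomputed from the existing term list on each call.

```python
from difflib import SequenceMatcher


def similarity_percent(a, b):
    """Case-insensitive similarity of two terms as a percentage (0-100).

    Stand-in for PHP's similar_text() percentage; the two algorithms are
    not identical, so scores will differ from the real module.
    """
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest_similar(typed_terms, existing_terms, threshold=75.0):
    """Map each typed term to existing terms that look similar to it.

    Returns {typed: [(existing, rounded_percent), ...]}, best match first.
    The threshold is an assumed cut-off, not a value from the draft.
    """
    suggestions = {}
    for typed in typed_terms:
        scored = []
        for existing in existing_terms:
            if existing.lower() == typed.lower():
                continue  # an exact match needs no suggestion
            pct = similarity_percent(typed, existing)
            if pct >= threshold:
                scored.append((existing, round(pct)))
        # Highest score first, ties broken alphabetically.
        scored.sort(key=lambda item: (-item[1], item[0]))
        suggestions[typed] = scored
    return suggestions


if __name__ == "__main__":
    print(suggest_similar(["xxx", "yyy"], ["xxxA", "zzz"]))
```

The result is only a list of candidates; whether a candidate is used "instead" or "in addition to" the typed term is left to the user.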

But a future version WILL affect term_relation (mainly because I can 
ALWAYS assume it's a relation, but I can't ALWAYS assume it's a synonym), 
and ONLY based on user choice (i.e. "user typed in xxx, oh! he chose xxxA 
in addition/instead! I'll add that! Tally ho!") as opposed to automated.
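
The "only on user choice" rule could look roughly like the Python sketch below. Everything in it is hypothetical: the record_choice name, the mode strings, and the in-memory set standing in for the term_relation table (which in Drupal relates term IDs, not names). The point it illustrates is that a relation is written only when the user explicitly picks a suggestion, never by the similarity pass itself.

```python
def record_choice(relations, typed, chosen, mode):
    """Record a user-confirmed relation and return the tags to apply.

    relations: a set of frozenset pairs, an assumed stand-in for the
               term_relation table.
    mode:      "instead" (use only the suggestion) or "in addition"
               (keep the typed term as well).
    """
    if mode not in ("instead", "in addition"):
        raise ValueError("mode must be 'instead' or 'in addition'")
    if typed != chosen:
        # Either choice implies the two terms are related; a relation is
        # assumed, a synonym is not.
        relations.add(frozenset((typed, chosen)))
    return [chosen] if mode == "instead" else [typed, chosen]


if __name__ == "__main__":
    relations = set()
    print(record_choice(relations, "xxx", "xxxA", "in addition"))
    print(relations)
```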

> I have to confess that I find related terms more interesting than trying 
> to get people to tag better. There's been quite a lot of discussion on 
> the Flickr and del.icio.us developer forums about using things like 

Well, for "my" needs, the differences between Flickr and NHPR.org (the 
people funding my work) are size and audience. Only reporters/admins make 
tags, and there's a concerted need/want to have strong tags for the NHPR 
READERS to navigate the site. del.icio.us and Flickr don't really have 
"readers" per se - most of their clientele are active users cataloging 
their /own/ content, not passive readers that have /no/ content. The 
integration of the De/Fl tag system was "for you, the user", whereas 
NHPR's implementation is really "for us, the reader".

> "better". Consensus so far seems to be that it's not really worth the 
> effort and doesn't necessarily help. But at the same time, the UI at 

I would agree for an automated system, yeah. The current homebrew 
tagging/similar system NHPR is using has been in place for two years now, 
and there are definitely cases where similar_text and metaphone make 
rather comical "mistakes" (from a human's perspective). The mistakes have 
the same percentages and the same similarities as any correct bit of text, 
but need that human touch to say "no, no, I /know/ that's not right."

I guess my approach is the million monkeys, and
you seem (?) to want an automatic typewriter.

