[drupal-devel] A Folksonomy module

Wed Mar 9 08:19:46 UTC 2005

On Tue, 2005-03-08 at 17:31 -0500, Morbus Iff wrote:
> > The whole point of folksonomy is that users make up their own tags. 
> Why should there be two unique tags (and thus tids, RSS feeds, etc.)? Is this 
> going to make equivalency code more difficult? 

> Does /no one else/ want any sort of "similar keywords" UI, per the UI at 
> http://disobey.com/d/2005/similar_keywords.jpg? If yes, then would similar 
> keywords be across all folksonomy vocabs, or just an individual user? If 
> there are 1000 users and each user creates 100 terms in their vocab, with 
> a perfect match of (even) 10%, aren't we wasting a lot of resources 
> storing those duplicates? 
Everything depends on the context or meaning the user puts into cats,
dogs, foxes, etc. How are you going to find out whether "dog" means the
canine friend or the derogatory slang for a woman? Both are common
usages, and humans can usually tell them apart from context. Splitting
the different contexts into different vocabularies is not a panacea
either.

The folksonomy and friends approach is betting on an old Marxist dogma
coming true: that if you pile up enough quantity, it will often
change into a new quality. And there is a point to it. But some work
down the line will be needed to actually get at the underlying 'real'
taxonomy behind the folksonomy.

The taxonomies as they stand are, or should be, carefully designed
controlled vocabularies. The folksonomies represent parts of the BIG
vocabulary, yes, with duplicates, synonyms, discrepancies and outright
errors, but their strong point is that they actually capture the
individual views and classification systems of people, and possibly
allow the system to speculate on behalf of the group.

>   * technorati is creating $copies amount of terms called "funny", as
>...
>   * that my innocent gamegrene site, with two controlled vocabs, is
>...
>   * that the NHPR site, with 7000 unique terms, is going to store
> ...    in your current proposal/design.

You have a point here, but it depends on what behaviour you want: early
or late speculation. Another way is to keep the labels of the taxonomy
terms in a separate table and keep unique tids for each user. One kind
of aggregate speculation might be "which tids have the same label". Then
use the answer to get all nodes with these tids. One or two queries,
depending on what you want to optimise for.
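
To make the idea concrete, here is a rough sketch in Python with an
in-memory SQLite database. The table and column names (label, term,
term_node, lid, uid) and the sample rows are made up for illustration
and are not Drupal's actual schema. It shows both variants: two
separate queries, or one join.

```python
import sqlite3

# Hypothetical schema: labels are stored once, each user keeps own tids.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE label (lid INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE term (tid INTEGER PRIMARY KEY, uid INTEGER, lid INTEGER);
CREATE TABLE term_node (nid INTEGER, tid INTEGER);
""")

# Sample data (invented): users 1 and 2 both tag with "funny".
conn.executemany("INSERT INTO label VALUES (?, ?)",
                 [(1, "funny"), (2, "dog")])
conn.executemany("INSERT INTO term VALUES (?, ?, ?)",
                 [(10, 1, 1), (11, 2, 1), (12, 2, 2)])
conn.executemany("INSERT INTO term_node VALUES (?, ?)",
                 [(100, 10), (101, 11), (102, 12)])


def tids_for_label(conn, name):
    """Query 1: which tids, across all users, share this label?"""
    rows = conn.execute(
        "SELECT t.tid FROM term t JOIN label l ON l.lid = t.lid "
        "WHERE l.name = ? ORDER BY t.tid", (name,))
    return [r[0] for r in rows]


def nodes_for_tids(conn, tids):
    """Query 2: all nodes tagged with any of the given tids."""
    if not tids:
        return []
    placeholders = ",".join("?" * len(tids))
    rows = conn.execute(
        "SELECT DISTINCT nid FROM term_node "
        f"WHERE tid IN ({placeholders}) ORDER BY nid", tids)
    return [r[0] for r in rows]


def nodes_for_label(conn, name):
    """Single-query variant: join label, term and term_node at once."""
    rows = conn.execute(
        "SELECT DISTINCT tn.nid FROM term_node tn "
        "JOIN term t ON t.tid = tn.tid "
        "JOIN label l ON l.lid = t.lid "
        "WHERE l.name = ? ORDER BY tn.nid", (name,))
    return [r[0] for r in rows]
```

The two-query form lets you cache or inspect the list of matching tids
before fetching nodes; the single join saves a round trip. Which one is
better depends on what you optimise for.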

There are other, more complex approaches possible, but I don't want to
get into maths here. The post is long and boring enough :)

Cheers,
Vlado