On Thu, 12 Mar 2009 20:00:50 +0100 (CET) "Karoly Negyesi" <karoly@negyesi.net> wrote:
Maybe we could let people (other than the maintainers) "categorise" the project.
Mail, mails, mailing, notify, notifies, notification, subscribe, subscriptions, subscribes and I am sure we will have a few more variations...
test_drupal=# select * from to_tsvector('pg_catalog.english', 'Mail, mails, mailing, notify, notifies, notification, subscribe, subscriptions, subscribes');
                           to_tsvector
------------------------------------------------------------------
 'mail':1,2,3 'notif':6 'notifi':4,5 'subscrib':7,9 'subscript':8

This could be one solution. This dictionary is not configured to use synonyms, but that is just a matter of configuration in PostgreSQL. There should be precooked solutions for MySQL as well.

After all, if you're making taxonomies searchable (term AND term, (term OR term) AND (term OR term)) you're going to have to solve this problem anyway.

Another solution could be to restrict terms to a selection. Collect terms from the community for, let's say, one week, filter them, post-edit the result and use it.

Auto-completion may help avoid duplicates. Only the most "voted" terms would appear.

If you expect high load... the project is a success, and still the load of collecting terms should be negligible compared to everything else required to run drupal.org.
If you don't expect high load... there is no need to worry about the performance of a dictionary.

To add tags people will have to log in. The load of filtering synonyms should be tolerable. If spammers use similar techniques they should have a good ROI.

Now we have ~40 categories. Reach 400 and the categorisation system will work much better. BIC [1] should have ~6000 leaves.

I could do some research on the tools available for MySQL. drupal.org recently moved to Solr, and Solr has support for synonyms.

[1] http://www.bic.org.uk/

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it
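
To make the synonym idea a bit more concrete, here is a minimal sketch in Python. It takes the lexemes and positions exactly as to_tsvector reported them above and folds them into one tag per concept. The SYNONYMS table and the canonical_tags function are hypothetical names made up for this illustration; this is not how PostgreSQL, MySQL or Solr implement synonym dictionaries, only the general shape of the step.

```python
# Lexemes and word positions as reported by to_tsvector in the message above.
LEXEMES = {
    "mail": [1, 2, 3],
    "notif": [6],
    "notifi": [4, 5],
    "subscrib": [7, 9],
    "subscript": [8],
}

# Hypothetical synonym table (an assumption for this sketch):
# maps a stemmed lexeme to the canonical tag it should be filed under.
SYNONYMS = {
    "notif": "notify",
    "notifi": "notify",
    "subscrib": "subscribe",
    "subscript": "subscribe",
}


def canonical_tags(lexemes, synonyms):
    """Group word positions under one canonical tag per synonym set."""
    tags = {}
    for lexeme, positions in lexemes.items():
        # Lexemes without a synonym entry are kept as their own tag.
        tag = synonyms.get(lexeme, lexeme)
        tags.setdefault(tag, []).extend(positions)
    return {tag: sorted(pos) for tag, pos in tags.items()}


if __name__ == "__main__":
    for tag, positions in sorted(canonical_tags(LEXEMES, SYNONYMS).items()):
        print(tag, positions)
```

With this table the five lexemes collapse into three tags, so the nine user-supplied variations end up in three categories instead of five.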