[development] Scratching an itch: Machine Learning

vlado vlado at dikini.net
Tue Dec 6 09:32:24 UTC 2005


On Mon, 2005-12-05 at 13:11 -0600, Mark Fredrickson wrote:
> Hello,
> 
> I have an interest in machine learning that I would like to bring to bear on
> Drupal, and I am hoping to enlist the help of some other people who share
> this interest - or can help me by providing data.
I am interested, but everything depends on time and ability.

> Briefly, machine learning is the algorithmic application of statistical
> principles. A classic example is the Bayesian spam filter in your email
> program/gateway/etc. Based on a learned model, this filter classifies
> incoming mail as either SPAM or NOT SPAM using a vector of data drawn
> from the message.
Jeremy uses a Bayesian classifier in spam.module.
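To make the "learned model plus vector of data" description concrete, here is a minimal naive Bayes sketch in Python. It is not the code in spam.module. The tokenizer, the class and method names, and the toy training messages are hypothetical and exist only to illustrate the idea. Training just counts words per class. Classification picks the class with the highest log prior plus summed log word likelihoods, using add-one smoothing.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Crude feature extraction (hypothetical): lowercase word tokens from the message.
    return re.findall(r"[a-z0-9']+", text.lower())

class NaiveBayesFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    LABELS = ("SPAM", "NOT SPAM")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        # Learning is just counting: words per class and messages per class.
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, tokens, label, vocab_size):
        # log P(label) + sum of log P(token | label); logs avoid underflow.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        for tok in tokens:
            score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        return score

    def classify(self, text):
        # Assumes both classes have at least one training message.
        tokens = tokenize(text)
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        scores = {label: self._log_score(tokens, label, len(vocab))
                  for label in self.LABELS}
        return max(scores, key=scores.get)

# Toy training data, made up for illustration.
spam_filter = NaiveBayesFilter()
spam_filter.train("cheap pills buy now", "SPAM")
spam_filter.train("buy cheap watches now", "SPAM")
spam_filter.train("drupal patch review for the node module", "NOT SPAM")
spam_filter.train("meeting notes about the module release", "NOT SPAM")

print(spam_filter.classify("buy cheap pills"))          # SPAM
print(spam_filter.classify("review the module patch"))  # NOT SPAM
```

A real filter would need better features (headers, links, bigrams) and far more training data. The counting structure stays the same.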

> I am looking for interested parties to join me in developing a series of
> machine learning modules for Drupal. These modules will use data that Drupal
> can collect to predict outcomes. Examples might include smart "What's
> Related" type modules, better troll and spam bot protection, better
> searching, auto categorization, and a wide variety of other predictive
> tasks.
Actually, this is why I started the relations work I'm currently coding.
For a primitive, non-learning feasibility test based on some simple
metrics, have a look at
http://dikini.net/30.11.2005/relations_battle_plan_ii_and_first_results
and the similar things block.

> I envision the following phases to this project:
I think the plan may be good, but it looks very lengthy and very
general.

What I have learned about machine learning and data mining over the years
is that they are most successful when you have a very well-defined target:
what you want to achieve or find. With Drupal, we have a multitude of
applications, a zillion data models, and an infinite number of "this
thing is in my head, but I'll do it" todos. A generic catch-all module
is going to fail badly. What might be useful is a framework of basic
methods (a Bayesian learner, a rule-based learner, etc.) which can be
used in concrete applications; but if nothing uses it, it is a waste.
Alternatively, take the evolutionary approach: pick a target, adaptive
behaviour for example, so that a website adapts to the user's
preferences and to current trends and presents the most relevant and
up-to-date information.
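One way to picture the "framework of basic methods" idea is a small shared interface that concrete applications code against, with interchangeable learners behind it. The sketch below is hypothetical: the `Learner` interface, the `OneRuleLearner` class, and the comment-moderation features (`has_link`, `is_anonymous`) are illustrations, not existing Drupal code, and it is written in Python only for brevity. The rule-based learner follows the well-known 1R idea of keeping the single feature whose value-to-majority-label rule makes the fewest training errors.

```python
from abc import ABC, abstractmethod
from collections import Counter, defaultdict

class Learner(ABC):
    """Hypothetical common interface that concrete applications would share."""

    @abstractmethod
    def train(self, examples, labels):
        """examples: list of dicts mapping feature name -> discrete value."""

    @abstractmethod
    def predict(self, example):
        """Return a label for a single example dict."""

class OneRuleLearner(Learner):
    """1R-style rule learner: keep the best single-feature rule."""

    def train(self, examples, labels):
        # Fallback label for feature values never seen in training.
        self.default = Counter(labels).most_common(1)[0][0]
        best = None
        features = {name for ex in examples for name in ex}
        for feature in sorted(features):
            # Count labels for each value of this feature.
            by_value = defaultdict(Counter)
            for ex, label in zip(examples, labels):
                by_value[ex.get(feature)][label] += 1
            # Rule: each value predicts its majority label.
            rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
            errors = sum(sum(c.values()) - c[rule[v]] for v, c in by_value.items())
            if best is None or errors < best[0]:
                best = (errors, feature, rule)
        _, self.feature, self.rule = best
        return self

    def predict(self, example):
        return self.rule.get(example.get(self.feature), self.default)

# Made-up comment-moderation data for illustration.
examples = [
    {"has_link": True, "is_anonymous": True},
    {"has_link": True, "is_anonymous": False},
    {"has_link": False, "is_anonymous": True},
    {"has_link": False, "is_anonymous": False},
]
labels = ["spam", "spam", "ok", "ok"]

learner = OneRuleLearner().train(examples, labels)
print(learner.feature)                                               # has_link
print(learner.predict({"has_link": False, "is_anonymous": True}))   # ok
```

An application (spam protection, auto-categorization, adaptive blocks) would depend only on `train` and `predict`, so a Bayesian learner could be swapped in without touching it.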

It's a good idea overall.

And good luck.

Cheers,
Vlado
