I am interested. But everything depends on time and ability.
Excellent. For others sitting on the fence, I am not asking for a lot of time or coding (though I will not turn it away), but more for help gathering data. I do not have an active Drupal installation at my fingertips (yet), so I need assistance with data collection.
Jeremy uses a Bayesian classifier in spam.module.
I'll check it out. Perhaps the classifiers should be factored out into a separate component module. I'll look at the feasibility of that.
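To make the "separate component" idea a little more concrete, here is a minimal sketch of what a standalone Bayesian text classifier could look like. It is written in Python purely for illustration and is not taken from spam.module; the class name `NaiveBayesClassifier` and its `train`/`classify` methods are hypothetical, and the whitespace tokenization and add-one smoothing are my own simplifying assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Tiny multinomial naive Bayes with add-one (Laplace) smoothing.

    Hypothetical illustration only; not the spam.module implementation.
    """

    def __init__(self):
        self.label_counts = Counter()            # documents seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocabulary = set()                  # every word seen in training

    def train(self, text, label):
        """Record one labelled example, split on whitespace."""
        words = text.lower().split()
        self.label_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the most probable label, or None if nothing was trained."""
        words = text.lower().split()
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label, doc_count in self.label_counts.items():
            # Work in log space: log prior plus smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            label_total = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (label_total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Because the class knows nothing about spam specifically, the same component could in principle be trained on any labelled text, which is the point of factoring it out. Whether that maps cleanly onto a Drupal module is exactly the feasibility question I still need to look at.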
Actually, this is why I started doing the relations stuff I'm currently coding. For a primitive, non-learning feasibility test based on some simple metrics, have a look at http://dikini.net/30.11.2005/relations_battle_plan_ii_and_first_results and the similar things block.
Thanks. I'll investigate.
I envision the following phases to this project: I think the plan may be good, but it looks very lengthy and very general.
I agree it is long, but I want to be honest about the project. My experience has been that creating a successful machine learning model is a slow, iterative process. One refines the data collection, then modifies the model, tests, and repeats. Depending on your hunches at the beginning, this can be either a quick process or a painfully lengthy one.
What I have learned about machine learning and data mining over the years is that they are most successful when you have a very well-defined target for what you want to achieve or find.
This is good advice. I hope to go from "I have an itch" to "I have a concrete task on which to concentrate" soon. If anyone has suggestions, I'm all ears. -Mark