Hello,

I have an interest in machine learning that I would like to bring to bear on Drupal, and I am hoping to enlist the help of other people who share this interest or who can help by providing data.

Briefly, machine learning is the algorithmic application of statistical principles. A classic example is the Bayesian spam filter in your email program, gateway, etc. Using a learned model, this filter classifies incoming mail as either SPAM or NOT SPAM based on a vector of data drawn from the message.

I am looking for interested parties to join me in developing a series of machine learning modules for Drupal. These modules will use data that Drupal can collect to predict outcomes. Examples might include smart "What's Related" type modules, better troll and spam bot protection, better searching, auto categorization, and a wide variety of other predictive tasks.

I envision the following phases to this project:

0. Pre-model research: See what data Drupal captures, what similar modules already exist, and what work needs to be done to capture the appropriate data. Also: create a wish list of machine learning related tasks to choose from later.
1. Data gathering: Gather information from various sources to use as the basis for generating and testing models.
2. Model/Tool evaluation: Gather available tools to see if they can be leveraged to generate useful models. Evaluate different modeling algorithms for appropriateness to the tasks.
3. Focus research: Pick one task from the wish list on which to concentrate.
4. Initial modeling and testing: Begin creating models and evaluate their suitability for the task.
5. Refine data collection: Address shortcomings in the data acquisition to create better models.
6. Test model in live setting: Empirical testing of the model.
7. Module creation: Turn the test code into a publicly releasable module.
8. Review, evaluate, publish: Gather findings into a document for use by the Drupal and machine learning communities.

At this time, I would place myself in Phase 0. I hope to find site admins, or consultants who work with them, who are willing to provide data for this project. User privacy is important, and I am very mindful of your needs in this department. Please do not think this is an attempt at stealing users or personal data. I would also hope the admins would be willing to install custom data gathering modules and help conduct later empirical tests.

This has been a long email already, and I don't want to tie up the list with messages relating to my specific pet project. If you are interested in helping me with this project, please email me off list.

I look forward to creating something really interesting and useful with you,
-Mark Fredrickson