For my Machine Learning toolkit, I'm looking to draw on the work of either the Java Weka ML system or the MLC++ library.
If you know Weka, then the best approach (IMO) is to use web services rather than some PHP-to-Java native integration; it is less esoteric.
Check out Triana (http://www.trianacode.org; it uses Weka internally, I think) and maybe Weka4WS (http://grid.deis.unical.it/weka4ws/). Another option is a simple RESTful wrapper around Weka code, served via servlets. If you are into Python, Orange (http://www.ailab.si/orange) is very good; combined with Twisted, it makes an excellent environment for web data-mining services.
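A RESTful wrapper of the kind suggested above might look roughly like the following. This is a minimal sketch under stated assumptions: it uses the JDK's built-in `com.sun.net.httpserver` package instead of a servlet container so that it runs without extra dependencies, and `ClassifyService`, the `/classify` endpoint, the `x` parameter and the thresholding `classify` method are all hypothetical stand-ins for a trained Weka model.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Sketch of a REST-style classification endpoint: GET /classify?x=0.7
// HYPOTHETICAL: classify() stands in for a trained Weka model.
class ClassifyService {

    // Parses a raw query string such as "x=0.7&y=1" into a map.
    // No URL-decoding is done, to keep the sketch short.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new HashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            String[] kv = pair.split("=", 2);
            params.put(kv[0], kv.length > 1 ? kv[1] : "");
        }
        return params;
    }

    // HYPOTHETICAL stand-in classifier: thresholds the numeric feature "x".
    static String classify(Map<String, String> params) {
        double x = Double.parseDouble(params.getOrDefault("x", "0"));
        return x >= 0.5 ? "positive" : "negative";
    }

    // Starts an HTTP server on the given port (0 picks a free port)
    // and returns it so the caller can stop it later.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/classify", exchange -> {
            int status = 200;
            String text;
            try {
                text = classify(parseQuery(exchange.getRequestURI().getRawQuery()));
            } catch (NumberFormatException e) {
                // Non-numeric feature value: report a client error.
                status = 400;
                text = "bad feature value";
            }
            byte[] body = text.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(status, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

Calling `ClassifyService.start(8080)` from a `main` method would serve `http://localhost:8080/classify?x=0.7`. On the Drupal side, the module would then only need to issue an HTTP GET (for example with `drupal_http_request()`) and read back the label, which keeps PHP and Java decoupled.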
My question to the Drupal community is: which is easier to bridge to PHP (and specifically to Drupal), Java or C++?
C++, as PECL extensions.
My vote is for Java, as I can distribute the (GPL'd) bytecode more easily than pre-compiled C++ binaries.
You can always submit them to php.net.
Let's take performance off the table for this discussion. My design calls for the ML algorithms to be pluggable via an API to the interaction code, so high-performance classifiers can be written later if my chosen approach proves too slow.
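The pluggable design described above could be expressed as a small Java interface that the interaction code depends on, with back ends registered by name. This is a hypothetical sketch: `PluggableClassifier`, `ClassifierRegistry` and the two toy implementations are illustrative names, not part of Weka's API; a Weka-backed implementation would wrap a Weka classifier behind the same interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// HYPOTHETICAL plug-in contract between the interaction code and any ML back end.
interface PluggableClassifier {
    void train(List<double[]> features, List<String> labels);
    String predict(double[] features);
}

// Toy back end: always predicts the most frequent training label.
// Ties go to the label that reached the top count first.
class MajorityClassifier implements PluggableClassifier {
    private String majority;

    @Override
    public void train(List<double[]> features, List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        majority = null;
        int best = 0;
        for (String label : labels) {
            int c = counts.merge(label, 1, Integer::sum);
            if (c > best) {
                best = c;
                majority = label;
            }
        }
    }

    @Override
    public String predict(double[] features) {
        if (majority == null) {
            throw new IllegalStateException("train() must be called first");
        }
        return majority;
    }
}

// Toy back end: 1-nearest-neighbour using squared Euclidean distance.
class NearestNeighbourClassifier implements PluggableClassifier {
    private List<double[]> points = new ArrayList<>();
    private List<String> labels = new ArrayList<>();

    @Override
    public void train(List<double[]> features, List<String> labels) {
        if (features.size() != labels.size()) {
            throw new IllegalArgumentException("features and labels differ in length");
        }
        this.points = new ArrayList<>(features);
        this.labels = new ArrayList<>(labels);
    }

    @Override
    public String predict(double[] x) {
        if (points.isEmpty()) {
            throw new IllegalStateException("train() must be called first");
        }
        int bestIndex = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < points.size(); i++) {
            double[] p = points.get(i);
            double d = 0;
            for (int j = 0; j < x.length; j++) {
                double diff = p[j] - x[j];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                bestIndex = i;
            }
        }
        return labels.get(bestIndex);
    }
}

// Looks up back ends by name, so the interaction code never names a concrete class.
class ClassifierRegistry {
    private final Map<String, Supplier<PluggableClassifier>> factories = new HashMap<>();

    void register(String name, Supplier<PluggableClassifier> factory) {
        factories.put(name, factory);
    }

    PluggableClassifier create(String name) {
        Supplier<PluggableClassifier> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("No classifier registered as " + name);
        }
        return factory.get();
    }
}
```

Swapping in a faster classifier later then means registering another implementation under a new name, for example `registry.register("fast", FastClassifier::new)` with a hypothetical `FastClassifier`; the interaction code keeps calling `train` and `predict` unchanged.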