Hi all,
For my Machine Learning toolkit, I'm looking to draw on the work of either the Java Weka ML system or the MLC++ library.
My question to the Drupal community is: which is easier to bridge (Java or C++) to PHP (and specifically to Drupal)?
My vote is for Java, as I can distribute the (GPL'd) bytecode more easily than pre-compiled C++ binaries.
Let's take performance off the table for this discussion. My design calls for the ML algorithms to be pluggable via an API to the interaction code, so high perf classifiers could be written later if my selected method proves slow.
On a more general note, do any modules currently require php extensions that are not in PEAR? If so, how do they handle getting the module compiled and installed.
Thanks,
Mark Fredrickson
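For illustration only, the "pluggable via an API" design described in this message might look roughly like the sketch below. It is written in Python purely to keep it short and runnable; the names `Classifier`, `MajorityClassifier`, `BACKENDS` and `make_classifier` are hypothetical and do not come from Weka, MLC++ or Drupal.

```python
from abc import ABC, abstractmethod
from collections import Counter


class Classifier(ABC):
    """Hypothetical plug-in interface; the interaction code sees only this."""

    @abstractmethod
    def train(self, examples):
        """examples: iterable of (features, label) pairs."""

    @abstractmethod
    def classify(self, features):
        """Return a label for one set of features."""


class MajorityClassifier(Classifier):
    """Trivial stand-in backend: always predicts the most common training label."""

    def __init__(self):
        self._label = None

    def train(self, examples):
        counts = Counter(label for _, label in examples)
        self._label = counts.most_common(1)[0][0] if counts else None

    def classify(self, features):
        return self._label


# Registry (an assumption of this sketch) so that a faster backend can be
# registered later without touching the calling code.
BACKENDS = {"majority": MajorityClassifier}


def make_classifier(name):
    """Look up a backend by name and return a fresh instance."""
    return BACKENDS[name]()
```

The calling code would only ever ask for `make_classifier("some-name")`, so a slow first backend could later be replaced by a high-performance one under the same name.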
My question to the Drupal community is: which is easier to bridge (Java or C++) to PHP (and specifically to Drupal)?
Writing a PHP extension is probably easier in C. So far, I have not seen any Java extension.
My vote is for Java, as I can distribute the (GPL'd) bytecode more easily than pre-compiled C++ binaries.
and you make Java mandatory on the server. That's not a good idea.
On a more general note, do any modules currently require php extensions that are not in PEAR? If so, how do they handle getting the module compiled and installed.
Very few modules, if any, require a PHP extension that is not in the PHP core. Some require a few scripts that are in PEAR, like simpletest, but that's rare. But scripts are not extensions. Regards NK
to PHP (and specifically to Drupal)?
Writing a PHP extension is probably easier in C. So far, I have not seen any Java extension.
http://php-java-bridge.sourceforge.net/ And there is also a deprecated ext/java extension for PHP <= 4.3. These also have the advantage of _already_ being PHP extensions that can use existing Java bytecode. Instead of having to write an extension myself, I can just glue the bridge to the Weka toolkit in PHP.
and you make Java mandatory on the server. That's not a good idea.
What is more likely: a hosting service providing a JVM or providing a C/C++ compiler? For what it is worth, my host (NearlyFreeSpeech.Net) has gcc but not javac (or at least I couldn't find it in 5 minutes of looking - I haven't asked tech support yet). I don't know if this is typical.
Very few modules, if any, require a PHP extension that is not in the PHP core. Some require a few scripts that are in PEAR, like simpletest, but that's rare. But scripts are not extensions.
Well, until someone codes an ML library in PHP I do not see much of an option. Given the choice between producing a working (though difficult to install) module or spending a lot of effort on re-inventing the wheel, I'll choose the former. :-) -M
One option to consider is making the ML code run external to Drupal. Maybe call PHP from the command line to invoke enough of Drupal for node loading to feed your ML algorithms, and write the results back to the database... So you can set up a Drupal module to configure your ML algorithms and to deal with the data it writes back to the database for use in the CMS.... It would basically be a middleware module to interface with an external app, plus the external app... It might not be so joe-blow ISP friendly, but I think people using machine learning with their Drupal site probably want a little more control over their application environment than the typical shared hosting account provides. On Wed, 2005-12-28 at 16:36 -0600, Mark Fredrickson wrote:
to PHP (and specifically to Drupal)?
Writing a PHP extension is probably easier in C. So far, I have not seen any Java extension.
http://php-java-bridge.sourceforge.net/
And there is also a deprecated ext/java extension for PHP <= 4.3.
These also have the advantage of _already_ being PHP extensions that can use existing Java bytecode. Instead of having to write an extension myself, I can just glue the bridge to the Weka toolkit in PHP.
and you make Java mandatory on the server. That's not a good idea.
What is more likely: a hosting service providing a JVM or providing a C/C++ compiler?
For what it is worth, my host (NearlyFreeSpeech.Net) has gcc but not javac (or at least I couldn't find it in 5 minutes of looking - I haven't asked tech support yet). I don't know if this is typical.
Very few modules, if any, require a PHP extension that is not in the PHP core. Some require a few scripts that are in PEAR, like simpletest, but that's rare. But scripts are not extensions.
Well, until someone codes an ML library in PHP I do not see much of an option. Given the choice between producing a working (though difficult to install) module or spending a lot of effort on re-inventing the wheel, I'll choose the former.
:-)
-M
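As a rough sketch of the "run the ML code outside Drupal and write the results back to the database" idea suggested in this message, the batch job below labels unclassified rows and stores the labels. Everything here is an assumption made for illustration: the table `ml_node`, its columns, and the placeholder `classify` rule are hypothetical, and an in-memory SQLite database stands in for the real Drupal database.

```python
import sqlite3


def classify(text):
    # Placeholder rule standing in for a real ML model.
    return "long" if len(text.split()) > 5 else "short"


def run_batch(conn):
    """Classify every row that has no label yet and write the label back."""
    rows = conn.execute(
        "SELECT nid, body FROM ml_node WHERE label IS NULL"
    ).fetchall()
    for nid, body in rows:
        conn.execute(
            "UPDATE ml_node SET label = ? WHERE nid = ?", (classify(body), nid)
        )
    conn.commit()
    return len(rows)  # number of rows processed in this run


def demo():
    # In-memory database and hypothetical schema, for demonstration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ml_node (nid INTEGER PRIMARY KEY, body TEXT, label TEXT)"
    )
    conn.executemany(
        "INSERT INTO ml_node (nid, body) VALUES (?, ?)",
        [(1, "hello world"), (2, "one two three four five six seven")],
    )
    run_batch(conn)
    return conn.execute("SELECT nid, label FROM ml_node ORDER BY nid").fetchall()
```

A job like this could be run from cron; the Drupal-side module would then only need to read the stored labels.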
Hi, Java has already been done, take a look at http://php.net/java Gordon. On Wed, 2005-12-28 at 10:40 -0600, Mark Fredrickson wrote:
Hi all,
For my Machine Learning toolkit, I'm looking to draw on the work of either the Java Weka ML system or the MLC++ library.
My question to the Drupal community is: which is easier to bridge (Java or C++) to PHP (and specifically to Drupal)?
My vote is for Java, as I can distribute the (GPL'd) bytecode more easily than pre-compiled C++ binaries.
Let's take performance off the table for this discussion. My design calls for the ML algorithms to be pluggable via an API to the interaction code, so high perf classifiers could be written later if my selected method proves slow.
On a more general note, do any modules currently require php extensions that are not in PEAR? If so, how do they handle getting the module compiled and installed.
Thanks,
Mark Fredrickson
Hi, Also I found http://php-java-bridge.sourceforge.net/ which I think is a fork of the Java extension. It also allows you to link into Mono. (That's what the description says.) Gordon. On Thu, 2005-12-29 at 09:36 +1100, Gordon Heydon wrote:
Hi,
Java has already been done, take a look at http://php.net/java
Gordon.
On Wed, 2005-12-28 at 10:40 -0600, Mark Fredrickson wrote:
Hi all,
For my Machine Learning toolkit, I'm looking to draw on the work of either the Java Weka ML system or the MLC++ library.
My question to the Drupal community is: which is easier to bridge (Java or C++) to PHP (and specifically to Drupal)?
My vote is for Java, as I can distribute the (GPL'd) bytecode more easily than pre-compiled C++ binaries.
Let's take performance off the table for this discussion. My design calls for the ML algorithms to be pluggable via an API to the interaction code, so high perf classifiers could be written later if my selected method proves slow.
On a more general note, do any modules currently require php extensions that are not in PEAR? If so, how do they handle getting the module compiled and installed.
Thanks,
Mark Fredrickson
On Wed, 2005-12-28 at 10:40 -0600, Mark Fredrickson wrote:
Hi all,
For my Machine Learning toolkit, I'm looking to draw on the work of either the Java Weka ML system or the MLC++ library.
Trying to integrate either Java or a C++ library into PHP just so that one can run a Java or C++ machine learning system sounds really painful. Unless the performance needs to be high and the transaction latency extremely low, this sounds like a perfect case for NOT integrating the code, but calling it via some sort of remote procedure call (RPC) or remote method invocation (RMI -- Javaspeak for a previously existing concept -- grrrr >:-( ). How about writing a Java server that does the machine learning stuff and can accept an XML-RPC socket connection from Drupal instead? That's got to be a thousand times easier to do, and the performance won't be that bad. I've already done this kind of thing twice. In one instance I open a socket to a Java server to do "real time" transactions with it from a Drupal module, and in another, I make a SOAP connection to a C++ server from a Drupal module, likewise to do "real time" transactions. I would not even have a prototype working if I had tried to write a PHP extension that ran the Java or C++ application natively. ..chrisxj
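To make the XML-RPC suggestion concrete, here is a minimal sketch of the pattern using only Python's standard library. It is not the Java server described above: the Python server merely stands in for it, the Python client stands in for the Drupal module, and the method name `ml.classify` and the placeholder rule inside `classify` are hypothetical.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def classify(words):
    # Placeholder rule standing in for a real classifier.
    return "spam" if "viagra" in words else "ham"


def start_server(host="127.0.0.1", port=0):
    """Start an XML-RPC server in a background thread; port 0 picks a free port."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(classify, "ml.classify")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def demo():
    server = start_server()
    host, port = server.server_address
    try:
        # The client plays the role the Drupal module would play.
        with ServerProxy(f"http://{host}:{port}/") as proxy:
            return proxy.ml.classify(["cheap", "viagra"])
    finally:
        server.shutdown()
        server.server_close()
```

Because only XML travels over the wire, the server side could be reimplemented in Java or C++ later without changing the caller.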
On 28-Dec-05, at 4:05 PM, Chris Johnson wrote:
On Wed, 2005-12-28 at 10:40 -0600, Mark Fredrickson wrote:
Hi all,
For my Machine Learning toolkit, I'm looking to draw on the work of either the Java Weka ML system or the MLC++ library.
Trying to integrate either Java or a C++ library into PHP just so that one can run a Java or C++ machine learning system sounds really painful. Unless the performance needs to be high and the transaction latency extremely low, this sounds like a perfect case for NOT integrating the code, but calling it via some sort of remote procedure call (RPC) or remote method invocation (RMI -- Javaspeak for a previously existing concept -- grrrr >:-( ).
How about writing a Java server that does the machine learning stuff and can accept an XML-RPC socket connection from Drupal instead? That's got to be a thousand times easier to do, and the performance won't be that bad.
Yep. And, of course, you can do fun stuff like expose what kind of inputs/outputs you are willing to accept, and potentially have different server implementations. Actually....thinking about it a bit more, and you have a distributed grid of machine learning, with Drupal slurping in results and spitting them back out again. Fun! -- Boris Mann Vancouver 778-896-2747 San Francisco 415-367-3595 SKYPE borismann http://www.bryght.com
For my Machine Learning toolkit, I'm looking to draw on the work of either the Java Weka ML system or the MLC++ library.
If you know Weka, then the best thing (IMO) is to use web services, rather than doing some PHP-to-Java native integration. It is less esoteric.
Check out Triana http://www.trianacode.org (it uses Weka inside, I think) and maybe http://grid.deis.unical.it/weka4ws/ Another option would be to do a simple RESTful wrapping of Weka code via servlets. If you are into Python, Orange ( http://www.ailab.si/orange ) is a very good thing. That + Twisted can be a killer web data-mining services environment.
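As a hedged illustration of the "simple RESTful wrapping" option, the sketch below exposes a classifier over HTTP with JSON. It uses Python's standard library rather than servlets, and the `/classify` path, the JSON field names and the placeholder `classify` rule are all assumptions made for this example, not part of Weka or weka4ws.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


def classify(words):
    # Placeholder rule standing in for a call into the real toolkit.
    return "spam" if "viagra" in words else "ham"


class ClassifyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/classify":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"label": classify(payload["words"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo():
    server = HTTPServer(("127.0.0.1", 0), ClassifyHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = HTTPConnection("127.0.0.1", server.server_port)
        data = json.dumps({"words": ["hello", "world"]}).encode("utf-8")
        conn.request("POST", "/classify", body=data,
                     headers={"Content-Type": "application/json"})
        result = json.loads(conn.getresponse().read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

The PHP side would then need nothing beyond an HTTP client and a JSON decoder.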
My question to the Drupal community is: which is easier to bridge (Java or C++) to PHP (and specifically to Drupal)?
C++, as PECL extensions.
My vote is for Java, as I can distribute the (GPL'd) bytecode more easily than pre-compiled C++ binaries.
You can always submit them to php.net
Let's take performance off the table for this discussion. My design calls for the ML algorithms to be pluggable via an API to the interaction code, so high perf classifiers could be written later if my selected method proves slow.
For my Machine Learning toolkit, I'm looking to draw on the work of either the Java Weka ML system or the MLC++ library.
If you know Weka, then the best thing (IMO) is to use web services, rather than doing some PHP-to-Java native integration. It is less esoteric.
This is the conclusion that I came to as well. It should be (relatively) easy to wrap WEKA up in the Apache XML-RPC Java package (http://ws.apache.org/xmlrpc/) and serve it either independently or via Tomcat or another servlet framework. If anyone is willing to be a source for answering my inevitable questions about writing servlets, please contact me off-list. I've written a fair amount of Java code, but I've never done any servlets specifically. Thanks, -Mark
This is the conclusion that I came to as well. It should be (relatively) easy to wrap WEKA up in the Apache XML-RPC Java package (http://ws.apache.org/xmlrpc/) and serve it either independently or via Tomcat or another servlet framework.
weka4ws already implements that (I think; I'm not sure, as I'm not a big Java man myself). Hope this helps
participants (7)
- Boris Mann
- Chris Johnson
- Darrel O'Pry
- Gordon Heydon
- Karoly Negyesi
- Mark Fredrickson
- vlado