
Python has plenty of ML libraries (like the great scikit-learn). Is there a good equivalent for Java/Scala that contains many algorithms (regression, classification, clustering, cross-validation, feature processing), is stable and maintained, and can deal with massive datasets?

So far I've found Mahout, Breeze/Nak, and Weka, but they don't look as great as the Python ones.

Additionally, if there's no equivalent, how can I efficiently connect Java code with Python?

kjetil b halvorsen
boskaiolo

3 Answers


You may find this extensive curated list of ML libraries, frameworks and software tools helpful. In particular, it contains the resources you're looking for: ML lists for Java and for Scala.

Aleksandr Blekh

Apache Spark, and specifically its MLlib component, looks like exactly what you are looking for. MLlib contains implementations for classification, regression, dimensionality reduction, etc. You can program in Scala, Java and Python.

It's basically a very fast distributed computing framework that can run on a Hadoop cluster. For development purposes, you can also easily run it without Hadoop on your local machine.

Check out the MLlib guide here: https://spark.apache.org/docs/latest/mllib-guide.html

Suvir

Have a look at Java-ML (http://java-ml.sourceforge.net/) and Encog (http://www.heatonresearch.com/encog). The latter focuses on neural networks rather than on a broad range of algorithms.

Also, Weka might not have a very friendly Java API (it was designed primarily as a GUI application rather than as a library), but once you get used to it, you start appreciating how many things are implemented there.

I have used all of them successfully.

Alexey Grigorev