Java is an object-oriented language and runtime environment (JRE). Java programs are compiled to bytecode and run in a virtual machine (JVM).
Questions tagged [java]
91 questions
15 votes, 5 answers
Open source Java library for statistics at the level offered by a graduate statistics course
I am taking a graduate course in Applied Statistics that uses the following textbook (to give you a feel for the level of the material being covered): Statistical Concepts and Methods, by G. K. Bhattacharyya and R. A. Johnson.
The Professor requires…

user1172468
15 votes, 4 answers
Smoothing time series data
I am building an Android application that records accelerometer data during sleep, so as to analyze sleep trends and optionally wake the user near a desired time during light sleep.
I have already built the component that collects and stores data,…

Jon
11 votes, 3 answers
PCA too slow when both n,p are large: Alternatives?
Problem Setup
I have data points (images) of high dimension (4096), which I'm trying to visualize in 2D. To this end, I'm using t-SNE in a manner similar to the following example code by Karpathy.
The scikit-learn documentation recommends using PCA…

galoosh33
10 votes, 2 answers
How can I determine Weibull parameters from data?
I have a histogram of wind speed data, which is often represented using a Weibull distribution. I would like to calculate the Weibull shape and scale factors which give the best fit to the histogram.
I need a numerical solution (as opposed to graphic…

klonq
9 votes, 1 answer
How to use/interpret empirical distribution?
First of all, I'd like to apologize for the vague title. I couldn't really formulate a better one just now, so please feel free to change it, or advise me to change it, so that it better fits the core of the question.
Now about the question itself, I…

posdef
9 votes, 1 answer
Is Xorshift RNG good enough for Monte Carlo approaches? If not what alternatives are there?
I recently stumbled across an article on pseudorandom numbers in Java which mentions potential weaknesses in the default algorithm, a linear congruential generator (LCG), and gives some alternatives. Among those I find Xorshift generators…

posdef
8 votes, 6 answers
What programming language for statistical inference?
Just out of curiosity...
What language is used most here?
R? MATLAB? Python? Java?
Which for prototyping, and which for production?
For example, I think MATLAB is mostly used for prototyping, Python for both prototyping and production...

nkint
8 votes, 3 answers
Complete machine learning library for Java/Scala
Python has plenty of ML libraries (like the great scikit-learn). Is there a good one for Java/Scala that contains many algorithms (regression, classification, clustering, cross-validation, feature processing), is stable and maintained, and is able to deal with massive…

boskaiolo
6 votes, 2 answers
Unsupervised outlier detection in 2D space
Problem
I'm working on a school project in Java and my goal is to detect and remove outliers from a dataset containing geo points.
The final result should be a single cluster, with any shape, containing all the points inside a real area (like a…

StepTNT
6 votes, 3 answers
Data mining classification competition
I'm currently taking a data mining class, and for one of our projects we're required to predict the class label for an unknown data set by first building a classifier on a training data set which already provides the class label.
We're only required…
LearnHK
5 votes, 3 answers
Implementation of online classification algorithms?
I'm looking for implementations of online learners. I guess that is possible with AdaBoost, where you train the model and later modify it by adding more training data, without having to re-train the entire model. Are you aware of…

Jack Twain
5 votes, 2 answers
Parameter estimation for normal distribution in Java
Given a set of data (~5000 values), I'd like to draw random samples from the same distribution as the original data. The problem is that there is no way to know for sure what distribution the original data comes from.
It makes sense to use normal…

posdef
5 votes, 1 answer
Verifying the output of implementing internal clustering validity indexes
I have implemented some internal clustering validity indexes in Java:
Simplified Silhouette.
Calinski-Harabasz (VRC).
Davies-Bouldin.
Dunn's Index.
How could I verify if my implementation is correct?
I have tested the indexes on Iris, Wine,…

ML_TN
5 votes, 2 answers
Looking for a test for shape comparison
I have two different time series, both of length 100, and I need to know the best test (non-parametric, if possible) that returns how similar in shape these two series are.
Here are two examples: in the first one, the two series are very…

Alberto acepsut
5 votes, 3 answers
Java implementations of the lasso
Are there any open-source Java implementations of the lasso or least angle regression?
Pure Java code would be best, but clean implementations in other languages would also be of interest. I am already aware of the existence of a variety of R packages…

NPE