Questions tagged [scalability]
11 questions
21
votes
1 answer
How can we simulate from a geometric mixture?
If $f_1,\ldots,f_k$ are known densities from which I can simulate, i.e., for which an algorithm is available, and if the product $$\prod_{i=1}^k f_i(x)^{\alpha_i}\qquad \alpha_1,\ldots,\alpha_k>0$$ is integrable, is there a generic approach to…

Xi'an
6
votes
4 answers
Solving a practical machine learning problem
I am currently doing my PhD in computational biology at Stanford. I get the data I need to answer the questions I am interested in. The data sets are sometimes "large", and these large problems take a long time to solve (a couple of days…

Sid
4
votes
2 answers
What are some uses of logistic regression at scale?
Many libraries that scale linear and logistic regression assume a tall-skinny design matrix (many samples, few features), but I don't understand why you would need billions of samples if your data has 250 features.
In what scenarios would more data…

baffld
4
votes
0 answers
How does t-SNE slow down with increasing number of dimensions?
I'm trying to understand the computational bounds of t-SNE. It's learned with SGD, so it'll have to go through some number of gradient-descent iterations. We can ignore that here, and focus on the time for each iteration. Barnes-Hut changes it…

Leopd
2
votes
2 answers
Bisecting K-means using Dynamic Time Warping
I'm trying to cluster time series of different lengths, and I came up with the idea of using DTW as a similarity measure, which seems adequate, but I cannot use it with K-means, since it's hard to define centroids based on time series…

Kobe-Wan Kenobi
1
vote
0 answers
Handling multiple time series through a common model?
I have 1.5 lakh (150K) time series, divided across 32 geo locations. The customer expects the minimum number of models for forecasting all 150K series. How should I cluster my time series in such a scenario?
DTW/…

Arpit Sisodia
1
vote
1 answer
Bayes and Naive Bayes code implementations
I know that the Bayes classifier assigns the new data point $\pmb{x}$ to the class $\omega_j, \ j=1,\dots,M$, when
$p(\omega_j \mid \pmb{x}) = \max_{q=1,\dots,M}p(\omega_q \mid \pmb{x})$,
where
$p(\omega_j\mid \pmb{x}) = \frac{p(\pmb{x}\mid…

tgeorgiop
1
vote
2 answers
Best Scalable Classification Algorithms
I have a very large data set that I want to perform classification tasks on. There are about 40 million instances, 16 features, and 2 classes.
I'm attempting to use scikit-learn LinearSVC and LogisticRegression, but after several hours the…

MVTC
0
votes
1 answer
Persistent Cluster IDs for DBSCAN
When executing the DBSCAN algorithm over multiple runs on similar (but not identical) data, I would like to generate persistent IDs so we can monitor how the clusters change over time.
Selection of another algorithm is not possible. This question…

John Zhu
0
votes
1 answer
Scalable machine learning for bigger data
I am aware of the theory of stochastic gradient descent, which is a faster way of fitting a linear regression. Through this we can have an 'optimized implementation' of linear regression. There are similar techniques for non-parametric methods as…

StatguyUser
0
votes
1 answer
Scalability comparison with the help of regression
I created an algorithm and I tested it against a current algorithm.
The results are in this form:
Power Processes Method Time(s)
1 3 1 19,94
1 4 1 20,04
1 5 1 20,06
1 6 …

user2524707