Highest Voted 'scalability' Questions - Statistical Analysis Stack Exchange

21 votes, 1 answer

How can we simulate from a geometric mixture?

If $f_1,\ldots,f_k$ are known densities from which I can simulate, i.e., for which an algorithm is available, and if the product $$\prod_{i=1}^k f_i(x)^{\alpha_i}\qquad \alpha_1,\ldots,\alpha_k>0$$ is integrable, is there a generic approach to…

asked Mar 18 '16 at 11:32

Xi'an

90,397
9
157
575

6 votes, 4 answers

Solving a practical machine learning problem

I am currently doing my PhD in computational biology at Stanford. I get the data I need to answer the questions I am interested in. The data sets are sometimes "large" and these large problems take longer to solve (a couple of days…

machine-learning underdetermined scalability

asked Aug 11 '14 at 06:14

Sid

2,489
10
15

4 votes, 2 answers

What are some uses of logistic regression at scale?

Many libraries that scale linear and logistic regression assume a tall-skinny design matrix (many samples, few features), but I don't understand why you would need billions of samples if your data has 250 features. In what scenarios would more data…

generalized-linear-model modeling large-data scalability

asked Jul 30 '21 at 19:14

baffld

195
5

4 votes, 0 answers

How does t-SNE slow down with increasing number of dimensions?

I'm trying to understand the computational bounds of t-SNE. It's learned with SGD, so it'll have to go through some number of gradient-descent iterations. We can ignore that here, and focus on the time for each iteration. Barnes-Hut changes it…

data-visualization dimensionality-reduction tsne scalability

asked Jun 02 '15 at 16:04

Leopd

179
5

2 votes, 2 answers

Bisecting K-means using Dynamic Time Warping

I'm trying to cluster time series of different lengths and I came up with the idea of using DTW as a similarity measure, which seems adequate, but I cannot use it with K-means, since it's hard to define centroids based on time series…

clustering k-means hierarchical-clustering scalability

asked Jan 09 '15 at 11:12

Kobe-Wan Kenobi

2,437
3
20
33

1 vote, 0 answers

Handling multiple time series through a common model?

I have 150,000 (1.5 lakh) time series, divided by geolocation; there are 32 geolocations in total. The customer expects a minimal number of models for forecasting all 150,000 series. How should I cluster my time series in such a scenario? DTW/…

time-series clustering arima predictive-models scalability

asked Jul 03 '20 at 19:16

Arpit Sisodia

1,029
2
7
23

1 vote, 1 answer

Bayes and Naive Bayes code implementations

I know that the Bayes classifier assigns the new data point $\pmb{x}$ to the class $\omega_j, \ j=1,\dots,M$, when $p(\omega_j \mid \pmb{x}) = \max_{q=1,\dots,M}p(\omega_q \mid \pmb{x})$, where $p(\omega_j\mid \pmb{x}) = \frac{p(\pmb{x}\mid…

machine-learning classification naive-bayes scalability

asked Oct 13 '19 at 17:50

tgeorgiop

23
3

1 vote, 2 answers

Best Scalable Classification Algorithms

I have a very large data set that I want to perform classification tasks on. There are about 40 million instances, 16 features, and 2 classes. I'm attempting to use scikit-learn LinearSVC and LogisticRegression, but after several hours the…

classification scalability

asked Mar 17 '16 at 08:02

MVTC

113
6

0 votes, 1 answer

Persistent Cluster IDs for DBSCAN

When executing the DBSCAN algorithm over multiple runs on similar (but not identical) data, I would like to generate persistent IDs so we can monitor how the clusters change over time. Selecting another algorithm is not possible. This question…

clustering k-means percentage dbscan scalability

asked Oct 17 '17 at 02:17

John Zhu

1
2

0 votes, 1 answer

Scalable machine learning for bigger data

I am aware of the theory of stochastic gradient descent, which is a faster way of fitting a linear regression. Through this we can have an 'optimized implementation' of linear regression. There are similar techniques for non-parametric methods as…

r machine-learning self-study python scalability

asked Sep 08 '16 at 01:29

StatguyUser

874
3
9
27

0 votes, 1 answer

Scalability comparison with the help of regression

I created an algorithm and I tested it against a current algorithm. The results are in this form:

Power  Processes  Method  Time(s)
1      3          1       19,94
1      4          1       20,04
1      5          1       20,06
1      6          …

r regression algorithms computational-statistics scalability

asked Nov 17 '15 at 00:48

user2524707

121
1

Questions tagged [scalability]