Highest Voted Questions - Statistical Analysis Stack Exchange

43

votes

5 answers

Fake uniform random numbers: More evenly distributed than true uniform data

I'm looking for a way to generate random numbers that appear to be uniformly distributed -- and every test will show them to be uniform -- except that they are more evenly distributed than true uniform data. The problem I have with the "true" uniform…

distributions random-generation uniform-distribution quasi-monte-carlo

asked Oct 14 '12 at 15:47

Has QUIT--Anony-Mousse

39,639
7
61
96

43

votes

4 answers

Is it possible to give variable sized images as input to a convolutional neural network?

Can we give images of variable size as input to a convolutional neural network for object detection? If so, how can we do that? If we crop the image, we will be losing some portion of it, and if we try to resize, then, the…

neural-networks tensorflow keras computer-vision object-detection

asked Jan 24 '19 at 04:03

Ashna Eldho

531
1
4
4

43

votes

8 answers

How do I get people to take better care of data?

My workplace has employees from a very wide range of disciplines, so we generate data in lots of different forms. Consequently, each team has developed its own system for storing data. Some use Access or SQL databases; some teams (to my horror)…

dataset reproducible-research quality-control

asked Oct 21 '10 at 16:26

Richie Cotton

644
9
15

43

votes

6 answers

How to quasi match two vectors of strings (in R)?

I am not sure how this should be termed, so please correct me if you know a better term. I've got two lists: one of 55 items (e.g., a vector of strings), the other of 92. The item names are similar but not identical. I wish to find the best…

r text-mining

asked Oct 08 '10 at 21:31

Tal Galili

19,935
32
133
195

43

votes

6 answers

Why do I get a 100% accuracy decision tree?

I'm getting a 100% accuracy for my decision tree. What am I doing wrong? This is my code:

import pandas as pd
import json
import numpy as np
import sklearn
import matplotlib.pyplot as plt
data =…

machine-learning python cart accuracy

asked Mar 22 '18 at 11:54

Nadjla

441
1
4
4

43

votes

2 answers

If only prediction is of interest, why use lasso over ridge?

On page 223 in An Introduction to Statistical Learning, the authors summarise the differences between ridge regression and lasso. They provide an example (Figure 6.9) of when "lasso tends to outperform ridge regression in terms of bias, variance,…

machine-learning prediction lasso regularization ridge-regression

asked Mar 05 '18 at 10:19

Oliver Angelil

1,129
1
11
24

43

votes

2 answers

Who invented stochastic gradient descent?

I'm trying to understand the history of gradient descent and stochastic gradient descent. Gradient descent was invented by Cauchy in 1847. Méthode générale pour la résolution des systèmes d'équations simultanées. pp. 536–538 For more information…

references gradient-descent history stochastic-gradient-descent

asked Nov 14 '17 at 13:49

DaL

4,462
3
16
27

43

votes

5 answers

How to perform two-sample t-tests in R by inputting sample statistics rather than the raw data?

Let's say we have the statistics given below:

gender  mean      sd         n
f       1.666667  0.5773503  3
m       4.500000  0.5773503  4

How do you perform a two-sample t-test (to see if there is a significant difference between the means of men and women in some variable)…

r t-test

asked Jun 13 '12 at 16:15

Alby

2,103
3
19
22

43

votes

9 answers

Correlation does not imply causation; but what about when one of the variables is time?

I know this question has been asked a billion times, so, after looking online, I am fully convinced that correlation between 2 variables does not imply causation. In one of my stats lectures today, we had a guest lecture from a physicist, on the…

correlation mathematical-statistics causality

asked Jun 07 '17 at 17:00

Thomas Moore

1,375
10
17

43

votes

3 answers

Variance of $K$-fold cross-validation estimates as $f(K)$: what is the role of "stability"?

TL;DR: It appears that, contrary to oft-repeated advice, leave-one-out cross-validation (LOO-CV) -- that is, $K$-fold CV with $K$ (the number of folds) equal to $N$ (the number of training observations) -- yields estimates of the generalization…

regression machine-learning variance cross-validation predictive-models

asked May 20 '17 at 01:11

Jake Westfall

11,539
2
48
96

43

votes

10 answers

How to efficiently generate random positive-semidefinite correlation matrices?

I would like to be able to efficiently generate positive-semidefinite (PSD) correlation matrices. My method slows down dramatically as I increase the size of matrices to be generated. Could you suggest any efficient solutions? If you are aware of…

random-generation correlation-matrix

asked Sep 16 '10 at 20:39

Eduardas

2,239
4
23
22

43

votes

4 answers

When should I balance classes in a training data set?

I took an online course where I learned that unbalanced classes in the training data might lead to problems, because classification algorithms go for the majority rule, as it gives good results if the imbalance is too large. In an assignment one had…

machine-learning classification unbalanced-classes

asked Aug 03 '16 at 14:59

Zelphir Kaltstahl

613
1
7
10

43

votes

6 answers

Neural network references (textbooks, online courses) for beginners

I want to learn Neural Networks. I am a Computational Linguist. I know statistical machine learning approaches and can code in Python. I am looking to start with its concepts, and know one or two popular models which may be useful from a…

neural-networks deep-learning references natural-language computer-vision

asked Aug 02 '16 at 16:35

HIGGINS

479
8
12

43

votes

2 answers

Area under Precision-Recall Curve (AUC of PR-curve) and Average Precision (AP)

Is Average Precision (AP) the Area under the Precision-Recall Curve (AUC of PR-curve)? EDIT: here is a comment on the difference between PR AUC and AP. The AUC is obtained by trapezoidal interpolation of the precision. An alternative and usually…

scikit-learn precision-recall auc average-precision

asked Jun 15 '15 at 09:37

mrgloom

1,687
4
25
33

43

votes

4 answers

How do you use the 'test' dataset after cross-validation?

In some lectures and tutorials I've seen, they suggest splitting your data into three parts: training, validation and test. But it is not clear how the test dataset should be used, nor how this approach is better than cross-validation over the whole…

machine-learning cross-validation validation

asked May 18 '15 at 21:02

Serhiy

959
1
8
11
