Highest Voted Questions - Statistical Analysis Stack Exchange

61

votes

6 answers

Is the "hybrid" between Fisher and Neyman-Pearson approaches to statistical testing really an "incoherent mishmash"?

There exists a certain school of thought according to which the most widespread approach to statistical testing is a "hybrid" between two approaches: that of Fisher and that of Neyman-Pearson; these two approaches, the claim goes, are "incompatible"…

hypothesis-testing statistical-significance p-value type-i-and-ii-errors history

asked Aug 21 '14 at 12:54

amoeba

93,463
28
275
317

60

votes

3 answers

Testing equality of coefficients from two different regressions

This seems to be a basic issue, but I just realized that I actually don't know how to test equality of coefficients from two different regressions. Can anyone shed some light on this? More formally, suppose I ran the following two regressions:…

hypothesis-testing inference

asked Apr 12 '14 at 12:51

coffeinjunky

1,646
1
16
22

60

votes

5 answers

Is it important to scale data before clustering?

I found this tutorial, which suggests that you should run the scale function on features before clustering (I believe that it converts data to z-scores). I'm wondering whether that is necessary. I'm asking mostly because there's a nice elbow point…

clustering k-means

asked Mar 12 '14 at 21:27

Jeremy

1,259
3
12
17

60

votes

5 answers

How to calculate pseudo-$R^2$ from R's logistic regression?

Christopher Manning's writeup on logistic regression in R shows a logistic regression in R as follows: ced.logr <- glm(ced.del ~ cat + follows + factor(class), family=binomial) Some output: > summary(ced.logr) Call: glm(formula = ced.del ~ cat +…

r logistic likelihood pseudo-r-squared

asked Mar 19 '11 at 22:44

dfrankow

2,816
6
30
39

60

votes

5 answers

Is every covariance matrix positive definite?

I guess the answer should be yes, but I still feel something is not right. There should be some general results in the literature; could anyone help me?

covariance matrix covariance-matrix linear-algebra

asked Apr 22 '13 at 08:48

Jingjings

1,173
1
9
13
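
As a small illustration for the covariance question above: a covariance matrix is always positive semi-definite, but it need not be positive definite. The sketch below builds a sample covariance matrix for two variables where one is an exact multiple of the other, then evaluates the quadratic form a' C a for a non-zero vector a. The data values and the helper names (covariance_matrix, quadratic_form) are made up for this example.

```python
def covariance_matrix(xs, ys):
    """Sample covariance matrix (n - 1 denominator) of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


def quadratic_form(matrix, a):
    """Compute a' M a for a 2x2 matrix M and a length-2 vector a."""
    return sum(a[i] * matrix[i][j] * a[j] for i in range(2) for j in range(2))


# Hypothetical data: ys is exactly twice xs, so the two variables are
# perfectly linearly dependent.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0 * x for x in xs]

cov = covariance_matrix(xs, ys)

# The combination 2*X - Y is constant, so its variance is zero.
q = quadratic_form(cov, [2.0, -1.0])
determinant = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
print(cov, q, determinant)
```

Because a non-zero vector gives a quadratic form of zero here, this matrix is positive semi-definite but not positive definite.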

60

votes

46 answers

Most famous statisticians

What are the most important statisticians, and what is it that made them famous? (Reply just one scientist per answer please.)

methodology history

asked Dec 04 '10 at 00:08

mariana soffer

1,091
2
15
18

60

votes

3 answers

Linear model with log-transformed response vs. generalized linear model with log link

In this paper titled "CHOOSING AMONG GENERALIZED LINEAR MODELS APPLIED TO MEDICAL DATA" the authors write: In a generalized linear model, the mean is transformed, by the link function, instead of transforming the response itself. The two methods …

generalized-linear-model model-selection lognormal-distribution

asked Jan 16 '13 at 10:01

miura

3,364
3
21
27

60

votes

1 answer

Logistic regression in R resulted in perfect separation (Hauck-Donner phenomenon). Now what?

I'm trying to predict a binary outcome using 50 continuous explanatory variables (the range of most of the variables is $-\infty$ to $\infty$). My data set has almost 24,000 rows. When I run glm in R, I get: Warning messages: 1: glm.fit: algorithm…

r regression logistic separation

asked Dec 12 '12 at 23:59

Dcook

733
1
7
8

60

votes

5 answers

How does one interpret SVM feature weights?

I am trying to interpret the variable weights given by fitting a linear SVM. (I'm using scikit-learn): from sklearn import svm svm = svm.SVC(kernel='linear') svm.fit(features, labels) svm.coef_ I cannot find anything in the documentation that…

svm feature-selection python scikit-learn

asked Oct 11 '12 at 20:48

Austin Richardson

928
1
8
10

60

votes

4 answers

How to generate correlated random numbers (given means, variances and degree of correlation)?

I'm sorry if this seems a bit too basic, but I guess I'm just looking to confirm understanding here. I get the sense I'd have to do this in two steps, and I've started trying to grok correlation matrices, but it's just starting to seem really…

probability correlation conditional-probability random-generation

asked Oct 07 '12 at 19:45

Joseph Weissman

703
1
6
7
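
One common way to approach the correlated-random-numbers question above, sketched for two variables only and assuming normally distributed values (the question does not state a distribution): draw two independent standard normals and mix them so the second variable has the requested correlation with the first. The function names and the numeric inputs are hypothetical.

```python
import math
import random


def correlated_pair(mu_x, var_x, mu_y, var_y, rho, rng):
    """Draw one (x, y) pair with the given means, variances and correlation.

    Assumes both variables are normal; this is the 2x2 special case of
    multiplying independent normals by a Cholesky factor of the covariance.
    """
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    x = mu_x + math.sqrt(var_x) * z1
    y = mu_y + math.sqrt(var_y) * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return x, y


def sample_correlation(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(p[0] for p in pairs) / n
    mean_y = sum(p[1] for p in pairs) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in pairs)
    syy = sum((p[1] - mean_y) ** 2 for p in pairs)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pairs)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
pairs = [correlated_pair(10.0, 4.0, -3.0, 9.0, 0.7, rng) for _ in range(20000)]
r = sample_correlation(pairs)
print(round(r, 3))
```

The sample correlation should land close to the requested 0.7. For more than two variables, the same idea generalizes to using a Cholesky factor of the full covariance matrix.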

60

votes

7 answers

Why is the regularization term added to the cost function (instead of multiplied etc.)?

Whenever regularization is used, it is usually added to the cost function, as in the following: $$ J(\theta)=\frac 1 2(y-\theta X^T)(y-\theta X^T)^T+\alpha\|\theta\|_2^2 $$ This makes intuitive sense to me since minimize the cost…

regularization

asked May 22 '18 at 09:48

grenmester

725
1
6
5

60

votes

5 answers

Is adjusting p-values in a multiple regression for multiple comparisons a good idea?

Let's assume you are a social science researcher/econometrician trying to find relevant predictors of demand for a service. You have 2 outcome/dependent variables describing the demand (using the service yes/no, and the number of occasions). You have…

regression multivariate-analysis predictive-models multiple-regression multiple-comparisons

asked Sep 30 '10 at 14:07

Mikael M

703
1
6
6

60

votes

5 answers

What are the advantages of the Wasserstein metric compared to Kullback-Leibler divergence?

What is the practical difference between the Wasserstein metric and the Kullback-Leibler divergence? The Wasserstein metric is also referred to as the Earth mover's distance. From Wikipedia: Wasserstein (or Vaserstein) metric is a distance function defined between…

distributions kullback-leibler metric wasserstein

asked Aug 01 '17 at 13:54

Thomas Fauskanger

703
1
6
5

60

votes

2 answers

What is the relationship between a chi squared test and test of equal proportions?

Suppose that I have three populations with four, mutually exclusive characteristics. I take random samples from each population and construct a crosstab or frequency table for the characteristics that I am measuring. Am I correct in saying…

chi-squared-test proportion contingency-tables z-test

asked Sep 05 '10 at 16:35

hgcrpd

1,307
2
11
13

60

votes

5 answers

Backpropagation with Softmax / Cross Entropy

I'm trying to understand how backpropagation works for a softmax/cross-entropy output layer. The cross entropy error function is $$E(t,o)=-\sum_j t_j \log o_j$$ with $t_j$ and $o_j$ as the target and output at neuron $j$, respectively. The sum is over…

backpropagation derivative softmax cross-entropy

asked Sep 17 '16 at 23:32

micha

703
1
6
5
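
For the softmax/cross-entropy question above, here is a minimal sketch that checks the well-known result numerically: when the outputs come from a softmax over logits and the targets sum to 1, the gradient of $E(t,o)=-\sum_j t_j \log o_j$ with respect to logit $j$ is $o_j - t_j$. The logits z, the target vector and all function names are assumptions made for this example; they do not appear in the question.

```python
import math


def softmax(z):
    """Numerically stable softmax of a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(t, o):
    """E(t, o) = -sum_j t_j * log(o_j)."""
    return -sum(tj * math.log(oj) for tj, oj in zip(t, o))


def grad_wrt_logits(t, z):
    """Analytic gradient dE/dz_j = o_j - t_j (valid when sum(t) == 1)."""
    o = softmax(z)
    return [oj - tj for oj, tj in zip(o, t)]


def numeric_grad(t, z, eps=1e-6):
    """Central finite-difference estimate of dE/dz, used only as a check."""
    grads = []
    for i in range(len(z)):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[i] += eps
        z_minus[i] -= eps
        e_plus = cross_entropy(t, softmax(z_plus))
        e_minus = cross_entropy(t, softmax(z_minus))
        grads.append((e_plus - e_minus) / (2 * eps))
    return grads


z = [1.0, 2.0, 0.5]  # hypothetical logits
t = [0.0, 1.0, 0.0]  # one-hot target
analytic = grad_wrt_logits(t, z)
numeric = numeric_grad(t, z)
print(analytic)
print(numeric)
```

The two printed vectors should agree to several decimal places, which is a quick sanity check on the $o_j - t_j$ formula before using it in a backpropagation routine.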
