Highest Voted Questions - Statistical Analysis Stack Exchange

54

votes

3 answers

Interpretation of log transformed predictor and/or response

I'm wondering if it makes a difference in interpretation whether only the dependent, both the dependent and independent, or only the independent variables are log transformed. Consider the case of log(DV) = Intercept + B1*IV + Error. I can…

regression data-transformation interpretation regression-coefficients logarithm

asked Nov 16 '11 at 10:03

upabove

2,657
10
30
37

54

votes

1 answer

How large should the batch size be for stochastic gradient descent?

I understand that stochastic gradient descent may be used to optimize a neural network using backpropagation by updating each iteration with a different sample of the training dataset. How large should the batch size be?

machine-learning neural-networks gradient-descent backpropagation

asked Mar 07 '15 at 21:18

Simon Kuang

2,051
3
17
18

54

votes

8 answers

How to tell the probability of failure if there were no failures?

I was wondering if there is a way to tell the probability of something (a product) failing if we have had 100,000 products in the field for 1 year with no failures. What is the probability that one of the next 10,000 products sold fails?

probability survival binomial-distribution

asked Jan 21 '15 at 18:39

melonfresh

541
1
5
3

54

votes

4 answers

Replicating Stata's "robust" option in R

I have been trying to replicate the results of the Stata option robust in R. I have used the rlm command from the MASS package and also the command lmrob from the package "robustbase". In both cases the results are quite different from the "robust"…

r stata robust robust-standard-error

asked Sep 28 '14 at 12:42

user56579

541
1
5
4

54

votes

3 answers

Do we have a problem of "pity upvotes"?

I know, this may sound like it is off-topic, but hear me out. At Stack Overflow and here we get votes on posts, this is all stored in a tabular form. E.g.: post id voter id vote type datetime ------- -------- --------- …

time-series hypothesis-testing data-mining markov-process censoring

asked Jun 01 '11 at 01:57

Sam Saffron

619
4
7

53

votes

10 answers

Machine Learning using Python

I am considering using Python libraries for doing my Machine Learning experiments. Thus far, I had been relying on WEKA, but have been pretty dissatisfied on the whole. This is primarily because I have found WEKA to be not so well supported (very…

machine-learning python

asked Mar 27 '11 at 04:00

Andy

1,583
3
21
19

53

votes

5 answers

Correct spelling (capitalization, italicization, hyphenation) of "p-value"?

I realize this is pedantic and trite, but as a researcher in a field outside of statistics, with limited formal education in statistics, I always wonder if I'm writing "p-value" correctly. Specifically: Is the "p" supposed to be capitalized? Is the…

hypothesis-testing p-value terminology

asked Jul 28 '10 at 04:08

gotgenes

913
2
8
9

53

votes

2 answers

Why is a Bayesian not allowed to look at the residuals?

In the article "Discussion: Should Ecologists Become Bayesians?" Brian Dennis gives a surprisingly balanced and positive view of Bayesian statistics when his aim seems to be to warn people about it. However, in one paragraph, without any citations…

bayesian residuals frequentist likelihood-principle

asked Feb 06 '14 at 08:53

Mankka

633
5
8

53

votes

2 answers

Linear kernel and non-linear kernel for support vector machine?

When using a support vector machine, are there any guidelines on choosing a linear kernel vs. a non-linear kernel, like RBF? I once heard that a non-linear kernel tends not to perform well once the number of features is large. Are there any references on…

machine-learning classification svm references kernel-trick

asked Oct 17 '13 at 02:21

user3269

4,622
8
43
53

53

votes

2 answers

Why are MA(q) time series models called "moving averages"?

When I read "moving average" in relation to a time series, I think something like $\frac{x_{t-1} + x_{t-2} + x_{t-3}}{3}$, or perhaps a weighted average like $0.5x_{t-1} + 0.3x_{t-2} + 0.2x_{t-3}$. (I realize these are actually AR(3) models, but…

time-series arima terminology moving-average

asked May 06 '13 at 01:49

Stats newb

583
1
5
7

53

votes

4 answers

Is a sample covariance matrix always symmetric and positive definite?

When computing the covariance matrix of a sample, is one then guaranteed to get a symmetric and positive-definite matrix? Currently my problem has a sample of 4600 observation vectors and 24 dimensions.

sampling covariance

asked Mar 22 '13 at 07:14

Morten

918
1
9
11

53

votes

4 answers

Multinomial logistic regression vs one-vs-rest binary logistic regression

Let's say we have a dependent variable $Y$ with a few categories and a set of independent variables. What are the advantages of multinomial logistic regression over a set of binary logistic regressions (i.e. a one-vs-rest scheme)? By set of binary logistic…

logistic categorical-data multinomial-distribution

asked Mar 13 '13 at 14:31

Tomek Tarczynski

3,854
7
29
37

53

votes

2 answers

Pandas / Statsmodels / Scikit-learn

Are Pandas, Statsmodels and Scikit-learn different implementations of machine learning/statistical operations, or are these complementary to one another? Which of these has the most comprehensive functionality? Which one is actively developed…

machine-learning python scikit-learn statsmodels pandas

asked Jan 17 '13 at 01:02

Nik

1,279
2
13
19

53

votes

2 answers

How to simulate artificial data for logistic regression?

I know I'm missing something in my understanding of logistic regression, and would really appreciate any help. As far as I understand it, the logistic regression assumes that the probability of a '1' outcome given the inputs, is a linear combination…

r regression logistic generalized-linear-model simulation

asked Dec 25 '12 at 14:59

zorbar

727
1
7
9

53

votes

4 answers

Fast linear regression robust to outliers

I am dealing with linear data with outliers, some of which are more than 5 standard deviations away from the estimated regression line. I'm looking for a linear regression technique that reduces the influence of these points. So far what I did is…

regression linear-model outliers robust fused-lasso

asked Dec 19 '12 at 10:47

Matteo Fasiolo

3,134
2
20
29
