Highest Voted Questions - Statistical Analysis Stack Exchange

35

votes

2 answers

Why is lambda "within one standard error from the minimum" a recommended value for lambda in an elastic net regression?

I understand what role lambda plays in an elastic-net regression, and I can understand why one would select lambda.min, the value of lambda that minimizes cross-validated error. My question is: where in the statistics literature is it recommended to…

regression cross-validation regularization glmnet elastic-net

asked Feb 20 '15 at 20:56

jhersh

353
1
4
5

35

votes

2 answers

How to use both binary and continuous variables together in clustering?

I need to use binary variables (values 0 & 1) in k-means. But k-means only works with continuous variables. I know some people still use these binary variables in k-means ignoring the fact that k-means is only designed for continuous variables. This…

r clustering binary-data k-means mixed-type-data

asked Jan 02 '15 at 14:55

GeorgeOfTheRF

5,063
14
42
51

35

votes

2 answers

Is there a boxplot variant for Poisson distributed data?

I'd like to know if there is a boxplot variant adapted to Poisson distributed data (or possibly other distributions)? With a Gaussian distribution, whiskers placed at L = Q1 - 1.5 IQR and U = Q3 + 1.5 IQR, the boxplot has the property that there…

data-visualization poisson-distribution boxplot

asked Jul 15 '11 at 11:19

caas

535
1
4
7
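The Gaussian whisker rule quoted in the question above (L = Q1 - 1.5 IQR, U = Q3 + 1.5 IQR) can be checked numerically. The following is a minimal sketch, assuming a standard normal distribution, that computes the whisker positions and the probability mass falling outside them. It does not address the Poisson case the question asks about; the variable names are illustrative only.

```python
from statistics import NormalDist

# Assumption: a standard normal distribution (mean 0, standard deviation 1).
std_normal = NormalDist(0.0, 1.0)

# Quartiles and interquartile range of the distribution itself.
q1 = std_normal.inv_cdf(0.25)
q3 = std_normal.inv_cdf(0.75)
iqr = q3 - q1

# Whisker limits as defined in the question.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Probability that a draw lands beyond either whisker.
p_outside = std_normal.cdf(lower) + (1.0 - std_normal.cdf(upper))

print(f"L = {lower:.3f}, U = {upper:.3f}")
print(f"P(outside whiskers) = {p_outside:.5f}")
```

For a skewed or discrete distribution such as the Poisson, the same limits would generally flag a different share of points, which is presumably what motivates the question.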

35

votes

2 answers

Should we address multiple comparisons adjustments when using confidence intervals?

Suppose we have a multiple comparisons scenario such as post hoc inference on pairwise statistics, or like a multiple regression, where we are making a total of $m$ comparisons. Suppose also, that we would like to support inference in these…

confidence-interval multiple-comparisons inference

asked Sep 09 '14 at 19:09

Alexis

26,219
5
78
131

35

votes

13 answers

What statistical blogs would you recommend?

What statistical research blogs would you recommend, and why?

references

asked Jul 19 '10 at 21:00

csgillespie

11,849
9
56
85

35

votes

8 answers

In Naive Bayes, why bother with Laplace smoothing when we have unknown words in the test set?

I was reading over Naive Bayes Classification today. I read, under the heading of Parameter Estimation with add 1 smoothing: Let $c$ refer to a class (such as Positive or Negative), and let $w$ refer to a token or word. The maximum likelihood…

machine-learning classification text-mining naive-bayes laplace-smoothing

asked Jul 22 '14 at 04:29

tumultous_rooster

1,145
4
14
24

35

votes

3 answers

Is it possible to change a hypothesis to match observed data (aka fishing expedition) and avoid an increase in Type I errors?

It is well known that researchers should spend time observing and exploring existing data and research before forming a hypothesis and then collecting data to test that hypothesis (referring to null-hypothesis significance testing). Many basic…

hypothesis-testing

asked May 27 '14 at 09:45

post-hoc

677
1
6
14

34

votes

4 answers

Checking if two Poisson samples have the same mean

This is an elementary question, but I wasn't able to find the answer. I have two measurements: n1 events in time t1 and n2 events in time t2, both produced (say) by Poisson processes with possibly-different lambda values. This is actually from a…

hypothesis-testing poisson-distribution

asked Apr 14 '11 at 14:26

Charles

1,068
1
7
14
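One common way to approach the question above (n1 events in time t1 versus n2 events in time t2) is the conditional binomial test: if both rates are equal, then given the total n1 + n2, the count n1 follows a Binomial(n1 + n2, t1 / (t1 + t2)) distribution. The sketch below is one possible implementation of that idea, not necessarily what the answers recommend; the function name and the example counts are hypothetical.

```python
from math import comb

def poisson_rate_test(n1, t1, n2, t2):
    """Two-sided exact p-value for H0: both Poisson rates are equal.

    Conditions on the total count n1 + n2, under which n1 is binomial
    with success probability t1 / (t1 + t2) when H0 holds.
    """
    n = n1 + n2
    p0 = t1 / (t1 + t2)
    pmf = [comb(n, k) * p0**k * (1.0 - p0) ** (n - k) for k in range(n + 1)]
    observed = pmf[n1]
    # Sum the probabilities of all outcomes no more likely than the observed
    # one; the small tolerance guards against floating-point ties.
    p_value = sum(p for p in pmf if p <= observed * (1.0 + 1e-9))
    return min(1.0, p_value)

# Hypothetical counts, for illustration only.
print(poisson_rate_test(10, 1.0, 10, 1.0))  # identical counts and exposure
print(poisson_rate_test(30, 1.0, 5, 1.0))   # very different counts
```

Conditioning on the total removes the unknown common rate from the problem, which is why no estimate of lambda is needed.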

34

votes

3 answers

How can I interpret a confusion matrix?

I am using a confusion matrix to check the performance of my classifier. I am using Scikit-Learn, and I am a little bit confused. How can I interpret the result from: from sklearn.metrics import confusion_matrix >>> y_true = [2, 0, 2, 2, 0, 1] >>> y_pred…

predictive-models prediction confusion-matrix

asked Apr 25 '14 at 17:00

user3378649

1,107
4
13
22
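To illustrate how such a matrix is read, here is a minimal pure-Python sketch that builds the counts by hand. It follows the layout scikit-learn's confusion_matrix uses, with rows for true labels and columns for predicted labels. The y_true list is taken from the question; the y_pred list is a hypothetical stand-in, since the question's own predictions are truncated in the excerpt. The helper name confusion_matrix_counts is also an assumption.

```python
def confusion_matrix_counts(y_true, y_pred, labels):
    """Return matrix[i][j] = number of samples with true label labels[i]
    that were predicted as labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix

y_true = [2, 0, 2, 2, 0, 1]   # from the question
y_pred = [2, 0, 1, 2, 1, 1]   # hypothetical predictions, for illustration only
labels = [0, 1, 2]

cm = confusion_matrix_counts(y_true, y_pred, labels)
for label, row in zip(labels, cm):
    print(f"true {label}: {row}")

# Diagonal entries are correct predictions; everything else is an error.
accuracy = sum(cm[i][i] for i in range(len(labels))) / len(y_true)
print(f"accuracy = {accuracy:.3f}")
```

Reading along a row shows where the samples of one true class ended up; reading down a column shows which true classes were assigned a given predicted label.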

34

votes

6 answers

Difference between Bayes network, neural network, decision tree and Petri nets

What is the difference between a neural network, a Bayesian network, a decision tree and Petri nets, even though they are all graphical models and visually depict cause-effect relationships?

machine-learning neural-networks bayesian-network fuzzy

asked Apr 21 '14 at 04:16

Ria George

1,375
2
14
31

34

votes

2 answers

How to derive the standard error of linear regression coefficient

For this univariate linear regression model $$y_i = \beta_0 + \beta_1x_i+\epsilon_i$$ given data set $D=\{(x_1,y_1),...,(x_n,y_n)\}$, the coefficient estimates are $$\hat\beta_1=\frac{\sum_ix_iy_i-n\bar x\bar y}{\sum_ix_i^2-n\bar x^2}$$…

standard-error inference

asked Feb 09 '14 at 09:11

avocado

3,045
5
32
45

34

votes

3 answers

Does a sample version of the one-sided Chebyshev inequality exist?

I am interested in the following one-sided Cantelli's version of the Chebyshev inequality: $$ \mathbb P(X - \mathbb E (X) \geq t) \leq \frac{\mathrm{Var}(X)}{\mathrm{Var}(X) + t^2} \,. $$ Basically, if you know the population mean and variance, you…

probability mathematical-statistics probability-inequalities mean

asked Jan 16 '14 at 01:38

casandra

583
4
8
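The population form of Cantelli's inequality quoted in the question above can be evaluated directly. The sketch below, under the assumption that the true mean and variance are known, compares the bound with the exact tail of an Exponential(1) distribution (mean 1, variance 1), chosen here only as a convenient example. It does not give the sample version the question asks for; cantelli_bound is a hypothetical helper name.

```python
import math

def cantelli_bound(variance, t):
    """Upper bound on P(X - E[X] >= t) for t > 0, given the population variance."""
    if variance < 0 or t <= 0:
        raise ValueError("variance must be non-negative and t must be positive")
    return variance / (variance + t * t)

# Assumed example: X ~ Exponential(1), so E[X] = 1, Var(X) = 1 and
# P(X - 1 >= t) = P(X >= 1 + t) = exp(-(1 + t)).
for t in (0.5, 1.0, 2.0, 4.0):
    exact = math.exp(-(1.0 + t))
    bound = cantelli_bound(1.0, t)
    print(f"t = {t}: exact tail = {exact:.4f}, Cantelli bound = {bound:.4f}")
```

The bound holds for any distribution with the given variance, so it is typically loose for a specific one, as the comparison shows.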

34

votes

2 answers

Am I creating bias by using the same random seed over and over?

In almost all of the analysis work that I've ever done I use: set.seed(42). It's an homage to Hitchhiker's Guide to the Galaxy. But I'm wondering if I'm creating bias by using the same seed over and over.

random-generation

asked Dec 23 '13 at 13:52

Brandon Bertelsen

6,672
9
35
46

34

votes

5 answers

Data "exploration" vs data "snooping"/"torturing"?

Many times I have come across informal warnings against "data snooping" (here's one amusing example), and I think I have an intuitive idea of roughly what that means, and why it may be a problem. On the other hand, "exploratory data analysis" seems…

multiple-comparisons interpretation exploratory-data-analysis

asked Sep 16 '13 at 15:36

kjo

1,817
1
16
24

34

votes

5 answers

How to sample from a discrete distribution?

Assume I have a distribution governing the possible outcome from a single random variable X. This is something like [0.1, 0.4, 0.2, 0.3] for X being a value of either 1, 2, 3, 4. Is it possible to sample from this distribution, i.e. generate pseudo…

probability distributions random-generation weighted-sampling

asked Aug 20 '13 at 20:40

Barry

483
1
4
6
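Sampling from the distribution in the question above, [0.1, 0.4, 0.2, 0.3] over the values 1, 2, 3, 4, can be done with inverse-CDF sampling: draw a uniform number and find the first cumulative probability that exceeds it. The following is a minimal sketch of that technique; the function name sample_discrete, the seed, and the number of draws are assumptions made for illustration.

```python
import bisect
import itertools
import random

def sample_discrete(values, probs, rng):
    """Draw one value using inverse-CDF sampling."""
    cdf = list(itertools.accumulate(probs))
    # Scale by the total so small rounding errors in probs are tolerated.
    u = rng.random() * cdf[-1]
    idx = bisect.bisect_right(cdf, u)
    # Guard against idx running past the end due to floating-point edge cases.
    return values[min(idx, len(values) - 1)]

values = [1, 2, 3, 4]
probs = [0.1, 0.4, 0.2, 0.3]   # from the question

rng = random.Random(0)          # fixed seed so the run is reproducible
draws = [sample_discrete(values, probs, rng) for _ in range(100_000)]

# Empirical frequencies should be close to the target probabilities.
freqs = {v: draws.count(v) / len(draws) for v in values}
print(freqs)
```

Python's standard library also offers random.choices(values, weights=probs, k=n), which produces weighted draws without hand-written code.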
