Highest Voted Questions - Statistical Analysis Stack Exchange

45

votes

4 answers

How to interpret mean of Silhouette plot?

I'm trying to use a silhouette plot to determine the number of clusters in my dataset. Given the dataset Train, I used the following MATLAB code: Train_data = full(Train); Result = []; for num_of_cluster = 1:20 centroid =…

data-visualization clustering matlab

asked May 09 '11 at 06:05

Learner

4,007
11
37
39

44

votes

5 answers

What is the purpose of characteristic functions?

I'm hoping that someone can explain, in layman's terms, what a characteristic function is and how it is used in practice. I've read that it is the Fourier transform of the pdf, so I guess I know what it is, but I still don't understand its purpose.…

probability mathematical-statistics characteristic-function

asked Apr 16 '11 at 18:00

Nick

3,327
6
28
24

44

votes

4 answers

OpenBugs vs. JAGS

I am about to try out a BUGS-style environment for estimating Bayesian models. Are there any important advantages to consider in choosing between OpenBugs and JAGS? Is one likely to replace the other in the foreseeable future? I will be using the…

r software bugs jags gibbs

asked Apr 05 '11 at 15:42

DanB

898
8
13

44

votes

5 answers

AIC guidelines in model selection

I typically use BIC as my understanding is that it values parsimony more strongly than does AIC. However, I have decided to use a more comprehensive approach now and would like to use AIC as well. I know that Raftery (1995) presented nice guidelines…

r model-selection references aic bic

asked Jan 06 '14 at 20:55

Tom Carpenter

849
2
8
13

44

votes

5 answers

What is the significance of logistic regression coefficients?

I am currently reading a paper concerning voting location and voting preference in the 2000 and 2004 election. In it, there is a chart which displays logistic regression coefficients. From courses years back and a little reading up, I understand…

regression logistic interpretation

asked Mar 10 '11 at 03:45

amccormack

543
1
5
7

44

votes

2 answers

What is the difference between conditional and unconditional quantile regression?

The conditional quantile regression estimator by Koenker and Bassett (1978) for the $\tau^{th}$ quantile is defined as $$ \widehat{\beta}_{QR} = \min_{b} \sum^{n}_{i=1} \rho_\tau (y_i - X'_i b_\tau) $$ where $\rho_\tau = u_i\cdot (\tau - 1(u_i<0))$…

quantile-regression

asked Dec 22 '13 at 12:39

AlexH

946
1
9
18

44

votes

2 answers

Gamma vs. lognormal distributions

I have an experimentally observed distribution that looks very similar to a gamma or lognormal distribution. I've read that the lognormal distribution is the maximum entropy probability distribution for a random variate $X$ for which the mean and…

density-function gamma-distribution lognormal-distribution

asked Oct 09 '13 at 19:51

OSE

1,057
2
10
8

44

votes

5 answers

Should you ever standardise binary variables?

I have a data set with a set of features. Some of them are binary $(1=$ active or fired, $0=$ inactive or dormant), and the rest are real valued, e.g. $4564.342$. I want to feed this data to a machine learning algorithm, so I $z$-score all the…

machine-learning normalization binary-data

asked May 18 '13 at 16:57

siamii

1,767
5
21
29

44

votes

9 answers

Tiny (real) datasets for giving examples in class?

When teaching an introductory-level class, the teachers I know tend to invent some numbers and a story in order to exemplify the method they are teaching. What I would prefer is to tell a real story with real numbers. However, these stories need…

dataset references teaching

asked Jan 03 '11 at 22:23

Tal Galili

19,935
32
133
195

44

votes

2 answers

How to interpret the output of the summary method for an lm object in R?

I am using sample algae data to understand data mining a bit more. I have used the following commands: data(algae) algae <- algae[-manyNAs(algae),] clean.algae <-knnImputation(algae, k = 10) lm.a1 <- lm(a1 ~ ., data = clean.algae[,…

r regression data-mining

asked May 17 '13 at 00:02

godzilla

593
2
6
8

44

votes

5 answers

Using LASSO from lars (or glmnet) package in R for variable selection

Sorry if this question comes across as a little basic. I am looking to use LASSO variable selection for a multiple linear regression model in R. I have 15 predictors, one of which is categorical (will that cause a problem?). After setting my $x$ and $y$…

feature-selection lasso glmnet lars

asked May 08 '13 at 23:57

James

441
1
5
4

44

votes

2 answers

What is the adjusted R-squared formula in lm in R and how should it be interpreted?

What is the exact formula used in R lm() for the Adjusted R-squared? How can I interpret it? Adjusted r-squared formulas There seem to exist several formulas to calculate Adjusted R-squared. Wherry’s formula:…

r regression r-squared lm regularization

asked Jan 28 '13 at 10:39

user1272262

44

votes

15 answers

What best practices should I follow when preparing plots?

I usually make my own idiosyncratic choices when preparing plots. However, I wonder if there are any best practices for generating plots. Note: Rob's comment to an answer to this question is very relevant here.

data-visualization references

asked Jul 21 '10 at 11:00

user28

44

votes

3 answers

Dice-coefficient loss function vs cross-entropy

When training a pixel segmentation neural network, such as a fully convolutional network, how do you make the decision to use the cross-entropy loss function versus Dice-coefficient loss function? I realize this is a short question, but not quite…

neural-networks loss-functions cross-entropy

asked Jan 04 '18 at 03:12

Christian

1,382
3
16
27

44

votes

8 answers

What is the reason why we use natural logarithm (ln) rather than log to base 10 in specifying functions in econometrics?

What is the reason why we use natural logarithm (ln) rather than log to base 10 in specifying functions in econometrics?

econometrics

asked Mar 27 '12 at 10:11

ritho
