Most Popular
1500 questions
45
votes
4 answers
How to interpret mean of Silhouette plot?
I'm trying to use a silhouette plot to determine the number of clusters in my dataset. Given the dataset Train, I used the following MATLAB code:
Train_data = full(Train);
Result = [];
for num_of_cluster = 1:20
centroid =…

Learner
- 4,007
- 11
- 37
- 39
44
votes
5 answers
What is the purpose of characteristic functions?
I'm hoping that someone can explain, in layman's terms, what a characteristic function is and how it is used in practice. I've read that it is the Fourier transform of the pdf, so I guess I know what it is, but I still don't understand its purpose.…

Nick
- 3,327
- 6
- 28
- 24
44
votes
4 answers
OpenBUGS vs. JAGS
I am about to try out a BUGS-style environment for estimating Bayesian models. Are there any important advantages to consider in choosing between OpenBUGS and JAGS? Is one likely to replace the other in the foreseeable future?
I will be using the…

DanB
- 898
- 8
- 13
44
votes
5 answers
AIC guidelines in model selection
I typically use BIC as my understanding is that it values parsimony more strongly than does AIC. However, I have decided to use a more comprehensive approach now and would like to use AIC as well. I know that Raftery (1995) presented nice guidelines…

Tom Carpenter
- 849
- 2
- 8
- 13
44
votes
5 answers
What is the significance of logistic regression coefficients?
I am currently reading a paper concerning voting location and voting preference in the 2000 and 2004 elections. In it, there is a chart which displays logistic regression coefficients. From courses years back and a little reading up, I understand…

amccormack
- 543
- 1
- 5
- 7
44
votes
2 answers
What is the difference between conditional and unconditional quantile regression?
The conditional quantile regression estimator by Koenker and Bassett (1978) for the $\tau^{th}$ quantile is defined as
$$
\widehat{\beta}_{QR} = \arg\min_{b} \sum^{n}_{i=1} \rho_\tau (y_i - X'_i b)
$$
where $\rho_\tau(u_i) = u_i\cdot (\tau - 1(u_i<0))$…

AlexH
- 946
- 1
- 9
- 18
44
votes
2 answers
Gamma vs. lognormal distributions
I have an experimentally observed distribution that looks very similar to a gamma or lognormal distribution. I've read that the lognormal distribution is the maximum entropy probability distribution for a random variate $X$ for which the mean and…

OSE
- 1,057
- 2
- 10
- 8
44
votes
5 answers
Should you ever standardise binary variables?
I have a data set with a set of features. Some of them are binary $(1=$ active or fired, $0=$ inactive or dormant), and the rest are real valued, e.g. $4564.342$.
I want to feed this data to a machine learning algorithm, so I $z$-score all the…

siamii
- 1,767
- 5
- 21
- 29
44
votes
9 answers
Tiny (real) datasets for giving examples in class?
When teaching an introductory-level class, the teachers I know tend to invent some numbers and a story in order to exemplify the method they are teaching.
What I would prefer is to tell a real story with real numbers. However, these stories need…

Tal Galili
- 19,935
- 32
- 133
- 195
44
votes
2 answers
How to interpret the output of the summary method for an lm object in R?
I am using sample algae data to understand data mining a bit more. I have used the following commands:
data(algae)
algae <- algae[-manyNAs(algae),]
clean.algae <- knnImputation(algae, k = 10)
lm.a1 <- lm(a1 ~ ., data = clean.algae[,…

godzilla
- 593
- 2
- 6
- 8
44
votes
5 answers
Using LASSO from lars (or glmnet) package in R for variable selection
Sorry if this question comes across a little basic.
I am looking to use LASSO variable selection for a multiple linear regression model in R. I have 15 predictors, one of which is categorical (will that cause a problem?). After setting my $x$ and $y$…

James
- 441
- 1
- 5
- 4
44
votes
2 answers
What is the adjusted R-squared formula in lm in R and how should it be interpreted?
What is the exact formula used in R lm() for the Adjusted R-squared? How can I interpret it?
Adjusted R-squared formulas
There seem to exist several formulas to calculate Adjusted R-squared.
Wherry’s formula:…
user1272262
44
votes
15 answers
What best practices should I follow when preparing plots?
I usually make my own idiosyncratic choices when preparing plots. However, I wonder if there are any best practices for generating plots.
Note: Rob's comment to an answer to this question is very relevant here.
user28
44
votes
3 answers
Dice-coefficient loss function vs cross-entropy
When training a pixel segmentation neural network, such as a fully convolutional network, how do you make the decision to use the cross-entropy loss function versus Dice-coefficient loss function?
I realize this is a short question, but not quite…

Christian
- 1,382
- 3
- 16
- 27
44
votes
8 answers
What is the reason why we use natural logarithm (ln) rather than log to base 10 in specifying functions in econometrics?
What is the reason why we use natural logarithm (ln) rather than log to base 10 in specifying functions in econometrics?
ritho