Most Popular
1500 questions
37
votes
2 answers
Interpretation of plot (glm.model)
Can anyone tell me how to interpret the 'residuals vs fitted', 'normal q-q', 'scale-location', and 'residuals vs leverage' plots? I am fitting a binomial GLM, saving it and then plotting it.

Summer
37
votes
5 answers
How to visualize/understand what a neural network is doing?
Neural networks are often treated as "black boxes" due to their complex structure. This is not ideal, as it is often beneficial to have an intuitive grasp of how a model is working internally. What are methods of visualizing how a trained neural…

rm999
37
votes
2 answers
Probability inequalities
I am looking for some probability inequalities for sums of unbounded random variables. I would really appreciate it if anyone could provide me with some thoughts.
My problem is to find an exponential upper bound on the probability that the sum of…

Farzad
37
votes
5 answers
Examples of PCA where PCs with low variance are "useful"
Normally in principal component analysis (PCA) the first few PCs are used and the low variance PCs are dropped, as they do not explain much of the variation in the data.
However, are there examples where the low variation PCs are useful (i.e. have…

Michael
37
votes
6 answers
Assumptions of linear models and what to do if the residuals are not normally distributed
I am a little bit confused about what the assumptions of linear regression are.
So far I checked whether:
- all of the explanatory variables correlated linearly with the response variable (this was the case)
- there was any collinearity among the…

Stefan
36
votes
5 answers
Free data set for very high dimensional classification
What are the freely available data sets for classification with more than 1000 features (or sample points if it contains curves)?
There is already a community wiki about free data sets:
Locating freely available data samples
But here, it would be…

robin girard
36
votes
2 answers
When is logistic regression solved in closed form?
Take $x \in \{0,1\}^d$ and $y \in \{0,1\}$ and suppose we model the task of predicting y given x using logistic regression. When can logistic regression coefficients be written in closed form?
One example is when we use a saturated model.
That is,…

Yaroslav Bulatov
36
votes
2 answers
Relative importance of a set of predictors in a random forests classification in R
I'd like to determine the relative importance of sets of variables toward a randomForest classification model in R. The importance function provides the MeanDecreaseGini metric for each individual predictor--is it as simple as summing this across…

Max Ghenis
36
votes
1 answer
What does the anova() command do with a lmer model object?
Hopefully this is a question that someone here can answer for me on the nature of decomposing sums of squares from a mixed-effects model fit with lmer (from the lme4 R package).
First off I should say that I am aware of the controversy with using…

Martyn
36
votes
1 answer
Error metrics for cross-validating Poisson models
I'm cross-validating a model that's trying to predict a count. If this were a binary classification problem, I'd calculate out-of-fold AUC, and if this were a regression problem I'd calculate out-of-fold RMSE or MAE.
For a Poisson model, what error…

Zach
36
votes
3 answers
Which variance inflation factor should I be using: $\text{GVIF}$ or $\text{GVIF}^{1/(2\cdot\text{df})}$?
I'm trying to interpret variance inflation factors using the vif function in the R package car. The function prints both a generalised $\text{VIF}$ and also $\text{GVIF}^{1/(2\cdot\text{df})}$. According to the help file, this latter value
To…

jay
36
votes
5 answers
Neural network with skip-layer connections
I am interested in regression with neural networks.
Neural networks with zero hidden nodes + skip-layer connections are linear models.
What about the same neural nets but with hidden nodes?
I am wondering what would be the role of the skip-layer…
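The statement that a network with zero hidden nodes plus skip-layer connections is a linear model can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the function `skip_layer_net`, its parameter names, the single output, and the choice of `tanh` as the hidden activation are hypothetical and not taken from the question.

```python
import math

def skip_layer_net(x, w_skip, b_out, w_hidden=(), b_hidden=(), v_out=()):
    """Single-output regression net with skip-layer connections (illustrative).

    x        : list of input values
    w_skip   : direct input-to-output weights (the skip-layer connections)
    b_out    : output bias
    w_hidden : one weight row per hidden node (empty means no hidden nodes)
    b_hidden : one bias per hidden node
    v_out    : one hidden-to-output weight per hidden node
    """
    # Skip-layer part: an ordinary linear predictor b + w . x
    out = b_out + sum(w * xi for w, xi in zip(w_skip, x))
    # Hidden part: each node adds a nonlinear term v * tanh(b_h + w_h . x)
    for w_row, b_h, v in zip(w_hidden, b_hidden, v_out):
        pre_activation = b_h + sum(w * xi for w, xi in zip(w_row, x))
        out += v * math.tanh(pre_activation)
    return out

# With no hidden nodes the output is exactly the linear predictor.
linear_only = skip_layer_net([1.0, 2.0], w_skip=[0.5, -1.0], b_out=0.25)

# With one hidden node the same linear term is kept and a tanh term is added.
with_hidden = skip_layer_net(
    [1.0, 2.0], w_skip=[0.5, -1.0], b_out=0.25,
    w_hidden=[[1.0, 1.0]], b_hidden=[0.0], v_out=[2.0],
)
```

In this sketch the skip-layer weights carry the linear part of the fit, so any hidden nodes only have to model the departure from linearity.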

Ben
36
votes
3 answers
Is it possible to find the combined standard deviation?
Suppose I have 2 sets:
Set A: number of items $n= 10$, $\mu = 2.4$, $\sigma = 0.8$
Set B: number of items $n= 5$, $\mu = 2$, $\sigma = 1.2$
I can find the combined mean ($\mu$) easily, but how am I supposed to find the combined standard deviation?
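One way to approach this, sketched below, is to add the within-set and between-set sums of squares and divide by the combined count. The function name `combine_groups` and the `ddof` parameter are hypothetical. Whether the quoted $\sigma$ values are population standard deviations (`ddof=0`) or sample standard deviations (`ddof=1`) is not stated in the question, so both cases are shown as assumptions.

```python
import math

def combine_groups(n1, mean1, sd1, n2, mean2, sd2, ddof=0):
    """Combined mean and standard deviation of two groups from summary statistics.

    ddof=0 treats sd1 and sd2 as population SDs (divide by n);
    ddof=1 treats them as sample SDs (divide by n - 1).
    """
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    # Sum of squared deviations inside each group, recovered from its SD
    ss_within = (n1 - ddof) * sd1 ** 2 + (n2 - ddof) * sd2 ** 2
    # Extra spread caused by the group means differing from the combined mean
    ss_between = n1 * (mean1 - mean) ** 2 + n2 * (mean2 - mean) ** 2
    variance = (ss_within + ss_between) / (n - ddof)
    return mean, math.sqrt(variance)

# Set A and Set B from the question
mean_pop, sd_pop = combine_groups(10, 2.4, 0.8, 5, 2.0, 1.2, ddof=0)
mean_smp, sd_smp = combine_groups(10, 2.4, 0.8, 5, 2.0, 1.2, ddof=1)
print(round(mean_pop, 4), round(sd_pop, 4), round(sd_smp, 4))
```

The combined standard deviation is not a weighted average of the two standard deviations, because the between-set term accounts for the difference between the set means.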

kype
36
votes
6 answers
Backpropagation vs Genetic Algorithm for Neural Network training
I've read a few papers discussing the pros and cons of each method, some arguing that GA doesn't give any improvement in finding the optimal solution while others show that it is more effective. It seems GA is generally preferred in the literature (although…

sashkello
36
votes
1 answer
Multiple comparisons on a mixed effects model
I am trying to analyse some data using a mixed-effects model. The data I collected represent the weight of some young animals of different genotypes over time.
I am using the approach proposed…

nico