Most Popular

1500 questions
38 votes, 2 answers

Why use stratified cross-validation? Why does this not damage the variance-related benefit?

I've been told that it is beneficial to use stratified cross-validation, especially when response classes are unbalanced. If one purpose of cross-validation is to help account for the randomness of our original training data sample, surely making each…
James Owers
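Here is a minimal sketch of stratified k-fold splitting in plain Python. It assumes a list of class labels, and the function name `stratified_kfold` and the 90/10 label split are hypothetical. Indices are still shuffled at random within each class, so the sampling randomness the question mentions is kept. Stratification only fixes the class mix of each fold. scikit-learn offers a splitter with a similar purpose (`StratifiedKFold`).

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k, seed=0):
    """Split indices into k folds whose class proportions match `labels`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)  # which observations share a fold is still random
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)  # deal round-robin across folds
        offset += len(idx)  # continue dealing where the previous class stopped
    return folds


labels = [0] * 90 + [1] * 10  # unbalanced: 10% positives
folds = stratified_kfold(labels, k=5)
for f in folds:
    print(len(f), sum(labels[i] for i in f))  # fold size, positives in fold
```

Each of the five folds receives 20 observations, 2 of them positive. An unstratified random split of the same data can produce folds with very few or no positives.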
38 votes, 7 answers

Is there a good browser/viewer to see an R dataset (.rda file)

I want to browse a .rda file (R dataset). I know about the View(datasetname) command. The default R.app for Mac does not have a very good data browser (it opens a window in X11). I like the RStudio data browser that opens with the…
Curious2learn
38 votes, 3 answers

How does R handle missing values in lm?

I'd like to regress a vector B against each of the columns in a matrix A. This is trivial if there are no missing data, but if matrix A contains missing values, then my regression against A is constrained to include only rows where all values are…
David Quigley
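R's `lm()` drops incomplete rows according to its `na.action` argument, which by default is `na.omit` (taken from `options("na.action")`). When the whole matrix enters one model call, only rows complete in every column are used. Fitting each column separately lets each regression keep its own complete rows. The sketch below illustrates that per-column approach in plain Python, assuming NaN marks a missing value. The helper `fit_simple` and the toy columns `a1` and `a2` are hypothetical.

```python
import math


def fit_simple(x, y):
    """OLS intercept and slope for y ~ x, using only rows where both are present."""
    pairs = [(xi, yi) for xi, yi in zip(x, y)
             if not (math.isnan(xi) or math.isnan(yi))]
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    slope = sxy / sxx
    return my - slope * mx, slope, n


nan = float("nan")
B = [1.0, 2.0, 3.0, 4.0, 5.0]
A = {  # hypothetical columns of A, each with a different missing entry
    "a1": [2.0, 4.0, nan, 8.0, 10.0],
    "a2": [1.0, nan, 3.0, 4.0, 5.0],
}

# Per-column fits: each regression keeps its own complete pairs (4 rows each).
results = {name: fit_simple(col, B) for name, col in A.items()}

# Listwise deletion across all of A keeps only rows complete in every column.
complete_rows = [i for i in range(len(B))
                 if not any(math.isnan(col[i]) for col in A.values())]
print(results)
print(complete_rows)  # [0, 3, 4]
```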
38 votes, 2 answers

How do I know which method of cross validation is best?

I am trying to figure out which cross validation method is best for my situation. The following data are just an example for working through the issue (in R), but my real X data (xmat) are correlated with each other and correlated to different…
rdorlearn
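Cross-validation methods differ partly in how much the estimate depends on the random split. This hypothetical sketch uses a mean-only model and simulated Gaussian data; `cv_mse` and the data are illustrative, not taken from the question. Leave-one-out CV (k = n) gives the same estimate whatever the shuffle. The 5-fold estimate changes with the split, which is one reason repeated k-fold CV averages over several splits.

```python
import random
import statistics


def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_mse(y, k, seed=0):
    """k-fold CV squared error of a mean-only model; k = len(y) is LOOCV."""
    rng = random.Random(seed)
    errs = []
    for test in kfold_indices(len(y), k, rng):
        test_set = set(test)
        train_mean = statistics.fmean(y[i] for i in range(len(y)) if i not in test_set)
        errs.extend((y[i] - train_mean) ** 2 for i in test)
    return statistics.fmean(errs)


data_rng = random.Random(42)
y = [data_rng.gauss(0.0, 1.0) for _ in range(40)]  # simulated data

loocv = [cv_mse(y, len(y), seed=s) for s in range(5)]  # identical up to rounding
five_fold = [cv_mse(y, 5, seed=s) for s in range(5)]   # varies with the split
repeated_5fold = statistics.fmean(five_fold)           # repeated 5-fold estimate
print(loocv[0], five_fold, repeated_5fold)
```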
38 votes, 1 answer

Why does glmer not achieve the maximum likelihood (as verified by applying further generic optimization)?

Numerically deriving the MLEs of a GLMM is difficult, and I know that in practice we should not use brute-force optimization (e.g., using optim in a simple way). But for my own educational purposes, I want to try it to make sure I correctly understand the…
quibble
37 votes, 4 answers

Functions of Independent Random Variables

Is it true that functions of independent random variables are themselves independent? I have seen that result often used implicitly in some proofs, for example in the proof of independence between the sample mean and the sample variance of…
JohnK
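The general result holds for measurable functions f and g. The event {f(X) in A} is the event {X in f^{-1}(A)}, so the independence of X and Y carries over to f(X) and g(Y). The sketch below is only a numerical sanity check of that fact for one hypothetical pair of discrete distributions, computed with exact fractions. It is not a proof. The distributions and the functions `f` and `g` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Two independent discrete random variables (hypothetical distributions).
px = {-1: Fraction(1, 4), 0: Fraction(1, 4), 1: Fraction(1, 2)}
py = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}


def f(x):
    return x * x  # an arbitrary (non-injective) function of X


def g(y):
    return y % 2  # an arbitrary function of Y


# Joint law of (f(X), g(Y)), built from the product law of (X, Y).
joint = {}
for (x, p_x), (y, p_y) in product(px.items(), py.items()):
    key = (f(x), g(y))
    joint[key] = joint.get(key, 0) + p_x * p_y

# Marginal laws of U = f(X) and V = g(Y).
pu, pv = {}, {}
for (u, v), p in joint.items():
    pu[u] = pu.get(u, 0) + p
    pv[v] = pv.get(v, 0) + p

# Independence: every joint probability factorises into the marginals.
independent = all(joint.get((u, v), 0) == pu[u] * pv[v] for u in pu for v in pv)
print(independent)  # True
```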
37 votes, 8 answers

Help me calculate how many people will come to my wedding! Can I attribute a percentage to each person and add them?

I am planning my wedding. I wish to estimate how many people will come to my wedding. I have created a list of people and the chance, as a percentage, that each will attend. For example: Dad 100%, Mom 100%, Bob 50%, Marc 10%, Jacob 25%, Joseph 30%. I…
Behacad
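Adding the percentages does give the expected head count. Each guest is an indicator variable, and the expectation of a sum is the sum of the expectations. This linearity holds even if guests' decisions are related. Below is a minimal Python sketch using the six example guests from the question. The variance and the exact distribution also assume that guests decide independently of one another; the question does not state this.

```python
probs = {"Dad": 1.00, "Mom": 1.00, "Bob": 0.50,
         "Marc": 0.10, "Jacob": 0.25, "Joseph": 0.30}

# Linearity of expectation: E[number attending] = sum of attendance probabilities.
expected = sum(probs.values())

# ASSUMPTION: independent decisions, so the variances of the indicators add.
variance = sum(p * (1 - p) for p in probs.values())

# Exact distribution of the head count (Poisson binomial), by convolution.
dist = [1.0]  # dist[k] = P(exactly k guests so far)
for p in probs.values():
    new = [0.0] * (len(dist) + 1)
    for k, q in enumerate(dist):
        new[k] += q * (1 - p)  # this guest stays home
        new[k + 1] += q * p    # this guest attends
    dist = new

print(round(expected, 2), round(variance, 4))  # 3.15 0.7375
```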
37 votes, 5 answers

How to resolve Simpson's paradox?

Simpson's paradox is a classic puzzle discussed in introductory statistics courses worldwide. However, my course was content to simply note that a problem existed and did not provide a solution. I would like to know how to resolve the paradox. That…
Potato
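The paradox is a reversal in sign between the stratified comparison and the pooled comparison. This minimal sketch uses illustrative counts that are not data from the question. Treatment A has the higher success rate within each severity group but the lower rate once the groups are pooled, because A was given mostly to severe cases. Which comparison answers a causal question depends on the causal structure. If severity affects both the choice of treatment and the outcome, the within-group comparison is usually the relevant one.

```python
# Illustrative (successes, trials) per treatment and severity group.
data = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}

# Success rate within each severity group.
stratified = {(t, g): s / n for t, groups in data.items()
              for g, (s, n) in groups.items()}

# Success rate after pooling the groups, i.e. ignoring severity.
pooled = {t: sum(s for s, _ in groups.values()) / sum(n for _, n in groups.values())
          for t, groups in data.items()}

for g in ("mild", "severe"):
    print(g, round(stratified[("A", g)], 3), round(stratified[("B", g)], 3))
print("pooled", round(pooled["A"], 3), round(pooled["B"], 3))
```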
37 votes, 4 answers

What is the difference between McNemar's test and the chi-squared test, and how do you know when to use each?

I have tried reading up on different sources, but I am still not clear which test would be appropriate in my case. There are three different questions I am asking about my dataset: The subjects are tested for infections from X at different…
Anto
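The key difference between the two tests is the sampling design. McNemar's test is for paired binary outcomes, where the same subjects are measured twice, and it uses only the discordant cells. Pearson's chi-squared test of independence treats every count as coming from an independent observation, and it asks a different question: whether the two classifications are associated. This minimal sketch computes both statistics on a hypothetical paired 2x2 table. McNemar's statistic is shown without the continuity correction that R's `mcnemar.test()` applies by default.

```python
def mcnemar_statistic(b, c):
    """McNemar chi-squared (no continuity correction) from the discordant cells:
    b = positive then negative, c = negative then positive."""
    return (b - c) ** 2 / (b + c)


def pearson_chi2(table):
    """Pearson chi-squared statistic for an r x c table of independent counts."""
    rows = [sum(r) for r in table]
    cols = [sum(col) for col in zip(*table)]
    total = sum(rows)
    stat = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            exp = rows[i] * cols[j] / total  # expected count under independence
            stat += (obs - exp) ** 2 / exp
    return stat


# Hypothetical paired results: the same 50 subjects tested at two times.
#          time2 +  time2 -
paired = [[10, 5],    # time1 +
          [15, 20]]   # time1 -
print(mcnemar_statistic(paired[0][1], paired[1][0]))  # 5.0
print(round(pearson_chi2(paired), 3))                 # 2.381
```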
37 votes, 2 answers

Error "system is computationally singular" when running a glm

I'm using the robustbase package to run a glm estimation. However, when I do, I get the following error:
Error in solve.default(crossprod(X, DiagB * X)/nobs, EEq) :
  system is computationally singular: reciprocal condition number =…
NK1
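The error comes from `solve()`. The matrix being inverted, here the weighted cross-product `crossprod(X, DiagB * X)/nobs`, is singular to working precision. A common cause is a design matrix whose columns are linearly dependent, or nearly so, for example a predictor that is an exact multiple or sum of others. This minimal plain-Python sketch shows that cause with a hypothetical design containing an intercept, x, and 2·x. With strictly positive weights, the weighted cross-product is singular in exactly the same situations, so the sketch uses the unweighted X^T X for simplicity.

```python
def crossprod(X):
    """Return X^T X for a design matrix given as a list of rows."""
    p = len(X[0])
    return [[sum(row[r] * row[s] for row in X) for s in range(p)] for r in range(p)]


def det2(m):
    """2x2 determinant."""
    (a, b), (c, d) = m
    return a * d - b * c


def det3(m):
    """3x3 determinant by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


x = [1, 2, 3, 4, 5]
X_bad = [[1, xi, 2 * xi] for xi in x]  # third column is exactly 2 * x
X_ok = [[1, xi] for xi in x]           # redundant column removed

print(det3(crossprod(X_bad)))  # 0: X^T X is singular and cannot be inverted
print(det2(crossprod(X_ok)))   # 50: invertible
```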
37 votes, 4 answers

Area under curve of ROC vs. overall accuracy

I am a little confused about the Area Under Curve (AUC) of ROC and the overall accuracy. Will the AUC be proportional to the overall accuracy? In other words, when we have a larger overall accuracy, will we definitely get a larger AUC? Or are…
Samo Jerom
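A higher accuracy does not guarantee a higher AUC, and the two can rank models in opposite orders. Accuracy scores hard predictions at a single threshold. AUC is the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, which summarizes all thresholds. This minimal sketch uses hypothetical scores for two models on the same eight labels, with an assumed threshold of 0.5.

```python
def accuracy(y, scores, threshold=0.5):
    """Fraction of correct hard predictions at a fixed threshold."""
    return sum((s >= threshold) == bool(t) for t, s in zip(y, scores)) / len(y)


def auc(y, scores):
    """ROC AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for t, s in zip(y, scores) if t == 1]
    neg = [s for t, s in zip(y, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y = [0, 0, 0, 0, 1, 1, 1, 1]
model_1 = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]  # hypothetical scores
model_2 = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.0]  # hypothetical scores
print("model_1", accuracy(y, model_1), auc(y, model_1))  # model_1 0.75 0.9375
print("model_2", accuracy(y, model_2), auc(y, model_2))  # model_2 0.875 0.75
```

Model 2 wins on accuracy, but it places one positive below every negative, and AUC penalizes that ranking error.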
37 votes, 8 answers

What is Bayes' theorem all about?

What are the main ideas, that is, concepts related to Bayes' theorem? I am not asking for any derivations of complex mathematical notation.
user333
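The core idea is updating. A prior belief about a hypothesis is combined with how likely the evidence would be under each alternative, and the result is a posterior belief. This minimal sketch uses hypothetical numbers for a diagnostic test: 1% prevalence, 99% sensitivity, and a 5% false-positive rate. `posterior` is an illustrative helper, not a library function. The second update assumes the two test results are conditionally independent given the true status.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H | E) = P(E | H) P(H) / P(E), with P(E) from the law of total probability."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e


# Hypothetical screening test: 1% prevalence, 99% sensitivity, 5% false positives.
after_one = posterior(prior=0.01, p_e_given_h=0.99, p_e_given_not_h=0.05)
# The posterior becomes the prior for a second positive result.
after_two = posterior(prior=after_one, p_e_given_h=0.99, p_e_given_not_h=0.05)
print(round(after_one, 4), round(after_two, 4))  # 0.1667 0.7984
```

Even with an accurate test, a single positive result leaves the hypothesis fairly unlikely when the prior is small.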
37 votes, 2 answers

Model selection and cross-validation: The right way

There are numerous threads in CrossValidated on the topic of model selection and cross-validation. Here are a few: Internal vs external cross-validation and model selection; @DikranMarsupial's top answer to Feature selection and…
Amelio Vazquez-Reina
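One widely recommended arrangement is nested cross-validation. Any selection step, here the choice of a tuning parameter, is repeated inside each outer training fold, so the outer test folds never influence the choice. The outer error then estimates the performance of the whole procedure, selection included, rather than of one fixed model. This minimal sketch assumes 1-D k-nearest-neighbour regression on simulated data as a stand-in model. All function names are hypothetical.

```python
import math
import random
import statistics


def knn_predict(train, x, k):
    """1-D k-nearest-neighbour regression: mean y of the k closest training x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return statistics.fmean(yv for _, yv in nearest)


def make_folds(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def split(data, test):
    """Return the rows of `data` whose indices are not in `test`."""
    test_set = set(test)
    return [d for i, d in enumerate(data) if i not in test_set]


def cv_error(data, k_nn, folds):
    """Plain CV mean squared error of k-NN with a fixed k_nn."""
    errs = []
    for test in folds:
        train = split(data, test)
        errs += [(data[i][1] - knn_predict(train, data[i][0], k_nn)) ** 2 for i in test]
    return statistics.fmean(errs)


def nested_cv(data, grid, outer=5, inner=5, seed=0):
    """Outer folds estimate error; k is chosen from outer-training data only."""
    rng = random.Random(seed)
    outer_errs, chosen = [], []
    for test in make_folds(len(data), outer, rng):
        train = split(data, test)
        inner_folds = make_folds(len(train), inner, rng)  # same splits for every k
        best_k = min(grid, key=lambda k: cv_error(train, k, inner_folds))
        chosen.append(best_k)
        outer_errs += [(data[i][1] - knn_predict(train, data[i][0], best_k)) ** 2
                       for i in test]
    return statistics.fmean(outer_errs), chosen


rng = random.Random(1)
xs = [rng.uniform(0, 6) for _ in range(60)]
data = [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in xs]  # simulated data
error, ks = nested_cv(data, grid=[1, 3, 5, 9])
print(round(error, 3), ks)
```

The chosen k may differ between outer folds. That is expected, because nested CV evaluates the selection procedure. A final model would typically be refit on all the data using the same procedure.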
37 votes, 3 answers

Difference between a SVM and a perceptron

I am a bit confused about the difference between an SVM and a perceptron. Let me try to summarize my understanding here, and please feel free to correct me where I am wrong and fill in what I have missed. The Perceptron does not try to optimize the…
CuriousMind
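A compact way to see the difference: the perceptron updates only on misclassified points and stops at the first line that separates the training data. A hard-margin SVM instead chooses, among all separating lines, the one whose closest point is farthest away. This minimal sketch uses hypothetical, linearly separable 2-D data with labels ±1. The max-margin line used for comparison, x1 + x2 = 1, was worked out by hand for this toy data rather than computed by an SVM solver.

```python
def perceptron(X, y, epochs=100):
    """Classic perceptron: update only on mistakes; stops at any separating line."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):  # labels yi are +1 / -1
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) <= 0:
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return w, b


def geometric_margin(w, b, X, y):
    """Smallest signed distance from any point to the line w.x + b = 0."""
    norm = sum(wj * wj for wj in w) ** 0.5
    return min(yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) / norm
               for xi, yi in zip(X, y))


X = [(2.0, 2.0), (3.0, 3.0), (2.5, 3.5), (-1.0, -1.0), (-2.0, -1.5), (-1.5, -3.0)]
y = [1, 1, 1, -1, -1, -1]
w, b = perceptron(X, y)
w_svm, b_svm = [1.0, 1.0], -1.0  # max-margin line for this toy data, found by hand
print(geometric_margin(w, b, X, y), geometric_margin(w_svm, b_svm, X, y))
```

The closest opposite-class pair, (2, 2) and (-1, -1), is 3√2 apart, so no separating line can have a margin above 3√2/2 ≈ 2.12. The line x1 + x2 = 1 attains that bound, while the perceptron's line has roughly half that margin.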
37 votes, 3 answers

How do I interpret the 'correlations of fixed effects' in my glmer output?

I have the following output:
Generalized linear mixed model fit by the Laplace approximation
Formula: aph.remain ~ sMFS2 +sAG2 +sSHDI2 +sbare +season +crop +(1|landscape)
  AIC  BIC logLik deviance
 4062 4093  -2022     4044
Random…
susie
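In lme4 output, the "Correlation of Fixed Effects" block reports correlations between the estimated coefficients, derived from their estimated covariance matrix (what `vcov()` returns). It is not the correlation between the predictors themselves. This minimal sketch computes the same kind of quantity in the simpler setting of ordinary least squares with one predictor, where the covariance of the estimates is σ²(XᵀX)⁻¹. The predictor values are hypothetical. The sketch shows one common pattern: an uncentered predictor makes the intercept and slope estimates strongly negatively correlated, and centering removes that correlation.

```python
import math


def intercept_slope_corr(x):
    """Correlation of the OLS intercept and slope estimates for y ~ 1 + x.

    Cov(beta_hat) = sigma^2 (X^T X)^{-1}, with X^T X = [[n, sx], [sx, sxx]].
    The inverse is [[sxx, -sx], [-sx, n]] / det; sigma^2 and 1/det cancel in
    the correlation, so only the design matters.
    """
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    return -sx / math.sqrt(n * sxx)


x = [10.0, 11.0, 12.0, 13.0, 14.0]  # hypothetical predictor values
mean_x = sum(x) / len(x)
x_centered = [v - mean_x for v in x]
print(round(intercept_slope_corr(x), 3))  # strongly negative
print(intercept_slope_corr(x_centered))   # zero (printed as -0.0)
```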