Questions tagged [r]

Use this tag for any *on-topic* question that (a) involves `R` either as a critical part of the question or of the expected answer, and (b) is not *just* about how to use `R`.

Usage on CV

R-based questions are frequently migrated between Cross Validated (CV) and Stack Overflow (SO). CV handles questions with statistical content or of statistical interest, while SO handles questions about programming and implementation.

Your question belongs on CV when any of the following apply:

  • You're not sure which procedure is right for your data.

  • You would like help interpreting and understanding the output of an R procedure.

  • You would like help with producing a certain type of data visualization (or selecting the most appropriate one).

Your question belongs on SO when the following applies:

R

R is an open source programming language and software environment for statistical computing and graphics. R is an implementation of the S programming language combined with lexical scoping semantics inspired by Scheme. R was created by Ross Ihaka and Robert Gentleman and is now developed by the R Development Core Team. The R environment is easily extended through a packaging system on CRAN.

As of June 2020, formatting R code has been simplified. For small snippets within text, use backticks, as in `x <- c(1,2)`. For blocks of code, just paste (or type) them in (no initial spaces needed) but precede the block with ``` lang-R and follow it with ```, as in

``` lang-R

for (i in 1:3) {
  hist(rnorm(10*i))
}

```

Official CRAN Documentation

The official R manuals, available from CRAN, include:

  • An Introduction to R (PDF, HTML), a basic introduction for beginners.
  • The R Language Definition (PDF, HTML), a more technical discussion of the R language itself.
  • Writing R Extensions (PDF, HTML), a development guide for R.
  • R Data Import/Export (PDF, HTML), a data import and export guide.
  • R Installation and Administration (PDF, HTML), an installation guide (from R source code).
  • R Internals (PDF, HTML), internal structures and coding guidelines.

Free Resources

Free resource materials include:

  • The R Programming wikibook, a collaborative textbook
  • The R Inferno (PDF) by Patrick Burns
  • Try R, a web-based R tutorial
  • R by example
  • CRAN maintains an extensive list of free contributed documentation in a range of languages.
  • The R Journal lists research articles and summaries of major revisions.

We also maintain a list of Internet-based resources for R on Meta Cross Validated.

Other Resources

Recommended additional R resources include:

Frequently Asked Questions

Lists of frequently asked questions include:

26743 questions
376 votes, 26 answers

Python as a statistics workbench

Lots of people use a main tool like Excel or another spreadsheet, SPSS, Stata, or R for their statistics needs. They might turn to some specific package for very special needs, but a lot of things can be done with a simple spreadsheet or a general…
Fabian Fagerholm
354 votes, 12 answers

Difference between logit and probit models

What is the difference between Logit and Probit model? I'm more interested here in knowing when to use logistic regression, and when to use Probit. If there is any literature which defines it using R, that would be helpful as well.
Beta
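In R the two models differ only in the link passed to the `binomial()` family of `glm()`. A minimal sketch on simulated data (the seed, sample size and true coefficients are arbitrary illustrative choices, not taken from the question):

``` lang-R
set.seed(1)
n <- 500
x <- rnorm(n)
# simulated binary outcome from a logistic model (illustrative values only)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))

fit_logit  <- glm(y ~ x, family = binomial(link = "logit"))
fit_probit <- glm(y ~ x, family = binomial(link = "probit"))

# the coefficients live on different scales: logit slopes are typically
# roughly 1.6 to 1.8 times the corresponding probit slopes
ratio <- coef(fit_logit)[["x"]] / coef(fit_probit)[["x"]]
print(ratio)

# the fitted probabilities themselves are usually very close
max_diff <- max(abs(fitted(fit_logit) - fitted(fit_probit)))
print(max_diff)
```

Because the fitted probabilities rarely differ much, the choice is often driven by interpretation (odds ratios for the logit) rather than by fit.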
271 votes, 2 answers

Interpretation of R's lm() output

The help pages in R assume I know what those numbers mean, but I don't. I'm trying to really intuitively understand every number here. I will just post the output and comment on what I found out. There might (will) be mistakes, as I'll just write…
Alexander Engelhardt
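Several numbers in `summary(lm(...))` can be recomputed by hand, which is one way to see what they mean. A minimal sketch using the built-in `mtcars` data (chosen only for illustration) that reproduces the t values, p-values, residual standard error and $R^2$:

``` lang-R
# built-in mtcars data: fuel economy (mpg) against car weight (wt, in 1000 lb)
fit <- lm(mpg ~ wt, data = mtcars)
s <- summary(fit)
cf <- coef(s)  # columns: Estimate, Std. Error, t value, Pr(>|t|)

# t value = estimate / standard error
t_manual <- cf[, "Estimate"] / cf[, "Std. Error"]
# two-sided p-value from a t distribution with n - p residual degrees of freedom
p_manual <- 2 * pt(abs(t_manual), df = fit$df.residual, lower.tail = FALSE)

# residual standard error and R-squared from the residual sum of squares
rss <- sum(residuals(fit)^2)
sigma_manual <- sqrt(rss / fit$df.residual)
r2_manual <- 1 - rss / sum((mtcars$mpg - mean(mtcars$mpg))^2)

all.equal(t_manual, cf[, "t value"])
all.equal(p_manual, cf[, "Pr(>|t|)"])
all.equal(sigma_manual, s$sigma)
all.equal(r2_manual, s$r.squared)
```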
215 votes, 4 answers

How to interpret a QQ plot

I am working with a small dataset (21 observations) and have the following normal QQ plot in R: Seeing that the plot does not support normality, what could I infer about the underlying distribution? It seems to me that a distribution more skewed…
JohnK
197 votes, 3 answers

R's lmer cheat sheet

There's a lot of discussion going on on this forum about the proper way to specify various hierarchical models using lmer. I thought it would be great to have all the information in one place. A couple of questions to start: How to specify multiple…

193 votes, 10 answers

How to deal with perfect separation in logistic regression?

If you have a variable which perfectly separates zeroes and ones in the target variable, R will yield the following "perfect or quasi perfect separation" warning message: `Warning message: glm.fit: fitted probabilities numerically 0 or 1 occurred` We…
user333
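The warning is easy to reproduce. A minimal sketch with a made-up toy data set in which `x` completely separates the outcome; it only shows the symptom (the warning, an enormous slope, fitted probabilities at 0 and 1) and does not implement any particular remedy:

``` lang-R
# made-up toy data: y = 1 exactly when x > 0, so x perfectly separates the outcome
x <- c(-3, -2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2, 3)
y <- as.integer(x > 0)

# collect glm's warnings instead of printing them immediately
warns <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    warns <<- c(warns, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(warns)               # includes "glm.fit: fitted probabilities numerically 0 or 1 occurred"
print(coef(summary(fit)))  # a huge slope estimate, typically with an even larger standard error
range(fitted(fit))         # fitted probabilities pushed to (almost exactly) 0 and 1
```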
188 votes, 9 answers

How to summarize data by group in R?

I have an R data frame like this: age group 1 23.0883 1 2 25.8344 1 3 29.4648 1 4 32.7858 2 5 33.6372 1 6 34.9350 1 7 35.2115 2 8 35.2115 2 9 35.2115 2 10 36.7803 1 ... I need to get…
Yuriy Petrovskiy
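Base R can compute per-group summaries without extra packages. A minimal sketch using only the ten rows visible in the question excerpt (the full data frame is longer, so these are not the asker's real group means):

``` lang-R
# the ten rows shown in the question excerpt only
df <- data.frame(
  age   = c(23.0883, 25.8344, 29.4648, 32.7858, 33.6372,
            34.9350, 35.2115, 35.2115, 35.2115, 36.7803),
  group = c(1, 1, 1, 2, 1, 1, 2, 2, 2, 1)
)

# mean age per group, returned as a data frame
aggregate(age ~ group, data = df, FUN = mean)

# the same means as a named vector
group_means <- tapply(df$age, df$group, mean)
group_means

# several statistics per group at once
aggregate(age ~ group, data = df,
          FUN = function(a) c(n = length(a), mean = mean(a), sd = sd(a)))
```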
188 votes, 2 answers

How do I get the number of rows of a data.frame in R?

After reading a dataset: `dataset <- read.csv("forR.csv")` How can I get R to give me the number of cases it contains? Also, will the returned value include or exclude cases omitted with `na.omit(dataset)`?
Tom Wright
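A minimal sketch of the counting functions involved. Since `forR.csv` isn't available, a tiny invented data frame with a couple of missing values stands in for it:

``` lang-R
# invented stand-in for dataset <- read.csv("forR.csv")
dataset <- data.frame(a = c(1, NA, 3, 4), b = c("x", "y", NA, "z"))

nrow(dataset)                 # 4: every row, including rows that contain NAs
dim(dataset)                  # rows and columns together
nrow(na.omit(dataset))        # 2: rows left after dropping every row containing an NA
sum(complete.cases(dataset))  # same count, without building a new data frame

# na.omit() returns a new object; dataset itself is unchanged unless you reassign it
dataset_complete <- na.omit(dataset)
```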
183 votes, 2 answers

How to determine which distribution fits my data best?

I have a dataset and would like to figure out which distribution fits my data best. I used the fitdistr() function to estimate the necessary parameters to describe the assumed distribution (i.e. Weibull, Cauchy, Normal). Using those parameters I…
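One common way to compare candidate distributions fitted with `fitdistr()` is an information criterion computed from each fit's maximised log-likelihood. A minimal sketch, assuming simulated right-skewed data in place of the asker's data set; `fitdistr()` comes from the MASS package, which ships with standard R installations. AIC only ranks the candidates you supply and says nothing about whether any of them is correct, so it is worth pairing with diagnostic plots.

``` lang-R
library(MASS)  # provides fitdistr()

set.seed(42)
# simulated, right-skewed stand-in for the asker's data
x <- rweibull(1000, shape = 1.5, scale = 3)

fits <- list(
  normal    = fitdistr(x, "normal"),
  lognormal = fitdistr(x, "lognormal"),
  weibull   = suppressWarnings(fitdistr(x, "weibull"))  # optimiser may warn along the way
)

# AIC = 2k - 2 * log-likelihood, with k the number of estimated parameters
aic <- sapply(fits, function(f) 2 * length(f$estimate) - 2 * f$loglik)
sort(aic)  # smaller is better among these candidates
```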

167 votes, 21 answers

Does Julia have any hope of sticking in the statistical community?

I recently read a post from R-Bloggers, that linked to this blog post from John Myles White about a new language called Julia. Julia takes advantage of a just-in-time compiler that gives it wicked fast run times and puts it on the same order of…
Christopher Aden
157 votes, 3 answers

How are the standard errors of coefficients calculated in a regression?

For my own understanding, I am interested in manually replicating the calculation of the standard errors of estimated coefficients as, for example, come with the output of the lm() function in R, but haven't been able to pin it down. What is the…
ako
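For ordinary least squares the standard errors are the square roots of the diagonal of $\hat\sigma^2 (X^\top X)^{-1}$, where $X$ is the design matrix (including the intercept column) and $\hat\sigma^2 = \mathrm{RSS}/(n-p)$. A minimal sketch that checks this against `lm()`, using the built-in `mtcars` data purely as an example:

``` lang-R
# built-in mtcars data, used only for illustration
fit <- lm(mpg ~ wt + hp, data = mtcars)

X <- model.matrix(fit)                       # n x p design matrix, intercept column included
rss <- sum(residuals(fit)^2)
sigma2 <- rss / (nrow(X) - ncol(X))          # residual variance estimate RSS / (n - p)
vcov_manual <- sigma2 * solve(crossprod(X))  # sigma^2 (X'X)^{-1}
se_manual <- sqrt(diag(vcov_manual))

cbind(manual = se_manual, lm = coef(summary(fit))[, "Std. Error"])
```

In practice `lm()` works from a QR decomposition rather than inverting $X^\top X$ directly, but the two agree to numerical precision here.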
151 votes, 25 answers

R vs SAS, why is SAS preferred by private companies?

I learned R but it seems that companies are much more interested in SAS experience. What are the advantages of SAS over R?
Benoit_Plante
146 votes, 6 answers

Correlations with unordered categorical variables

I have a dataframe with many observations and many variables. Some of them are categorical (unordered) and the others are numerical. I'm looking for associations between these variables. I've been able to compute correlation for numerical variables…
Clément F
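Two widely used association measures for mixed variable types are Cramér's V for a pair of categorical variables and $\eta^2$ (the $R^2$ of a one-way ANOVA) for a numeric variable against a categorical one. A minimal sketch using the built-in `mtcars` data, treating `cyl` and `gear` as unordered categories purely for illustration; these are not necessarily the measures the answers recommend:

``` lang-R
# built-in mtcars data; cyl and gear treated as unordered categories
tab <- table(mtcars$cyl, mtcars$gear)

# Cramer's V: sqrt(chi^2 / (n * (min(rows, cols) - 1)))
# (small cell counts make chisq.test() warn about its approximation, so it is muffled)
chi2 <- suppressWarnings(chisq.test(tab))$statistic
cramers_v <- sqrt(unname(chi2) / (sum(tab) * (min(dim(tab)) - 1)))
cramers_v

# numeric vs categorical: eta^2, the R^2 of regressing mpg on cyl as a factor
eta2 <- summary(lm(mpg ~ factor(cyl), data = mtcars))$r.squared
eta2
```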
128 votes, 2 answers

Removal of statistically significant intercept term increases $R^2$ in linear model

In a simple linear model with a single explanatory variable, $\alpha_i = \beta_0 + \beta_1 \delta_i + \epsilon_i$ I find that removing the intercept term improves the fit greatly (value of $R^2$ goes from 0.3 to 0.9). However, the intercept term…
Ernest A
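When the intercept is dropped, `summary.lm()` computes $R^2$ against the uncentred total sum of squares $\sum_i y_i^2$ instead of $\sum_i (y_i - \bar y)^2$, so the two $R^2$ values are not on the same scale, while removing a term can only increase the residual sum of squares. A minimal sketch on simulated data whose response is far from zero (the simulated values are arbitrary):

``` lang-R
set.seed(3)
# simulated data with a response far from zero (arbitrary illustrative values)
x <- runif(50, 10, 20)
y <- 5 + 0.5 * x + rnorm(50)

fit_int   <- lm(y ~ x)      # with intercept
fit_noint <- lm(y ~ x - 1)  # intercept removed

rss_int   <- sum(residuals(fit_int)^2)
rss_noint <- sum(residuals(fit_noint)^2)

# with an intercept, R^2 compares the RSS with the centred total sum of squares
r2_int <- 1 - rss_int / sum((y - mean(y))^2)
# without one, summary.lm() effectively uses the uncentred total sum of squares
r2_noint <- 1 - rss_noint / sum(y^2)

c(rss_int = rss_int, rss_noint = rss_noint)  # the no-intercept fit is worse
c(r2_int = summary(fit_int)$r.squared,       # yet its reported R^2 is higher
  r2_noint = summary(fit_noint)$r.squared)
```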
107 votes, 4 answers

What is rank deficiency, and how to deal with it?

Fitting a logistic regression using lme4 ends with Error in mer_finalize(ans) : Downdated X'X is not positive definite. A likely cause of this error is apparently rank deficiency. What is rank deficiency, and how should I address it?
Jack Tanner
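A design matrix is rank deficient when some of its columns are linear combinations of others, so $X^\top X$ is singular. A minimal sketch of how to detect this, using `lm()` on invented data rather than the lme4 model from the question; the same rank check applies to any model matrix:

``` lang-R
set.seed(10)
# invented data: x3 is an exact linear combination of x1 and x2
d <- data.frame(x1 = 1:10, x2 = rnorm(10))
d$x3 <- 2 * d$x1 + d$x2
d$y  <- 1 + d$x1 - d$x2 + rnorm(10)

X <- model.matrix(~ x1 + x2 + x3, data = d)
ncol(X)     # 4 columns: intercept, x1, x2, x3
qr(X)$rank  # only 3 are linearly independent, so X (and X'X) is rank deficient

fit <- lm(y ~ x1 + x2 + x3, data = d)
coef(fit)   # lm() reports NA for the aliased column
alias(fit)  # shows how x3 depends on the other columns
```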