188

After reading a dataset:

dataset <- read.csv("forR.csv")
  • How can I get R to give me the number of cases it contains?
  • Also, will the returned value include or exclude cases omitted with na.omit(dataset)?
Jeromy Anglim
Tom Wright
  • 3
    I also recommend taking a look at `str()` as it provides other useful details about your object. Can often explain why a column isn't behaving as it should (factor instead of numeric, etc). – Chase Dec 08 '10 at 13:45
  • 3
    Please read the R guide of Owen first (http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf), and if possible, Introduction to R (http://cran.r-project.org/doc/manuals/R-intro.pdf). Both are on the official website of R. You're incredibly lucky you actually get an answer. On the r-help list one would redirect you to the manual in less elegant terms. No offense meant. – Joris Meys Dec 08 '10 at 15:19
  • 13
    @Joris - Point taken (without offence), but it was my impression that SE sites were designed to foster problem/solution learning in a way not afforded by manuals. Additionally, this question will now be available for other beginners. Thanks for the links though. – Tom Wright Dec 08 '10 at 15:42
  • 1
    If you're looking for pure code solutions, stackoverflow might be more appropriate. Although, all the R gurus present @ SO are also here (not counting myself). :) – Roman Luštrik Dec 08 '10 at 17:39
  • 2
    I disagree with your assertion that this question will be helpful for other beginners, *especially* if they don't skim the manual. They will just create a duplicate question. – Joshua Ulrich Dec 08 '10 at 21:01
  • 1
    @JorisMeys: thanks for the link to the R guide.. hadn't come across that yet in my learning of R and it's exactly what I'd been looking for. – User Mar 01 '12 at 23:12
  • 9
    And, four years later, this is the second hit I got on Google trying to find an answer to this question. No need for me to create a duplicate (@JoshuaUlrich). – Richard Dec 26 '14 at 06:58
  • 2
    @Richard Just noticed that (6 years on) this question has 100 upvotes and is consequently well within the top 0.1% of questions on the site. I find this very interesting. – Tom Wright Nov 08 '16 at 08:58

2 Answers

217

dataset will be a data frame. As I don't have forR.csv, I'll make up a small data frame for illustration:

set.seed(1)
dataset <- data.frame(A = sample(c(NA, 1:100), 1000, replace = TRUE),
                      B = rnorm(1000))

> head(dataset)
   A           B
1 26  0.07730312
2 37 -0.29686864
3 57 -1.18324224
4 91  0.01129269
5 20  0.99160104
6 90  1.59396745

To get the number of cases, count the number of rows using nrow() or NROW():

> nrow(dataset)
[1] 1000
> NROW(dataset)
[1] 1000

To count the cases that remain after omitting rows with NA values, use the same functions but wrap dataset in na.omit():

> NROW(na.omit(dataset))
[1] 993

The difference between NROW() and NCOL() and their lowercase variants (nrow() and ncol()) is that the lowercase versions only work for objects that have dimensions (arrays, matrices, data frames) and return NULL otherwise. The uppercase versions also work with vectors, which are treated as if they were a 1-column matrix, so they remain robust if you subset your data in a way that makes R drop a dimension.
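A small sketch of that difference, using a made-up vector `x` and matrix `m` (neither comes from the question):

```r
x <- c(10, 20, 30)          # a plain vector has no dim attribute
nrow(x)                     # NULL
NROW(x)                     # 3: treated as a 1-column matrix
NCOL(x)                     # 1

m <- matrix(1:6, nrow = 3)  # 3 x 2 matrix
col1 <- m[, 1]              # selecting a single column drops the dimension
nrow(col1)                  # NULL
NROW(col1)                  # 3
nrow(m[, 1, drop = FALSE])  # 3: drop = FALSE keeps the matrix structure
```

If you want to stay with the lowercase functions, subsetting with `drop = FALSE` is the usual way to keep the dimensions.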

Alternatively, use complete.cases() and sum the result. complete.cases() returns a logical vector with TRUE for each row that has no NA in any column and FALSE otherwise, so its sum is the number of complete rows:

> sum(complete.cases(dataset))
[1] 993
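
To see why the two approaches agree, here is a minimal sketch on a tiny hypothetical data frame `toy` (not the asker's data), small enough to check by eye:

```r
toy <- data.frame(A = c(1, NA, 3, 4),
                  B = c(NA, NA, 0.5, 1.2))

cc <- complete.cases(toy)   # FALSE FALSE TRUE TRUE
sum(cc)                     # 2 rows have no NA in any column
nrow(na.omit(toy))          # 2, the same count

colSums(is.na(toy))         # NA count per column: A = 1, B = 2
```

The last line is an optional extra: it shows which columns are responsible for the dropped rows.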
Gavin Simpson
40

Briefly:

  1. Run dim(dataset) to retrieve both n (rows) and k (columns); you can also use nrow(dataset) and ncol(dataset), or NROW(dataset) and NCOL(dataset) (the uppercase variants are needed for other types, such as vectors).

  2. If you transform e.g. via dataset <- na.omit(dataset), then the cases are gone and are not counted. But if you do e.g. summary(dataset) the NA cases are accounted for.
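
As a rough illustration of both points, assuming a small made-up data frame `d` rather than the asker's file:

```r
d <- data.frame(x = c(1, 2, NA, 4, 5),
                y = c("a", "b", "c", NA, "e"))

dim(d)                    # 5 2: n = 5 rows, k = 2 columns
n_all <- nrow(d)          # 5, rows containing NA are still counted

d_clean <- na.omit(d)     # drops rows 3 and 4
n_clean <- nrow(d_clean)  # 3

summary(d$x)              # the summary for x includes an NA's entry of 1
```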

Dirk Eddelbuettel