After reading a dataset:
dataset <- read.csv("forR.csv")
- How can I get R to give me the number of cases it contains?
- Also, will the returned value include or exclude cases omitted with
na.omit(dataset)?
dataset will be a data frame. As I don't have forR.csv, I'll make up a small data frame for illustration:
set.seed(1)
dataset <- data.frame(A = sample(c(NA, 1:100), 1000, replace = TRUE),
                      B = rnorm(1000))
> head(dataset)
A B
1 26 0.07730312
2 37 -0.29686864
3 57 -1.18324224
4 91 0.01129269
5 20 0.99160104
6 90 1.59396745
To get the number of cases, count the number of rows using nrow() or NROW():
> nrow(dataset)
[1] 1000
> NROW(dataset)
[1] 1000
To count the rows that remain after omitting those that contain NA, use the same tools, but wrap dataset in na.omit():
> NROW(na.omit(dataset))
[1] 993
The difference between NROW() and NCOL() and their lowercase variants (nrow() and ncol()) is that the lowercase versions only give a useful answer for objects that have dimensions (arrays, matrices, data frames); for a plain vector they return NULL. The uppercase versions also work with vectors, which are treated as if they were a 1-column matrix, so they are robust if you end up subsetting your data such that R drops a dimension.
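As a small sketch of that difference, using a made-up vector x and matrix m (both hypothetical, not part of the data above):

```r
x <- c(10, 20, 30)            # a plain vector has no dim attribute
nrow(x)                       # NULL
NROW(x)                       # 3, x is treated as a 3 x 1 matrix
NCOL(x)                       # 1

m <- matrix(1:6, nrow = 3)    # 3 rows, 2 columns
col1 <- m[, 1]                # a single-column subset is dropped to a vector
nrow(col1)                    # NULL
NROW(col1)                    # 3
nrow(m[, 1, drop = FALSE])    # 3, drop = FALSE keeps the matrix shape
```

If you want nrow() to keep working after subsetting, drop = FALSE is the other way around the problem.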
Alternatively, use complete.cases() and sum the result. complete.cases() returns a logical vector with one element per row: TRUE if the row has no missing values, FALSE if any value in that row is NA.
> sum(complete.cases(dataset))
[1] 993
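To see what is being summed, here is a minimal sketch on a tiny made-up data frame (toy is hypothetical, chosen so the result is easy to check by eye):

```r
toy <- data.frame(A = c(1, NA, 3, 4),
                  B = c(0.5, 1.2, NA, 2.0))

cc <- complete.cases(toy)   # TRUE FALSE FALSE TRUE
sum(cc)                     # 2 rows with no missing values
sum(!cc)                    # 2 rows with at least one NA
colSums(is.na(toy))         # NA count per column: A = 1, B = 1
```

sum() works here because TRUE counts as 1 and FALSE as 0.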
Briefly:
Run dim(dataset) to retrieve both n and k (the numbers of rows and columns). You can also use nrow(dataset) and ncol(dataset), or NROW(dataset) and NCOL(dataset); the uppercase variants also work for other types, such as vectors.
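If you transform the data, e.g. via dataset <- na.omit(dataset), then the incomplete cases are gone and are no longer counted. Calling na.omit(dataset) without assigning the result leaves dataset unchanged. summary(dataset), on the other hand, reports the number of NAs in each column.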
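A short sketch of the point about assignment, again on a made-up data frame (toy and toy_clean are hypothetical names):

```r
toy <- data.frame(A = c(1, NA, 3),
                  B = c(0.5, 1.2, 2.0))

dim(toy)                    # 3 2, i.e. n = 3 rows and k = 2 columns

na.omit(toy)                # prints the 2 complete rows; toy is not modified
n_before <- nrow(toy)       # still 3

toy_clean <- na.omit(toy)   # assign the result to actually drop the rows
n_after <- nrow(toy_clean)  # 2

summary(toy$A)              # the summary includes an NA's entry for column A
```

Keeping the cleaned copy under a separate name makes it easy to report both counts.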