After reading a dataset:
dataset <- read.csv("forR.csv")
- How can I get R to give me the number of cases it contains?
- Also, will the returned value include or exclude cases omitted with na.omit(dataset)?
dataset will be a data frame. As I don't have forR.csv, I'll make up a small data frame for illustration:
set.seed(1)
dataset <- data.frame(A = sample(c(NA, 1:100), 1000, replace = TRUE),
                      B = rnorm(1000))
> head(dataset)
A B
1 26 0.07730312
2 37 -0.29686864
3 57 -1.18324224
4 91 0.01129269
5 20 0.99160104
6 90 1.59396745
To get the number of cases, count the number of rows using nrow() or NROW():
> nrow(dataset)
[1] 1000
> NROW(dataset)
[1] 1000
To count the cases that remain after omitting rows containing NA, use the same tools, but wrap dataset in na.omit():
> NROW(na.omit(dataset))
[1] 993
The difference between NROW() and NCOL() and their lowercase variants (nrow() and ncol()) is that the lowercase versions only work for objects that have dimensions (arrays, matrices, data frames) and return NULL otherwise. The uppercase versions also work with vectors, which are treated as if they were a one-column matrix, so they are robust if you end up subsetting your data such that R drops a dimension.
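A small sketch of that difference, using a made-up vector and matrix (the names v, m and one_col are only for illustration and are not part of the question's data):

```r
v <- c(10, 20, 30)            # a plain vector has no dim attribute
nrow(v)                       # NULL
NROW(v)                       # 3: treated as a one-column matrix
NCOL(v)                       # 1

m <- matrix(1:6, nrow = 3)    # a 3 x 2 matrix
one_col <- m[, 1]             # selecting one column drops to a vector
nrow(one_col)                 # NULL
NROW(one_col)                 # 3
```

If you need the lowercase functions to keep working after such a subset, m[, 1, drop = FALSE] keeps the result as a matrix.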
Alternatively, use complete.cases() and sum the result (complete.cases() returns a logical vector that is TRUE for each row with no NA values and FALSE for each row with at least one):
> sum(complete.cases(dataset))
[1] 993
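To see what complete.cases() is doing, here is a minimal sketch on a tiny hypothetical data frame (small is made up for illustration), which should also show that it agrees with na.omit():

```r
small <- data.frame(A = c(1, NA, 3, 4),
                    B = c("x", "y", NA, "z"))

ok <- complete.cases(small)   # TRUE FALSE FALSE TRUE
sum(ok)                       # 2 rows with no NA
sum(!ok)                      # 2 rows with at least one NA
nrow(na.omit(small))          # 2, the same count as sum(ok)
colSums(is.na(small))         # NA count per column: A = 1, B = 1
```

The logical vector is also handy for subsetting, e.g. small[ok, ] keeps only the complete rows.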
Briefly:
Run dim(dataset) to retrieve both n and k (the number of rows and columns). You can also use nrow(dataset) and ncol(dataset), or NROW(dataset) and NCOL(dataset); the uppercase variants are needed for other types such as vectors.
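If you transform the data, e.g. via dataset <- na.omit(dataset), then the cases with NA are gone and are not counted. But if you only inspect the data, e.g. with summary(dataset), the NA cases are still there and are accounted for (summary() reports the NA count for each numeric column).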
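As a rough illustration of the points above, assuming a small made-up data frame d (not the question's data):

```r
d <- data.frame(A = c(1, NA, 3),
                B = c(0.5, 1.5, NA))

dim(d)                  # 3 2: rows (n) first, then columns (k)
invisible(na.omit(d))   # result not assigned, so d itself is unchanged
nrow(d)                 # still 3

d_clean <- na.omit(d)   # keep the cleaned copy under a new name
nrow(d_clean)           # 1: only the first row has no NA

summary(d)              # per-column summaries, including NA counts
```

Assigning the result to a new name such as d_clean, rather than overwriting the original, keeps both counts available.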