Removing duplicated rows from a data frame in R

Question

How can I remove duplicate rows from this example data frame?

I would like to remove duplicates based on both columns.

Order is not important.

@Llopis Yes, but it's too late to do that now--and it was too late when we originally closed it. This kind of question was considered (borderline) on-topic many years ago but nowadays it would be migrated quickly. — whuber, Mar 31 '17 at 14:12

score 115 · Accepted Answer · answered Feb 02 '11 at 09:27

unique() does answer your question, but a related function that achieves the same end is duplicated().

It returns a logical vector that is TRUE for every row identical to an earlier row, so you can see which rows are duplicated and use that vector to subset the data frame.

> a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
> b <- c(1, 1, 2, 4, 1, 1, 2, 2)
> df <- data.frame(a, b)
> duplicated(df)
[1] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE

> df[duplicated(df), ]
  a b
2 A 1
6 B 1
8 C 2

> df[!duplicated(df), ]
  a b
1 A 1
3 A 2
4 B 4
5 B 1
7 C 2
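A short sketch building on the example above. Because duplicated() flags only the second and later occurrences of a row, which() on its result gives the row indices of the repeats and sum() counts them. The fromLast = TRUE argument scans from the bottom instead, so the last occurrence of each row is the one kept. The variable names below are just illustrative.

```r
# Same example data as above
a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
b <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(a, b)

# Row indices of the repeated rows (first occurrences are not flagged)
dup_rows <- which(duplicated(df))
print(dup_rows)    # [1] 2 6 8

# Number of repeated rows
n_dup <- sum(duplicated(df))
print(n_dup)       # [1] 3

# Keep the last occurrence of each row instead of the first
kept_last <- df[!duplicated(df, fromLast = TRUE), ]
print(kept_last)   # rows 2, 3, 4, 6 and 8 of df
```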

Rahul


Thanks for mentioning the 'duplicated' function. It can be used to delete duplicated rows based on a subset of the columns. – Joko Jan 20 '16 at 15:27
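To illustrate the comment above: passing only some of the columns to duplicated() decides duplicates on those columns alone, while the full rows are still returned. A minimal sketch with the example data, keeping the first row for each value of column a. A character vector of several column names, such as df[c("a", "b")], works the same way.

```r
a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
b <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(a, b)

# Only column "a" is compared when deciding what counts as a duplicate;
# the whole row of the first occurrence of each value is kept
first_per_a <- df[!duplicated(df["a"]), ]
print(first_per_a)
#   a b
# 1 A 1
# 4 B 4
# 7 C 2
```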

score 51 · Answer 2 · answered Jan 31 '11 at 20:22

You are looking for unique().

> a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
> b <- c(1, 1, 2, 4, 1, 1, 2, 2)
> df <- data.frame(a, b)
> unique(df)
  a b
1 A 1
3 A 2
4 B 4
5 B 1
7 C 2
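For a data frame, unique() and the !duplicated() indexing shown in the other answer should return the same rows. The quick check below tests that equivalence and adds an optional step that renumbers the row names, which otherwise keep their original positions (1, 3, 4, 5, 7). This is a sketch; whether to reset row names depends on how the result is used afterwards.

```r
a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
b <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(a, b)

u <- unique(df)
same <- identical(u, df[!duplicated(df), ])
print(same)   # [1] TRUE

# Row names are carried over from df; reset them if sequential numbering is wanted
rownames(u) <- NULL
print(u)
```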

Bernd Weiss

Thanks Bernd. I thought unique() could only be applied to a specific column. I didn't know it could be used on an entire data frame as well. Thanks again – Jana Jan 31 '11 at 20:25
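To make the distinction in this comment concrete: applied to a single column, unique() returns the distinct values of that vector, while applied to the data frame it compares whole rows and returns a data frame. A short illustration with the same example data:

```r
a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
b <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(a, b)

# Distinct values of one column: a plain vector, in order of first appearance
vals_b <- unique(df$b)
print(vals_b)       # [1] 1 2 4

# Distinct rows of the whole data frame: still a data frame
rows <- unique(df)
print(nrow(rows))   # [1] 5
```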
