
How can I remove duplicate rows from this example data frame?

A   1
A   1
A   2
B   4  
B   1
B   1
C   2
C   2

I would like to remove rows that are duplicated across both columns, so that the result is:

A   1
A   2
B   4
B   1
C   2

Order is not important.

saladi
Jana
  • @whuber shouldn't that be moved to SO? – llrs Mar 31 '17 at 09:55
  • @Llopis Yes, but it's too late to do that now--and it was too late when we originally closed it. This kind of question was considered (borderline) on-topic many years ago but nowadays it would be migrated quickly. – whuber Mar 31 '17 at 14:12

2 Answers

115

unique() does answer your question, but a closely related function, duplicated(), reaches the same result and is worth knowing.

It returns a logical vector marking each row that repeats an earlier one, so you can see which rows are duplicated before removing them.

a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
b <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(a, b)

> duplicated(df)
[1] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE

> df[duplicated(df), ]
  a b
2 A 1
6 B 1
8 C 2

> df[!duplicated(df), ]
  a b
1 A 1
3 A 2
4 B 4
5 B 1
7 C 2
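
As a small sketch of how the two functions relate (it rebuilds the same df; the names n_dup, kept_first and kept_last are only illustrative): negating duplicated() selects the same rows that unique() returns, and the fromLast argument keeps the last occurrence of each row instead of the first.

```r
a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
b <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(a, b)

# Number of rows that repeat an earlier row
n_dup <- sum(duplicated(df))

# Keep the first occurrence of each row; these are the rows unique(df) returns
kept_first <- df[!duplicated(df), ]

# Keep the last occurrence of each row instead of the first
kept_last <- df[!duplicated(df, fromLast = TRUE), ]
```

Since order is not important in the question, either variant should do; they differ only in which of the repeated row numbers survives.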
Rahul
  • Thanks for mentioning the 'duplicated' function. It can be used to delete duplicated rows based on a subset of the columns. – Joko Jan 20 '16 at 15:27
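
A minimal sketch of the subset-of-columns idea from the comment above, assuming the same df plus a hypothetical id column (not in the original question) that makes every full row distinct. Passing only the columns of interest to duplicated() restricts the comparison to those columns.

```r
a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
b <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(a, b)

# Hypothetical extra column: with it, no two complete rows are identical
df$id <- seq_len(nrow(df))
any_full_dup <- any(duplicated(df))

# Compare on columns a and b only, keeping the first row of each (a, b) pair
dedup_ab <- df[!duplicated(df[, c("a", "b")]), ]

# Compare on column a only, keeping the first row for each value of a
dedup_a <- df[!duplicated(df$a), ]
```

Note that unique(df) would not help here, because it always compares whole rows.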
51

You are looking for unique().

a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
b <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(a, b)

> unique(df)
  a b
1 A 1
3 A 2
4 B 4
5 B 1
7 C 2
Bernd Weiss
  • Thanks Bernd. I thought unique() could be applied only to a specific column. I didn't know that it can be used on an entire data frame as well. Thanks again. – Jana Jan 31 '11 at 20:25
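
To illustrate the distinction raised in the comment, here is a short sketch (same df; the names distinct_a and distinct_rows are only illustrative) of unique() applied to a single column and to the whole data frame.

```r
a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
b <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(a, b)

# On a single column (a vector), unique() returns its distinct values
distinct_a <- unique(df$a)

# On the whole data frame, unique() returns the distinct rows
distinct_rows <- unique(df)
nrow(distinct_rows)
```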