Categories
duplicates sql

Finding duplicate values in a SQL table

2284

It’s easy to find duplicates with one field:

SELECT email, COUNT(email) 
FROM users
GROUP BY email
HAVING COUNT(email) > 1

So if we have a table

ID   NAME   EMAIL
1    John   [email protected]
2    Sam    [email protected]
3    Tom    [email protected]
4    Bob    [email protected]
5    Tom    [email protected]

This query will give us John, Sam, Tom, Tom because they all have the same email.

However, what I want is to get duplicates with the same email and name.

That is, I want to get “Tom”, “Tom”.

The reason I need this: I made a mistake, and allowed inserting duplicate name and email values. Now I need to remove/change the duplicates, so I need to find them first.

2

  • 37

    I don’t think it would let you select name in your first sample since it’s not in an aggregate function. “What is the count of matching email addresses and their name” is some tricky logic…

    – sXe

    Jan 4, 2013 at 18:09

  • 4

    Found that this doesn’t work with MSSQL server because of the name field in the SELECT.

    Nov 8, 2018 at 9:06

3491

SELECT
    name, email, COUNT(*)
FROM
    users
GROUP BY
    name, email
HAVING 
    COUNT(*) > 1

Simply group on both of the columns.

Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY but this has changed with the idea of “functional dependency”:

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.

Support is not consistent:

13

  • 104

    @webXL WHERE works with single record HAVING works with group

    – bjan

    Apr 11, 2013 at 4:27

  • 9

    @gbn Is it possible to include the Id in the results? Then it would be easier to delete those duplicates afterwards.

    Jun 10, 2014 at 9:36

  • 15

    @user797717: you’d need to have MIN(ID) and then delete for ID values not in the last if MIN(ID) values

    – gbn

    Jun 10, 2014 at 9:59

  • 2

    What about cases where any of the columns have null values?

    Sep 16, 2016 at 6:38

  • 3

    Thanks so much for this, and yes it does work in Oracle, though I needed uniqueness of the condition, so rather than >1 =1

    Oct 7, 2019 at 16:17

423

try this:

declare @YourTable table (id int, name varchar(10), email varchar(50))

INSERT @YourTable VALUES (1,'John','John-email')
INSERT @YourTable VALUES (2,'John','John-email')
INSERT @YourTable VALUES (3,'fred','John-email')
INSERT @YourTable VALUES (4,'fred','fred-email')
INSERT @YourTable VALUES (5,'sam','sam-email')
INSERT @YourTable VALUES (6,'sam','sam-email')

SELECT
    name,email, COUNT(*) AS CountOf
    FROM @YourTable
    GROUP BY name,email
    HAVING COUNT(*)>1

OUTPUT:

name       email       CountOf
---------- ----------- -----------
John       John-email  2
sam        sam-email   2

(2 row(s) affected)

if you want the IDs of the dups use this:

SELECT
    y.id,y.name,y.email
    FROM @YourTable y
        INNER JOIN (SELECT
                        name,email, COUNT(*) AS CountOf
                        FROM @YourTable
                        GROUP BY name,email
                        HAVING COUNT(*)>1
                    ) dt ON y.name=dt.name AND y.email=dt.email

OUTPUT:

id          name       email
----------- ---------- ------------
1           John       John-email
2           John       John-email
5           sam        sam-email
6           sam        sam-email

(4 row(s) affected)

to delete the duplicates try:

DELETE d
    FROM @YourTable d
        INNER JOIN (SELECT
                        y.id,y.name,y.email,ROW_NUMBER() OVER(PARTITION BY y.name,y.email ORDER BY y.name,y.email,y.id) AS RowRank
                        FROM @YourTable y
                            INNER JOIN (SELECT
                                            name,email, COUNT(*) AS CountOf
                                            FROM @YourTable
                                            GROUP BY name,email
                                            HAVING COUNT(*)>1
                                        ) dt ON y.name=dt.name AND y.email=dt.email
                   ) dt2 ON d.id=dt2.id
        WHERE dt2.RowRank!=1
SELECT * FROM @YourTable

OUTPUT:

id          name       email
----------- ---------- --------------
1           John       John-email
3           fred       John-email
4           fred       fred-email
5           sam        sam-email

(4 row(s) affected)

1

  • * Table names are case sensitivearray(3) { [0]=> string(5) “42000” [1]=> int(1064) [2]=> string(226) “You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘(PARTITION BY y.employee_id, y.leave_type_id ) AS RowRank ‘ at line 1” }

    – JAMZAD

    Dec 23, 2020 at 10:53

147

Try this:

SELECT name, email
FROM users
GROUP BY name, email
HAVING ( COUNT(*) > 1 )

1

  • perfect. thanks!

    Dec 6, 2021 at 4:39