Finding duplicate rows in the database
{{
Information gleaned from Microsoft KB article: How to remove duplicate rows from a table in SQL Server
When you allow duplicated rows in a database that aren't expected and shouldn't be allowed, it's a flag saying that you need a unique index on the table to prevent duplicate rows to be added in the first place.
The first step to fixing the problem is to find and fix the duplicated rows.
The second step is to add an index once the existing duplicate rows have been dealt with, so that the problem won't occur in the future.
If you have a Users
table, which has an Email
column, you will likely want that Email
column to have a unique index. You can find the duplicated emails (and the number of times each email occurs) using this query:
SELECT Email, COUNT(Email) AS NumberOfDuplicates FROM `Users` GROUP BY Email HAVING ( COUNT(Email) > 1 )
Which may return something like this:
NumberOfDuplicates | |
---|---|
joe@bloggs.com |
3 |
spammer@spamilicious.com |
13 |
etc |
Now you can go about carefully resolving the duplicates, then adding the constraints to that column to prevent duplicates occurring again.