Course
Handling Duplicates
MySQL Tutorial
This SQL tutorial is structured for beginners to guide them from the foundational concepts to advanced data manipulation and querying techniques in SQL. By the end of this tutorial, you will have developed a robust understanding of SQL, equipping you with the knowledge to efficiently manage and analyze data across various database systems. This guide sets the stage for your growth into a skilled data professional, ready to tackle complex data challenges and contribute to the field of data analysis and database management.
Handling Duplicates
Tables or result sets in a database usually contain duplicate records. While duplicates are generally allowed, there are situations where it is necessary to prevent them. In such cases, it becomes essential to identify and remove duplicate records from a database table.
Importance of Handling MySQL Duplicates
There are various reasons why handling duplicates in a database becomes necessary. One of the main reasons is that the existence of duplicates in an organizational database will lead to logical errors. In addition to it, we need to handle redundant data to prevent the following consequences −
- Duplicate data occupies storage space, reducing the efficiency of database usage and increasing storage costs.
- Dealing with duplicate records consumes additional resources, driving up the overall cost of maintaining the database.
- Duplicates in a database can lead to logical errors in data, affecting the integrity and reliability of the information stored.
Preventing Duplicate Entries
You can use a PRIMARY KEY or a UNIQUE Index on a table with the appropriate fields to prevent duplicate record entries into a table.
Example
The following table contains no such index or primary key, so it would allow duplicate records for first_name and last_name.
CREATE TABLE CUSTOMERS ( first_name CHAR(20), last_name CHAR(20), sex CHAR(10));
To prevent multiple records with the same first and last name values from being created in this table, add a PRIMARY KEY to its definition. When you do this, it is also necessary to declare the indexed columns to be NOT NULL, because a PRIMARY KEY does not allow NULL values
CREATE TABLE CUSTOMERS ( first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), PRIMARY KEY (last_name, first_name));
Using INSERT IGNORE Query
The existence of a unique index in a table normally causes an error when attempting to insert a record that duplicates an existing record in the indexed column(s).
To handle this situation without generating an error, you can use the "INSERT IGNORE" command. When a record is not a duplicate, MySQL inserts it as usual. However, if the record is duplicate, the "IGNORE" keyword instructs MySQL to discard it without producing an error.
The provided example does not result in an error, and it also ensures that duplicate records are not inserted
INSERT IGNORE INTO CUSTOMERS (LAST_NAME, FIRST_NAME) VALUES ('Jay', 'Thomas'), ('Jay', 'Thomas');
We get the following output
Query OK, 1 row affected, 1 warning (0.01 sec)Records: 2 Duplicates: 1 Warnings: 1
Using REPLACE Query
Instead of using the INSERT command, consider using the REPLACE command. When dealing with a new record, it is inserted just as with INSERT. However, if it is a duplicate, the new record replaces the old one.
REPLACE INTO CUSTOMERS (LAST_NAME, FIRST_NAME) VALUES ( 'Ajay', 'Kumar'), ( 'Ajay', 'Kumar');
Following is the output of the above code
Query OK, 2 rows affected (0.01 sec)Records: 2 Duplicates: 0 Warnings: 0
Your choice between the INSERT IGNORE and REPLACE commands should depend on the specific duplicate-handling behaviour you wish to achieve. The INSERT IGNORE command retains the first set of duplicated records and discards the remaining. On the other hand, the REPLACE command keeps the last set of duplicates and removes any earlier instances.
Using UNIQUE Constraint
Another way to enforce uniqueness is to add a UNIQUE index rather than a PRIMARY KEY to a table
CREATE TABLE CUSTOMERS ( first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), UNIQUE (last_name, first_name));
Counting and Identifying Duplicates
You can use the COUNT function and GROUP BY clause to count and identify duplicate records based on specific columns.
Example
Following is the query to count duplicate records with first_name and last_name in a table
SELECT COUNT(*) as repetitions, last_name, first_nameFROM CUSTOMERSGROUP BY last_name, first_nameHAVING repetitions > 1;
This query will return a list of all the duplicate records in the CUSTOMERS table. In general, to identify sets of values that are duplicated, follow the steps given below.
- Determine which columns may contain duplicated values.
- Include those columns in the column selection list, along with COUNT(*).
- List the columns in the GROUP BY clause as well.
- Apply a HAVING clause to filter unique values by requiring the group counts to be greater than one.
Eliminating Duplicates from a Query Result
You can use the DISTINCT command along with the SELECT statement to find out unique records available in a table.
SELECT DISTINCT last_name, first_nameFROM CUSTOMERSORDER BY last_name;
An alternative to the DISTINCT command is to add a GROUP BY clause that specifies the columns you are selecting. This approach eliminates duplicates and retrieves only the unique combinations of values from the specified columns.
SELECT last_name, first_nameFROM CUSTOMERSGROUP BY (last_name, first_name);
Removing Duplicates Using Table Replacement
If you have duplicate records in a table and you want to remove all the duplicate records from that table, then follow the procedure given below
CREATE TABLE tmp AS SELECT DISTINCT last_name, first_name, sexFROM CUSTOMERS;
DROP TABLE CUSTOMERS;ALTER TABLE tmp RENAME TO CUSTOMERS;
Handling Duplicates Using a Client Program
We can also handle duplicate using the client program.
Syntax
PHPNodeJSJavaPython
To handle duplicates value through a PHP program, we need to execute the "INSERT IGNORE" statement using the mysqli function query() as follows
$sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";$mysqli->query($sql);
Example
Following are the programs
PHPNodeJSJavaPython
$dbhost = 'localhost';$dbuser = 'root';$dbpass = 'password';$db = 'TUTORIALS';$mysqli = new mysqli($dbhost, $dbuser, $dbpass, $db);if ($mysqli->connect_errno) { printf("Connect failed: %s", $mysqli->connect_error); exit();}//printf('Connected successfully.');$sql = "CREATE TABLE person_tbl (first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), PRIMARY KEY (last_name, first_name))";if($mysqli->query($sql)){ printf("Table created successfully...!\n");}//let's insert some records$sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";if($mysqli->query($sql)){ printf("First record inserted successfully...!\n");}$sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')";if($mysqli->query($sql)){ printf("Second record inserted successfully...!\n");}//now lets insert duplicate record with IGNORE keyword$sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";if($mysqli->query($sql)){ printf("Duplicate record inserted successfully using IGNORE keyword...!\n");}$sql = "SELECT * from person_tbl";if($result = $mysqli->query($sql)){ printf("Table records: \n"); while($row = mysqli_fetch_array($result)){ printf("First Name: %s, Last name: %s, Sex: %s", $row['first_name'], $row['last_name'], $row['sex']); printf("\n"); }}//lets insert a duplicate record$sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')";if(!$mysqli->query($sql)){ printf("You can't insert any duplicate records...!\n");}if($mysqli->error){ printf("Error message: ", $mysqli->error);}$mysqli->close();
Output
The output obtained is as shown below
Table created successfully...!First record inserted successfully...!Second record inserted successfully...!Duplicate record inserted successfully using IGNORE keyword...!Table records:First Name: Thomas, Last name: Jay, Sex:First Name: Smith, Last name: John, Sex:PHP Fatal error: Uncaught mysqli_sql_exception: Duplicate entry 'John-Smith' for key 'person_tbl.PRIMARY' in D:\test\handlingduplicates.php:48