It is very common requirement to compare two or more tables in SQL. You will get this requirement in your day to day tasks. For example, check if tables in same server are matching or is there any discrepancies between the table’s data.
Following are the some of the common methods that you can use to compare two tables in any SQL database. You can use most of these methods on BigData technologies such as Apache Hive and Apache Spark.
- Compare Two Table in SQL using MINUS
- Compare Two Tables in SQL using UNION ALL
- Compare Cells From Two Tables in SQL
- Compare Two Table in SQL using JOIN
- Compare Two Table in SQL using NOT EXISTS
Compare Two Table in SQL using MINUS
You can use MINUS SQL set operator to compare two tables. You can compare the two similar tables or data sets using MINUS operator. It returns all rows from first table those are not exist or changed in the second table.
For example,
Select id_pk, col1, col2...,coln from tableA
MINUS
Select id_pk, col1, col2...,coln from tableB;
Using MINUS operator, you can quickly check how many records are having mismatching between two tables.
One of the drawbacks with using MINUS is that, the tables must have the same number of columns and the data types must match. If there is a data type mismatch, you should use explicit type casting.
Compare Two Tables in SQL using UNION ALL
The UNION set operator allows you to compare data from two similar tables or data sets. The main advantage of UNION operator is that it can handle null values from both tables that JOIN or WHERE clause does not handle.
For example,
Select * from (
Select id_pk, col1, col2...,coln from table1, ‘first_table’
Union all
Select id_pk, col1, col2...,coln from table2, 'second_table'
) tmp order by id_pk;
The above query returns the all rows from both tables as ‘first_table’ and ‘second_table’. You can quickly verify the differences between two tables.
One of the drawbacks with using UNION ALL is that, the tables must have the same number of columns and the data types must match. If there is a data type mismatch, you should use explicit type casting.
Compare Cells From Two Tables in SQL
You can use the CASE statement to compare records from both tables and check if the cells are matching from both tables. Note that, you should have a primary key or unique key in both tables so that you can use that in your joining condition.
For example, following example compares cells from both tables.
SELECT CASE
WHEN Nvl(a.id :: VARCHAR, 'NULL' :: VARCHAR) = Nvl (b.id :: VARCHAR,'NULL' :: VARCHAR) THEN
'Matched'
ELSE Nvl(a.id :: VARCHAR, 'NULL' :: VARCHAR)
||','
|| Nvl(b.id :: VARCHAR, 'NULL' :: VARCHAR)
END id,
CASE
WHEN Nvl(a.name :: VARCHAR, 'NULL' :: VARCHAR) =
Nvl (b.name :: VARCHAR,'NULL' :: VARCHAR) THEN
'Matched'
ELSE Nvl(a.name :: VARCHAR, 'NULL' :: VARCHAR)
||','
|| Nvl (b.name :: VARCHAR, 'NULL' :: VARCHAR)
END name,
CASE
WHEN Nvl(a.addr :: VARCHAR, 'NULL' :: VARCHAR) =
Nvl (b.addr :: VARCHAR,'NULL' :: VARCHAR) THEN
'Matched'
ELSE Nvl(a.addr :: VARCHAR, 'NULL' :: VARCHAR)
||','
|| Nvl (b.addr :: VARCHAR, 'NULL' :: VARCHAR)
END addr
FROM test_tab1 a
full outer join test_tab2 b
ON Nvl(a.id :: VARCHAR, 'NULL' :: VARCHAR) =
Nvl (b.id :: VARCHAR, 'NULL' :: VARCHAR) ;
+---------+----------+------------+
| ID | NAME | ADDR |
|---------+----------+------------|
| Matched | Matched | Matched |
| Matched | Matched | Matched |
| Matched | Matched | adr3,adr- |
| Matched | ddd,dd | Matched |
| Matched | eee,NULL | adr5,NULL |
| NULL,6 | NULL,zzz | NULL,zaddr |
| 7,NULL | yyy,NULL | yaddr,NULL |
+---------+----------+------------+
Above SQL will return the output as ‘Matched‘ if cell from both tables are matching. Otherwise, returns unmatched records as a comma separated value. Note that, not all databases support NVL function to handle NULL values. You can use COALESCE instead or database specific function to handle null.
You can also use LEFT OUTER JOIN or INNER JOIN if you know the table count is matching.
Related Articles,
Compare Two Table in SQL using JOIN
One of the easiest method to compare tables is using JOIN. This method similar to that of the previous method. Similar to cell by cell comparison method, you can join the two tables on the primary key of the two tables and use SQL CASE statement to check whether particular column is matching between two tables.
For example,
Select case when A.col1 = B.col1 then ‘Match’ else ‘Mismatch’ end as col1_cmpr,
case when A.col2 = B.col2 then ‘Match’ else ‘Mismatch’ end as col2_cmpr, …..
from tableA A
Join tableB B
on (A.id_pk = B.id_pk) ;
One of the drawbacks of this method is that it will not compare null values. However, you can use null handling functions such as NVL to handle null values during comparison.
Compare Two Table in SQL using NOT EXISTS
The other quick method is to use the NOT EXISTS in the WHERE clause of the query. This method is faster and performs well on the large volume of tables. This is one of the recommended methods if you are comparing two large tables.
Select id_pk, col1, col2,col,…
From tableA A
Where NOT EXISTS
( select 1 from tableB B
Where A.id_pk = B.id_pk
and A.col1 = B.col1
and A.col2 = B.col2
and…
);
Similar to the previous method, in this approach also you have to use the NVL on the columns which contains a NULL in it. This method works well with all databases where correlated subqueries are supported.
Related Articles,
- Why You Should Learn Snowflake? Complete Features
- How to Find String in Spark DataFrame? – Scala and PySpark
- How to Replace Spark DataFrame Column Value? – Scala and PySpark
- Working with External Tables in Amazon Redshift?
- Redshift NVL and NVL2 Functions – Syntax and Examples
- Replace Function in Snowflake – Usage and Examples
Hope this helps 🙂