Understanding UNION ALL vs UNION: How to Choose the Right Operator for Your SQL Query

Understanding the Problem and Query

The question at hand is how to combine two tables into a single result with one row per person, team, client ID, and client. We are given two tables, table_1 and table_2, each with columns for person, team, client ID, client, and time spent in hours.

Table 1

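| Person | Team      | Client ID | Client | Time Spent (h) |
|--------|-----------|-----------|--------|----------------|
| Noah   | Marketing | ECOM01    | Nike   | 10             |
| Peter  | Marketing | ECOM01    | Nike   | 10             |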

Table 2

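| Person | Team | Client ID | Client | Time Spent (h) |
|--------|------|-----------|--------|----------------|
| Alex   | CX   | ECOM01    | Nike   | 10             |
| Max    | CX   | ECOM01    | Nike   | 10             |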

The question asks for a query that can produce the following result:

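| Person | Team      | Client ID | Client | Time Spent (h) |
|--------|-----------|-----------|--------|----------------|
| Noah   | Marketing | ECOM01    | Nike   | 10             |
| Peter  | Marketing | ECOM01    | Nike   | 10             |
| Alex   | CX        | ECOM01    | Nike   | 10             |
| Max    | CX        | ECOM01    | Nike   | 10             |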

The Problem with the Provided Query

The provided query is:

SELECT 
    M.Client_ID,
    M.Client,
    SUM(C.Time_spent_h) + SUM(M.Time_spent_h) AS Total_time
FROM
    (SELECT 
        Client_ID, Client, SUM(Time_spent_h) AS Time
    FROM
        table_1
    GROUP BY 1 , 2) AS M
    LEFT JOIN
    (SELECT 
        Client_ID, Client, SUM(Time_spent_h) AS Time
    FROM
        table_2
    GROUP BY 1 , 2) AS C ON C.ID = M.ID
GROUP BY 1 , 2

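This query tries to sum the hours per client in each table and then join the two summaries. It has several problems:

  • Both subqueries expose only Client_ID, Client, and Time, yet the join condition refers to C.ID and M.ID, and the outer SELECT refers to C.Time_spent_h and M.Time_spent_h. None of those columns exist in M or C, so the query fails before it returns anything.
  • The subqueries group by Client_ID and Client only, so Person and Team are discarded. Even with the column names fixed, the result would be one total per client, not one row per person.
  • A LEFT JOIN matches rows side by side and keeps only the clients that appear in table_1. The desired result stacks the rows of both tables, which is what a union does.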
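To make these problems concrete, the following is a minimal sketch that runs the query against the sample data. It assumes SQLite through Python's built-in sqlite3 module as a stand-in for whatever database the question targets, and the column types are guesses. The repaired join (with the alias Hours) is a hypothetical fix added for illustration, not something from the original question.

```python
import sqlite3

# In-memory SQLite database; column types are assumed, names come from the queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (Person TEXT, Team TEXT, Client_ID TEXT, Client TEXT, Time_spent_h INTEGER);
    CREATE TABLE table_2 (Person TEXT, Team TEXT, Client_ID TEXT, Client TEXT, Time_spent_h INTEGER);
    INSERT INTO table_1 VALUES ('Noah', 'Marketing', 'ECOM01', 'Nike', 10),
                               ('Peter', 'Marketing', 'ECOM01', 'Nike', 10);
    INSERT INTO table_2 VALUES ('Alex', 'CX', 'ECOM01', 'Nike', 10),
                               ('Max', 'CX', 'ECOM01', 'Nike', 10);
""")

# The query from the question, unchanged.
original_query = """
SELECT M.Client_ID, M.Client,
       SUM(C.Time_spent_h) + SUM(M.Time_spent_h) AS Total_time
FROM (SELECT Client_ID, Client, SUM(Time_spent_h) AS Time
      FROM table_1 GROUP BY 1, 2) AS M
LEFT JOIN
     (SELECT Client_ID, Client, SUM(Time_spent_h) AS Time
      FROM table_2 GROUP BY 1, 2) AS C
  ON C.ID = M.ID
GROUP BY 1, 2
"""

try:
    conn.execute(original_query)
    original_error = None
except sqlite3.OperationalError as exc:
    # M and C have no ID or Time_spent_h column, so SQLite rejects the statement.
    original_error = str(exc)
print("original query failed:", original_error)

# Hypothetical repair: join on Client_ID and add the pre-aggregated totals.
repaired_query = """
SELECT M.Client_ID, M.Client, M.Hours + COALESCE(C.Hours, 0) AS Total_time
FROM (SELECT Client_ID, Client, SUM(Time_spent_h) AS Hours
      FROM table_1 GROUP BY Client_ID, Client) AS M
LEFT JOIN
     (SELECT Client_ID, Client, SUM(Time_spent_h) AS Hours
      FROM table_2 GROUP BY Client_ID, Client) AS C
  ON C.Client_ID = M.Client_ID
"""
repaired_rows = conn.execute(repaired_query).fetchall()
print(repaired_rows)  # [('ECOM01', 'Nike', 40)]
```

Even the repaired join returns a single total for Nike, because Person and Team were dropped before the join. That is why a join is the wrong tool for a per-person result.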

A Correct Approach: Using UNION ALL

A more suitable approach is to stack the two tables with the UNION ALL operator instead of joining them. UNION ALL on its own only appends rows, so each SELECT also groups by person, team, client ID, and client to sum the hours when a person has more than one entry for the same client.

The Correct Query

SELECT 
    Person, Team, Client_ID, Client, SUM(Time_spent_h) AS Time
FROM 
    table_1
GROUP BY 1, 2, 3, 4

UNION ALL

SELECT 
    Person, Team, Client_ID, Client, SUM(Time_spent_h) AS Time
FROM 
    table_2
GROUP BY 1, 2, 3, 4

Each SELECT sums Time_spent_h per Person, Team, Client_ID, and Client within its own table, and UNION ALL appends the second result to the first without removing any rows. With the sample data every person has a single row, so the output is the four rows shown above with 10 hours each.
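The query can be checked against the sample data with a small script. This is a sketch that assumes SQLite through Python's built-in sqlite3 module and guessed column types. It spells out the GROUP BY columns by name instead of by position, since not every database accepts positional GROUP BY.

```python
import sqlite3

# In-memory SQLite database loaded with the two sample tables (types are assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (Person TEXT, Team TEXT, Client_ID TEXT, Client TEXT, Time_spent_h INTEGER);
    CREATE TABLE table_2 (Person TEXT, Team TEXT, Client_ID TEXT, Client TEXT, Time_spent_h INTEGER);
    INSERT INTO table_1 VALUES ('Noah', 'Marketing', 'ECOM01', 'Nike', 10),
                               ('Peter', 'Marketing', 'ECOM01', 'Nike', 10);
    INSERT INTO table_2 VALUES ('Alex', 'CX', 'ECOM01', 'Nike', 10),
                               ('Max', 'CX', 'ECOM01', 'Nike', 10);
""")

# Same logic as the article's query, with the grouping columns named explicitly.
query = """
SELECT Person, Team, Client_ID, Client, SUM(Time_spent_h) AS Time
FROM table_1
GROUP BY Person, Team, Client_ID, Client

UNION ALL

SELECT Person, Team, Client_ID, Client, SUM(Time_spent_h) AS Time
FROM table_2
GROUP BY Person, Team, Client_ID, Client
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # one tuple per person: (Person, Team, Client_ID, Client, Time)
```

Without an ORDER BY clause the row order is not guaranteed, so add one at the end of the combined query if the order matters.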

How UNION ALL Works

The UNION ALL operator combines the result sets of two or more SELECT statements into a single result set. Unlike UNION, which eliminates duplicate rows, UNION ALL preserves duplicates and includes them in the final result.
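The difference is easy to see on a toy example. The sketch below uses Python's built-in sqlite3 module with two hypothetical single-column tables, a and b, which are not part of the original question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (1);   -- duplicate inside a single table
    INSERT INTO b VALUES (1), (2);   -- 1 also appears in the other table
""")

# UNION ALL keeps every row from both SELECT statements.
union_all_rows = conn.execute("SELECT x FROM a UNION ALL SELECT x FROM b").fetchall()

# UNION removes duplicates from the combined result.
union_rows = conn.execute("SELECT x FROM a UNION SELECT x FROM b").fetchall()

print(sorted(union_all_rows))  # [(1,), (1,), (1,), (2,)]
print(sorted(union_rows))      # [(1,), (2,)]
```

Note that UNION also removes the duplicate 1 that comes from table a alone, not only the one shared between the two tables.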

In the context of this problem, UNION ALL does no grouping of its own: it stacks the rows produced by the two grouped SELECT statements. Because each person in the sample data appears in only one table, the stacked result contains exactly one row per person, team, client ID, and client.

Understanding UNION ALL vs UNION

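Both UNION and UNION ALL combine the results of SELECT statements that return the same number of columns with compatible types. They differ in how they treat duplicate rows:

  • UNION: Removes duplicate rows from the combined result, including duplicates that come from a single SELECT. This requires extra work, typically a sort or hash step.
  • UNION ALL: Returns every row from every SELECT, duplicates included, and skips the deduplication step.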

In this problem, we chose UNION ALL because every time entry must be kept. If the same person, team, client, and hours appeared in both tables, UNION would collapse the two rows into one and the hours would be undercounted.
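The effect on totals can be sketched with a variation of the sample data. The setup is hypothetical: it assumes Noah logged an identical 10-hour entry in both tables, which is not the case in the original data. The helper total_hours is illustrative only, and SQLite through Python's sqlite3 module stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (Person TEXT, Team TEXT, Client_ID TEXT, Client TEXT, Time_spent_h INTEGER);
    CREATE TABLE table_2 (Person TEXT, Team TEXT, Client_ID TEXT, Client TEXT, Time_spent_h INTEGER);
    -- Hypothetical data: Noah's entry is identical in both tables.
    INSERT INTO table_1 VALUES ('Noah', 'Marketing', 'ECOM01', 'Nike', 10);
    INSERT INTO table_2 VALUES ('Noah', 'Marketing', 'ECOM01', 'Nike', 10),
                               ('Alex', 'CX', 'ECOM01', 'Nike', 10);
""")

def total_hours(set_operator):
    """Sum the hours per client after combining both tables with the given operator."""
    query = f"""
        SELECT Client_ID, SUM(Time_spent_h)
        FROM (
            SELECT Person, Team, Client_ID, Client, Time_spent_h FROM table_1
            {set_operator}
            SELECT Person, Team, Client_ID, Client, Time_spent_h FROM table_2
        ) AS combined
        GROUP BY Client_ID
    """
    return conn.execute(query).fetchall()

totals_union_all = total_hours("UNION ALL")
totals_union = total_hours("UNION")

print(totals_union_all)  # [('ECOM01', 30)] - all three entries counted
print(totals_union)      # [('ECOM01', 20)] - Noah's second entry was dropped
```

If identical rows across the two tables really are accidental copies rather than separate entries, UNION would be the better choice. That decision depends on what the data means, not on the operator.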


Last modified on 2025-03-04