35. Union and Union All

The UNION and UNION ALL set operations are closely related operations that combine results from two queries into a single resultset. There's one distinction between these two operations:

  • UNION will combine the results of the two queries and then remove any duplicate rows.
  • UNION ALL will combine the results of the two queries and will not remove duplicate rows.

Let's look at a (contrived) example to illustrate the difference. In this first query, we are going to build a list of every account_id who has a transaction with one of these source_code_id values: 3113, 77225.

No Results

This query tells the database, "Build me a list of everyone who gave to source code 3113, then build me a second list of everyone who has given to source code 77225. Finally, combine these two lists into one and remove any duplicate rows." As you can see, we have 6 account_id values in our results with no duplicates. Now let's substitute in UNION ALL and see how our results change:

No Results

Now, we have seven account_id values in our results, and ID number 1487899 appears twice. This is because UNION ALL does not remove duplicates from the result set. That's the difference between these two operations. When using UNION, think carefully about whether your results could have duplicate rows and whether you do or do not want them in your results.

Union with Multiple Columns

The examples above have only included a single column in the underlying queries. But UNION and UNION ALL work with queries that select multiple columns, as long as:

  • There are the same number of columns in both queries, and
  • The columns are of compatible data types by position

Here are arrangements that would not work:

select
    account_id
from accounts
where first_name = 'Danielle'
UNION
select
    account_id
    ,city
from addresses
where state in ('KY','WV','TN')

In this example, the first query selects one column (account_id) while the second query selects both account_id and city. These queries cannot be the subject of a set operation, since they do not have the same number of columns. A second improperly formed set operation is this example:

select
    account_id
    ,amount
from transactions
UNION ALL
select
    account_id
    ,email_address
from emails

Here, the two queries each select two columns, but the second columns in the order (amount in the first query and email_address in the second) have different data types. amount is a numeric type, while email_address is text. These two queries cannot be the subject of a set operation, since they do not have compatible data types by column position.

Multiple Unions

Finally, note that you can perform set operations on as many queries as you'd like as long as they obey the rules outlined above. For example, we could use this (unrealistic and inefficient) query to select account_id values for everyone who lives in KY, TN, and WV:

select
    account_id
    ,state
from addresses
where state = 'KY'
UNION ALL
select
    account_id
    ,state
from addresses
where state = 'TN'
UNION ALL
select
    account_id
    ,state
from addresses
where state = 'WV'

The database will resolve the set operations in the order they are written. So here, it will first build the list of everyone with an address in Kentucky, then merge it with the list of everyone who lives in Tennessee, then merge that combined list with everyone who lives in West Virginia. In a UNION or UNION ALL operation, the order usually doesn't matter. But you can mix set operations in a single query, so you may have a query that uses both UNION and INTERSECT, in which case the order may become important.

Exercise 1: Union

Write an SQL query using UNION or UNION ALL to build a duplicate-free list of account_id values for everyone who has given a gift of at least $100 and has also given at least one gift to the sustainer program.

No Results

Exercise 2: More Union

You're in the early stages of pulling data for an upcoming direct marketing campaign. The team is evaluating two different sets of criteria to select donors for inclusion in the campaign and they want to see preliminary lists of donors selected by each set of criteria before making their decision. The first set of criteria is:

  • At least one gift of $1,000 or more via the Direct Mail channel

The second set of criteria is:

  • At least one gift of $100 or more to the Trees fund via the Direct Mail channel

Output should include:

  • Donor ID
  • A column that says either "Criteria 1" or "Criteria 2"

Write an SQL query using UNION or UNION ALL to generate this output.

No Results