28. Left Join
The left join
(which is a type of outer join) takes two tables, specifies some set of criteria to link them (e.g., by account_id
), and instructs the database to return all rows from the left table plus matching rows from the right table. In venn diagram terms, a left join
looks like this:
With the join supplying all data from table A (on the left) and only matching records from table B (on the right) – the area highlighted in blue. Let's return to our comparison between left join
and inner join
.
A Tale of Two Joins
Let's say you've been asked to generate a list of every donor in your database that includes:
- Account ID
- First Name
- Last Name
- Email Address
Every donor in the database has an account ID and first and last name, but only about half of the donors have an email address on file. We can verify that by running a few simple queries:
You can see that there are exactly 10,000 account records in our database. Now, let's look at the emails
table:
There are only 5,000 unique account IDs that are in our emails
table. Let's try and generate our list using the simplest tool available to us: the inner join
.
Two things stand out as wrong here:
- Every record in our results has an email address, which we know can't be the case since only 5,000 or our 10,000 donors have an email address on file.
- There are only 500 pages of results each with 10 records, meaning we only have 5,000 results instead of the expected 10,000.
Why didn't we get all 10,000 records from the accounts
table? Because inner join
will only return records where the join criteria are satisfied in both tables. Since 5,000 of our accounts do not have any entry whatsoever in the emails
table, those 5,000 records are not present in our results. This is the principal use case for a left join
: we want all records from one table with some column(s) from another table appended if there's relevant data there to append.
We can get exactly what we want by changing our inner join
to a left join
:
This query tells the database, "Give me the account ID, first name, and last name from the accounts table for all records even if they don't have an email address. Then, for any records where there's a match on account ID in the emails table, give me the email address." We can see that:
- Not every record in our results has an email address, which is correct since only half our donors have an email address on file.
- There are 1,000 pages of results each with 10 records, meaning we have the expected 10,000 total results (the same as the total number of records in
accounts
).
Exercise 1: Left Join
The membership department wants to analyze transactions from 2024 by whether the donor who made the gift has an email on file or not. To start this analysis, they have requested a list of all transactions in 2024 with the following columns:
- Account ID
- Transaction Date
- Transaction Amount
- Email address (if the donor has one)
Write an SQL query using left join
to generate this list.
Left Join select
Nuances
Exercise 1 above involves joining the transactions
table to the emails
table, both of which contain a field called account_id
. You may be wondering, "In the solution, why did we include the account_id
from the transactions
table in our select
clause, instead of the account_id
from the emails
table? Does it matter which one we use?" The answer is yes, it matters.
To see why, it's helpful to back up a bit and ask what happens if we just select *
instead of specifying any column names directly:
This is the raw result of our left join
operation: one big table that includes everything from the transactions
table and only matching data from the emails
table. There are a few important things to note about our results:
- Every column from both tables is included in our results.
- This means there are two columns for account_id (one of which has been auto-renamed as
account_id_1
by our SQL console). - Rows where email address is
NULL
also haveNULL
values foremail_address_id
andaccount_id_1
.
Now we can see why we are including the account_id
from our transactions
table in our select
clause: it's because we want the account_id
to be listed even if there is no match in the emails
table. If we had selected emails.account_id
instead, some of our account_id
values would be NULL
:
When using left join
, always think through which table the data in your select
clause is coming from. If you're running into issues, it can be helpful to try a select *
as we have done here to see what the result of your join looks like, which will point you in the right direction.
Advanced Left Joins
We've talked through the main use of the left join
, building a resultset that includes all data from one table plus appended columns from a second table if there's relevant data to display. The second principal use of left join
is to leverage the fact that when data is not present in the second table, the values are NULL
. This can help us write shorter, more performant code.
For example, let's say we wanted to generate a list of all account IDs in our database who do not have an email address on file. We know that we can use not exists
to do this:
But we can also do it using left join
like this:
Notice that these results are exactly the same. This works because after our left join
, we have one big table full of all the information from accounts
, plus information from emails
where it exists and NULL
where it does not (just like in our select *
example above that had NULL
account IDs). Our where
clause has access to this entire table, so we can use it to filter out data where the account_id
in our emails
table is NULL
(cases where there was no match). This pattern is often shorter and simpler to write.
Exercise 2: Advanced Left Join
Membership has completed their analysis of 2024 gifts and would like to do additional analysis that focuses only on gifts made by people who do not have an email address on file. Modify the query you wrote in Exercise 1 so that it only lists transactions where the donor has no email address, leveraging left join
NULL
values.
The last chapter in this section will provide several additional exercises to build confidence with left join
. Don't be discouraged if you're having a hard time with this topic; it's one where a lot of practice is required to build skills.