28. Left Join

The left join (which is a type of outer join) takes two tables, specifies some set of criteria to link them (e.g., by account_id), and instructs the database to return all rows from the left table plus matching rows from the right table. In venn diagram terms, a left join looks like this:

Left join venn diagram

With the join supplying all data from table A (on the left) and only matching records from table B (on the right) – the area highlighted in blue. Let's return to our comparison between left join and inner join.

A Tale of Two Joins

Let's say you've been asked to generate a list of every donor in your database that includes:

  • Account ID
  • First Name
  • Last Name
  • Email Address

Every donor in the database has an account ID and first and last name, but only about half of the donors have an email address on file. We can verify that by running a few simple queries:

No Results

You can see that there are exactly 10,000 account records in our database. Now, let's look at the emails table:

No Results

There are only 5,000 unique account IDs that are in our emails table. Let's try and generate our list using the simplest tool available to us: the inner join.

No Results

Two things stand out as wrong here:

  1. Every record in our results has an email address, which we know can't be the case since only 5,000 or our 10,000 donors have an email address on file.
  2. There are only 500 pages of results each with 10 records, meaning we only have 5,000 results instead of the expected 10,000.

Why didn't we get all 10,000 records from the accounts table? Because inner join will only return records where the join criteria are satisfied in both tables. Since 5,000 of our accounts do not have any entry whatsoever in the emails table, those 5,000 records are not present in our results. This is the principal use case for a left join: we want all records from one table with some column(s) from another table appended if there's relevant data there to append.

We can get exactly what we want by changing our inner join to a left join:

No Results

This query tells the database, "Give me the account ID, first name, and last name from the accounts table for all records even if they don't have an email address. Then, for any records where there's a match on account ID in the emails table, give me the email address." We can see that:

  1. Not every record in our results has an email address, which is correct since only half our donors have an email address on file.
  2. There are 1,000 pages of results each with 10 records, meaning we have the expected 10,000 total results (the same as the total number of records in accounts).

Exercise 1: Left Join

The membership department wants to analyze transactions from 2024 by whether the donor who made the gift has an email on file or not. To start this analysis, they have requested a list of all transactions in 2024 with the following columns:

  • Account ID
  • Transaction Date
  • Transaction Amount
  • Email address (if the donor has one)

Write an SQL query using left join to generate this list.

No Results

Left Join select Nuances

Exercise 1 above involves joining the transactions table to the emails table, both of which contain a field called account_id. You may be wondering, "In the solution, why did we include the account_id from the transactions table in our select clause, instead of the account_id from the emails table? Does it matter which one we use?" The answer is yes, it matters.

To see why, it's helpful to back up a bit and ask what happens if we just select * instead of specifying any column names directly:

No Results

This is the raw result of our left join operation: one big table that includes everything from the transactions table and only matching data from the emails table. There are a few important things to note about our results:

  1. Every column from both tables is included in our results.
  2. This means there are two columns for account_id (one of which has been auto-renamed as account_id_1 by our SQL console).
  3. Rows where email address is NULL also have NULL values for email_address_id and account_id_1.

Now we can see why we are including the account_id from our transactions table in our select clause: it's because we want the account_id to be listed even if there is no match in the emails table. If we had selected emails.account_id instead, some of our account_id values would be NULL:

No Results

When using left join, always think through which table the data in your select clause is coming from. If you're running into issues, it can be helpful to try a select * as we have done here to see what the result of your join looks like, which will point you in the right direction.

Advanced Left Joins

We've talked through the main use of the left join, building a resultset that includes all data from one table plus appended columns from a second table if there's relevant data to display. The second principal use of left join is to leverage the fact that when data is not present in the second table, the values are NULL. This can help us write shorter, more performant code.

For example, let's say we wanted to generate a list of all account IDs in our database who do not have an email address on file. We know that we can use not exists to do this:

No Results

But we can also do it using left join like this:

No Results

Notice that these results are exactly the same. This works because after our left join, we have one big table full of all the information from accounts, plus information from emails where it exists and NULL where it does not (just like in our select * example above that had NULL account IDs). Our where clause has access to this entire table, so we can use it to filter out data where the account_id in our emails table is NULL (cases where there was no match). This pattern is often shorter and simpler to write.

Exercise 2: Advanced Left Join

Membership has completed their analysis of 2024 gifts and would like to do additional analysis that focuses only on gifts made by people who do not have an email address on file. Modify the query you wrote in Exercise 1 so that it only lists transactions where the donor has no email address, leveraging left join NULL values.

No Results

The last chapter in this section will provide several additional exercises to build confidence with left join. Don't be discouraged if you're having a hard time with this topic; it's one where a lot of practice is required to build skills.