25. What is a join?

So far, we've been working with queries where each individual query or subquery only includes one table in the from clause. But what if we need to get data out of multiple tables at the same time, or express relationships between tables that are more complex than allowed by in and exists? This is where the SQL join comes in.

join connects two tables in a database according to criteria that you specify. Your database will read every single table listed in your from clause (connected by join statements) and combine them into one huge, singular dataset, from which you can select things. Joins tell the database how to merge the tables to arrive at a single set of data to query from. There are several types of joins, but in practice there are only three that will serve 99.9% of your join needs:

  • inner join: returns all rows from two tables where the rows have some type of matching condition in common
  • left join (technically a left outer join): returns all rows from one table plus some subset of rows from a second table if some matching condition is met
  • cross join: returns all unique combinations of rows from two tables

Inner Join

The inner join takes two tables, specifies some set of criteria to link them (e.g., by account_id), and instructs the database to return all the rows where the linking criteria are present in both tables. If you consider two tables, A and B, the inner join returns rows where they have information in common, like the blue shaded area in this venn diagram:

Inner join venn diagram

For example, let's say we wanted to get the name and mailing address for all donors with an address in Mississippi. The names are in the accounts table, while the addresses are in the addresses table. We can join these tables to get the data we need, like this:

No Results

Let's examine the inner join in detail. Notice that it:

  • Comes between two table names. This instructs the database to join the accounts as accts table to the addresses as addr table.
  • Uses on to specify which column values must match for rows from the two tables to be joined together.
  • Makes columns from both tables available in the select and where clauses.

This inner join tells the database, "Take every row in the accounts table and every row in the addresses table, and try to match them up using account_id. If there's a match on account_id, make this data available to the rest of my query. If there's no match, don't make the data available to this query."

The next chapter will provide more information about inner join.

Exercise 1: Inner join

Your direct response team needs a list of transactions from 2022 that includes the donor's ID, transaction date, amount, source code, and source code channel. Use inner join to build this list.

No Results

Left Join

The left join (which is a type of outer join) takes two tables, specifies some set of criteria to link them (e.g., by account_id), and instructs the database to return all rows from the left table plus matching rows from the right table. Left and right? Adam, what are you talking about? Consider this example:

select
    a.account_id
    ,a.first_name
    ,a.last_name
    ,e.email_address
from accounts as a left join emails as e on a.account_id = e.account_id

In our from clause, we have two tables, accounts and emails, and accounts is on the left of the words left join and emails is on the right of the words left join. That's it – left means "the table to the left". In this example, every row from accounts will be made available to our query (since it's on the left), and any rows in the emails table that match to accounts on account_id will also be available. But any rows in emails that don't have a match for account_id in accounts won't be available.

Put another way, if you consider two tables A and B, the left join returns all rows from table A (on the left - accounts) and any matching rows from the table B (on the right - emails):

Left join venn diagram

We will break this down in more detail in a later section, but to get a sense of what this code does, let's look at what happens when we execute the query.

No Results

I'd like to highlight several things about the query and the results:

  • Every record from the accounts table is in our results, as are all the account IDs and first and last names we asked for.
  • Some of the rows in our results have an email address listed, while others do not. This is because not every donor in our database has an email address.
  • However, even those donors who don't have an email address are still present in our query. This is because the accounts table is on the left, so all records from that table are included, even if they have no match in the table on the right.

Let's compare this to what happens if we had used an inner join instead:

No Results

Two things are different:

  • We only have 500 pages of results in this query instead of 1,000 pages. This is because inner join only returns records that have a match in both tables.
  • Every record in our new output has an email address. Accounts who do not have an email address on file are removed by the inner join since they don't have a record in our emails table.

Left joins are one of the most difficult concepts for beginners to get a handle on, and we will cover them in much more depth, including with some exercises, in a coming chapter.

Right Join

The right join (technically a right outer join) is the logical equivalent of the left join, it just flips which table is the "main one". This type of join returns all values from the table on the right, while returning only matching rows for the table on the left. In practice, right join is virtually never used. Every right join can be rewritten as an equivalent left join. We will not be covering right join in this course. Just know that they exist and are just a left join in reverse.

Cross Join

A cross join generates a dataset comprised of all possible combinations of rows from two tables. This means that the resultset you generate could be extremely large. For example, if you cross join a table with 10,000 rows to a second table that has 15,000 rows, your results will contain 10000 x 15000 = 150,000,000 rows. Cross joins:

  • Do not contain join criteria, so there is no on keyword. All rows from both tables are included in the results.
  • Generate a result set that is the product of the row counts of the two tables involved. Cross joining a 10 row table to a 6 row table will result in 60 rows.

In fundraising, cross join is generally used in niche scenarios. For example:

  • Generating a list of all possible source codes from some list of source code position definitions
  • Appending a literal value to a table or set of tables temporarily as a new column
  • In place of loops for database systems that lack a procedural language, when needing to loop over a set number of iterations or through specific date ranges

Because their utility is limited and the risk of writing a query that consumes a large amount of resources is high, we will not be covering cross join in detail in this course.