25. What is a join?
So far, we've been working with queries where each individual query or subquery only includes one table in the from
clause. But what if we need to get data out of multiple tables at the same time, or express relationships between tables that are more complex than allowed by in
and exists
? This is where the SQL join
comes in.
join
connects two tables in a database according to criteria that you specify. Your database will read every single table listed in your from
clause (connected by join
statements) and combine them into one huge, singular dataset, from which you can select things. Joins tell the database how to merge the tables to arrive at a single set of data to query from. There are several types of joins, but in practice there are only three that will serve 99.9% of your join needs:
inner join
: returns all rows from two tables where the rows have some type of matching condition in commonleft join
(technically aleft outer join
): returns all rows from one table plus some subset of rows from a second table if some matching condition is metcross join
: returns all unique combinations of rows from two tables
Inner Join
The inner join
takes two tables, specifies some set of criteria to link them (e.g., by account_id
), and instructs the database to return all the rows where the linking criteria are present in both tables. If you consider two tables, A and B, the inner join returns rows where they have information in common, like the blue shaded area in this venn diagram:
For example, let's say we wanted to get the name and mailing address for all donors with an address in Mississippi. The names are in the accounts table, while the addresses are in the addresses table. We can join these tables to get the data we need, like this:
Let's examine the inner join
in detail. Notice that it:
- Comes between two table names. This instructs the database to
join
theaccounts as accts
table to theaddresses as addr
table. - Uses
on
to specify which column values must match for rows from the two tables to be joined together. - Makes columns from both tables available in the
select
andwhere
clauses.
This inner join
tells the database, "Take every row in the accounts table and every row in the addresses table, and try to match them up using account_id. If there's a match on account_id, make this data available to the rest of my query. If there's no match, don't make the data available to this query."
The next chapter will provide more information about inner join
.
Exercise 1: Inner join
Your direct response team needs a list of transactions from 2022 that includes the donor's ID, transaction date, amount, source code, and source code channel. Use inner join
to build this list.
Left Join
The left join
(which is a type of outer join) takes two tables, specifies some set of criteria to link them (e.g., by account_id
), and instructs the database to return all rows from the left table plus matching rows from the right table. Left and right? Adam, what are you talking about? Consider this example:
select
a.account_id
,a.first_name
,a.last_name
,e.email_address
from accounts as a left join emails as e on a.account_id = e.account_id
In our from
clause, we have two tables, accounts and emails, and accounts is on the left of the words left join
and emails is on the right of the words left join
. That's it – left means "the table to the left". In this example, every row from accounts will be made available to our query (since it's on the left), and any rows in the emails table that match to accounts on account_id will also be available. But any rows in emails that don't have a match for account_id in accounts won't be available.
Put another way, if you consider two tables A and B, the left join returns all rows from table A (on the left - accounts) and any matching rows from the table B (on the right - emails):
We will break this down in more detail in a later section, but to get a sense of what this code does, let's look at what happens when we execute the query.
I'd like to highlight several things about the query and the results:
- Every record from the
accounts
table is in our results, as are all the account IDs and first and last names we asked for. - Some of the rows in our results have an email address listed, while others do not. This is because not every donor in our database has an email address.
- However, even those donors who don't have an email address are still present in our query. This is because the accounts table is on the left, so all records from that table are included, even if they have no match in the table on the right.
Let's compare this to what happens if we had used an inner join
instead:
Two things are different:
- We only have 500 pages of results in this query instead of 1,000 pages. This is because
inner join
only returns records that have a match in both tables. - Every record in our new output has an email address. Accounts who do not have an email address on file are removed by the
inner join
since they don't have a record in ouremails
table.
Left joins are one of the most difficult concepts for beginners to get a handle on, and we will cover them in much more depth, including with some exercises, in a coming chapter.
Right Join
The right join
(technically a right outer join
) is the logical equivalent of the left join
, it just flips which table is the "main one". This type of join returns all values from the table on the right, while returning only matching rows for the table on the left. In practice, right join
is virtually never used. Every right join
can be rewritten as an equivalent left join
. We will not be covering right join
in this course. Just know that they exist and are just a left join
in reverse.
Cross Join
Warning
Cross joins can be computationally expensive and you should use them only when you know exactly what you're doing and that data volumes are low enough for your database system to handle the operation.A cross join
generates a dataset comprised of all possible combinations of rows from two tables. This means that the resultset you generate could be extremely large. For example, if you cross join a table with 10,000 rows to a second table that has 15,000 rows, your results will contain 10000 x 15000 = 150,000,000 rows. Cross joins:
- Do not contain join criteria, so there is no
on
keyword. All rows from both tables are included in the results. - Generate a result set that is the product of the row counts of the two tables involved. Cross joining a 10 row table to a 6 row table will result in 60 rows.
In fundraising, cross join
is generally used in niche scenarios. For example:
- Generating a list of all possible source codes from some list of source code position definitions
- Appending a literal value to a table or set of tables temporarily as a new column
- In place of loops for database systems that lack a procedural language, when needing to loop over a set number of iterations or through specific date ranges
Because their utility is limited and the risk of writing a query that consumes a large amount of resources is high, we will not be covering cross join
in detail in this course.