27. Inner Join
The inner join
takes two tables, specifies some set of criteria to link them (e.g., by account_id
), and instructs the database to return all the rows where the linking criteria are present in both tables. Roughly half of the joins that I write in my work are inner join
, and you will find yourself turning to this powerful tool again and again.
In most fundraising CRMs and data warehouses, each table will contain a primary key that uniquely identifies each row. In the data we are using in this course, that's true for each of our five tables:
- transactions: transaction_id
- accounts: account_id
- source_codes: source_code_id
- addresses: address_id
- emails: email_address_id
You will usually (though not always) use inner join
to combine tables using these primary keys and their foreign key counterparts. As a refresher, a foreign key is a value in one table that corresponds to the primary key of another table. For example, our addresses
table contains account_id
, which is a foreign key that can be used to link data in addresses
to data in accounts
.
For example, let's say we wanted to build a list of all donor IDs and names in the database, along with their zip code. We could do this as follows:
select
accounts.account_id
,accounts.first_name
,accounts.last_name
,addresses.zip_code
from accounts inner join addresses on accounts.account_id = addresses.account_id
Our inner join
is comprised of two parts:
- Exactly two table names, in this case,
accounts
andaddresses
, linked by the keywordsinner join
. - A set of join criteria set off by the keyword
on
. Our example (and most joins) has only one criterion: that the account IDs are equal to each other. But you can useand
andor
to build complex criteria to join tables if needed.
This instructs the database to take all rows from the accounts
table and match them up to matching rows from the addresses
table as long as the join criteria are met (accounts.account_id = addresses.account_id
). If there were, for example, a record in the accounts
table that did not have an address in the addresses
table, that account would not show up in the results of this query, because the join condition would not be satisfied (there are no rows with a matching account_id
in addresses
).
Finally, note that it does not matter which order the tables are listed in. We could have written addresses inner join accounts
, which would be equivalent.
Once the database has completed our inner join
operation, all the rows and all the columns from both tables where a match is present are available for use in our where
and select
clauses. You can see that in our select
clause, we are pulling ID and name information from accounts
and the zip code information from addresses
.
Exercise 1: Basic inner join
Generate a list of all transaction_id values along with the source_code for the transaction using an inner join
.
All the Rows, All the Time
inner join
will stitch together all matching rows from the two tables, regardless of how many there are. Consider the relationship between the accounts
table and the transactions
table. Each account may have many transactions, which we term a "one to many" relationship between accounts
and transactions
(1 account -> many transactions). If we perform an inner join between these two tables, we will get a whole lot of rows:
Notice that we have a lot of records for Carla Gibson. Even though there's only one row in the accounts
table for her account_id
(437), there are 29 rows in the transactions
table that have this account_id
. inner join
will always return the full set of matching rows, so we've got 29 rows in our results for Carla – one for each transaction she's made.
Adding a where
clause
Just like any query, a query that uses inner join
can also feature a where
clause. We can build on our example above by considering only gifts made in 2002 by adding a simple where
clause:
The where
clause is evaluated after the join, meaning that the database first joins the tables together using your join criteria and only then applies the where
clause to the results of that join. With inner join
this distinction is generally not very important, but it will become important later so keep this rule in mind:
The database will resolve all join
conditions before evaluating the where
clause.
Aggregation
Continuing to build on our example with all gifts from 2002, we still see that Carla Gibson has made multiple gifts and thus shows up multiple times in our results. Sometimes, however, you will want to summarize this data using an aggregate function. What if instead of listing the details of each gift in 2002 (the transaction dates and individual amounts), we just wanted a list of donor ID, names, and total 2002 giving? We can use our friend sum()
to do this just as we would with any other query:
Now we see only a single row for Carla Gibson (and everyone else as well), with their cumulative giving in 2002. This query combines several concepts that we've covered so far: aggregation, date functions, ordering, aliasing, and joining tables.
Exercise 2: Aggregation
Modify the query above so that in addition to providing the sum of all giving in 2002, the query includes the count of gifts in 2002 as well as each donor's average gift in 2002.
Exercise 3: More Joins
Membership is looking to make a push in states with low revenue in 2023. To do this, they need a list of total 2023 revenue by state, sorted from lowest amount to highest amount. Write an SQL query using inner join
to generate this listing.
Exercise 4: Even More Joins
The board is evaluating investments for the upcoming fiscal year and wants to know how much money has been generated by each channel in 2022, 2023, and 2024. Write an SQL query using inner join
to generate this listing.