12. Exists and Not Exists
We've left the shallow end behind and are starting to get into deeper waters now. Our next operator is the exists
operator, one of the most useful operators in SQL. exists
allows you to test whether a subquery returns any rows. That probably sounds confusing and not helpful, let's look at an example.
Let's say you want to find a list of all transactions to the "Trees" fund where the donor who gave the gift has previously given at least once to the "Birds" fund. Breaking this task down, we want to do this by:
- Building a list of gifts to the "Trees" fund.
- For each gift in that list, check whether the donor who made it has given a gift with an earlier transaction date to the "Birds" fund.
This is a great use case for exists
. We are asking the database, "For each of these Trees gifts, does there exist a prior gift from the same donor to the Birds fund?" Our SQL might look like this:
select
*
from transactions as trees
where trees.Fund = 'Trees'
and exists (
select 'X'
from transactions as birds
where birds.account_id = trees.account_id
and birds.transaction_date < trees.transaction_date
and birds.Fund = 'Birds'
)
There's a lot of new stuff here, so let's unpack it piece by piece. First, we're using a table alias, trees
for our transactions table. In this case, the table alias is mandatory because we are also going to reference the same table in our exists
subquery, and we want to be able to tell the database which of our where
clause elements should come from each table.
Second, our subquery looks a bit strange. We're again selecting from transactions which we have aliased as birds
. But what heck is select 'X'
? What's happening is that the database is going to test every single row in our trees
transactions table by running this subquery and checking to see if there's at least one one result of any kind. It doesn't matter what the result is. If a result exists, the subquery returns true
otherwise it returns false
. We just select a simple string value, X
. You can use select 1
or select 'this_awesome_string'
, or nearly anything you want. But by convention, people often use select 'X'
.
Finally, notice that we are testing a series of conditions between the two tables in our subquery where
clause. Inside the subquery, you can refer to any and all tables that are part of the main query's from
clause (so our subquery can "see" our trees
table). For each row in our trees
table, we want to consult birds
to determine:
- Is the
birds
transaction from the sameaccount_id
as ourtrees
transaction (birds.account_id = trees.account_id
)? - Was the
birds
transaction made before thetrees
transaction (birds.transaction_date < trees.transaction_date
)? - Was the
birds
transaction made to the "Birds" fund (birds.Fund = 'Birds'
)?
If all of these things are true, then such a gift exists, so our exists
statement returns true for that row. Otherwise, it returns false. The database then continues like this, checking each and every row in our trees
table using the same tests.
I know this is a lot and seems complicated. Don't worry, there's going to be ample opportunity to practice. The main thing I want you to understand right now is that we have a way in SQL to say "hey database, does some specific thing exist in a table I'm about to specify?" If you understand that we can do this in SQL, you're on the right track. Comfort will come with practice.
Exercise 1: Exists
You've been asked to identify all gifts from 2024 where the donor who gave the gift had previously given at least one gift of $100. Put another way, for each transaction in 2024, does there exist a transaction from the same donor with an earlier transaction date that is $100+?
Not Exists
Just like with in
and between
, exists
has an opposite: not exists
. It works much as you'd expect: it checks to see if a specific thing exists. If it does, it returns false, otherwise it returns true. For example, if I wanted to find everyone in the state of Minnesota who has never made a gift of $100 or more, I might write:
select *
from addresses as ad
where ad.state = 'MN'
and not exists (
select 'X'
from transactions as t
where t.account_id = ad.account_id
and t.amount >= 100
)
exists
and not exists
are equally useful. You will likely find yourself turning to them time and time again to identify donors who have/haven't done certain things or given certain types of gifts.
Exercise 2: Not exists
You've been asked to identify all gifts from 2024 where the donor who gave the gift was brand new. Put another way, for all gifts in 2024, find records where there does not exist a gift from the same person with an earlier transaction date.