24. Having
having
, like group by
, is not an aggregate function itself but it is used along with aggregate functions to filter your results. Recall from our introduction that aggregates cannot be used in a where
clause. But there are a lot of times where it would be useful to be able to use them there. having
is a special clause that can be used in queries that contain aggregate functions that act as a post-execution filter for your results. having
is the distant cousin of where
.
Let's look at an example. One common use case for aggregate functions is identifying duplicates in a table. The query below looks at our accounts
table and produces the count of records that share a common first and last name:
This query shows us that, for example, there is only one record in our system with the name "Aaron Barker", only one "Aaron Cordova", and only one "Aaron Evans". But if we're looking for duplicate records, we don't care about names that appear once; we want to see names that appear more than once. This is where having
comes in. We place the having
clause after the group by
to specify that we only want to see records where count(*) > 1
, like this:
Notice that in these results, every single row has number_of_records
greater than one – they are all names that are duplicated in our database.
A note of caution
I said above that having
is the distant cousin of where
, and it's true, but it's important to know that the database will execute these statements at different times in the query execution process. Your where
clause is evaluated immediately after the from
clause. It filters records before aggregation takes place. In contrast, the having
clause is evaluated only after the group by
clause is complete. It filters records after aggregation takes place.
where
limits the rows that are eligible to be aggregatedhaving
limits the output after rows are already aggregated
So think of having
as just "making your job easier" in the output but not changing the way the query actually functions. It's a post-query filter, not something that changes the way your database interacts with the underlying data. In our example, having
just strips out things we don't want to see (records where the count is 1). It doesn't change the work the database has to do, because it sill needs to count up the number of "Aaron Barker" records in order to conclude that the count is 1.
Exercise 1: Having
Membership wants a report of all states that have 100 or more donors living in them. The report should include:
- The state abbreviation
- The number of donors in the state
Write an SQL query using count(*)
and having
to generate this list.
Exercise 2: Cumulative Giving
The examples above have all used count()
, but having
will work with any aggregate function like sum()
and max()
. In this exercise, development wants a list of all donors whose lifetime cumulative giving is $10,000 or more. The output should include:
- Donor ID
- Cumulative Lifetime Giving
Write an SQL query to generate this list.
Exercise 3: Advanced having
So far we have seen several examples of having
with simple criteria. But because SQL is so flexible, we can easily add more complex criteria using and
and or
to stack different pieces of logic into our having
statement. In this exercise, Development has come back to you and asked for a slightly different list: all donors with $10,000 cumulative lifetime giving whose lifetime average gift is at least $5,000. Output should include:
- Donor ID
- Cumulative Lifetime Giving
- Lifetime Average Gift
Write an SQL query using having
to generate this list.
having
Without Selected Aggregates
In our section on group by, I mentioned that you can use group by
in queries that do not have any aggregate functions in the select
clause. While the use cases for this design pattern are rare, it is somewhat common in conjunction with having
. Often, you will want to generate some list for which group by
and aggregate functions are helpful, but you don't want the aggregated column in your output.
For example, let's go back to our example of identifying names that appear multiple times in our database. In that example, we generated a list that included:
- First Name
- Last Name
- Number of Records
But what if we wanted that same list of duplicated names, but we don't actually care how many times the names appear as long as it's more than once. It could be 2 times or 20 times – we don't care – we just want a bare list of duplicated names. We can do that using having
and group by
without putting the aggregate function in our select
clause, like this:
This query generates our same list of duplicated names and still uses the aggregate function count(*)
in the having
clause, but does not include count(*)
in the select
clause. This query shows us that aggregates in the having
clause do not need to appear in the select
clause.
Exercise 4: Unselected Aggregates
Membership needs a list of all donors who gave 3 or more gifts in 2020. But the system they are importing this data to can only accept data with a single column: account_id. Write an SQL query to generate this list that outputs only the account_id.