24. Having
having, like group by, is not an aggregate function itself but it is used along with aggregate functions to filter your results. Recall from our introduction that aggregates cannot be used in a where clause. But there are a lot of times where it would be useful to be able to use them there. having is a special clause that can be used in queries that contain aggregate functions that act as a post-execution filter for your results. having is the distant cousin of where.
Let's look at an example. One common use case for aggregate functions is identifying duplicates in a table. The query below looks at our accounts table and produces the count of records that share a common first and last name:
This query shows us that, for example, there is only one record in our system with the name "Aaron Barker", only one "Aaron Cordova", and only one "Aaron Evans". But if we're looking for duplicate records, we don't care about names that appear once; we want to see names that appear more than once. This is where having comes in. We place the having clause after the group by to specify that we only want to see records where count(*) > 1, like this:
Notice that in these results, every single row has number_of_records greater than one – they are all names that are duplicated in our database.
A note of caution
I said above that having is the distant cousin of where, and it's true, but it's important to know that the database will execute these statements at different times in the query execution process. Your where clause is evaluated immediately after the from clause. It filters records before aggregation takes place. In contrast, the having clause is evaluated only after the group by clause is complete. It filters records after aggregation takes place.
wherelimits the rows that are eligible to be aggregatedhavinglimits the output after rows are already aggregated
So think of having as just "making your job easier" in the output but not changing the way the query actually functions. It's a post-query filter, not something that changes the way your database interacts with the underlying data. In our example, having just strips out things we don't want to see (records where the count is 1). It doesn't change the work the database has to do, because it sill needs to count up the number of "Aaron Barker" records in order to conclude that the count is 1.
Exercise 1: Having
Membership wants a report of all states that have 100 or more donors living in them. The report should include:
- The state abbreviation
- The number of donors in the state
Write an SQL query using count(*) and having to generate this list.
Exercise 2: Cumulative Giving
The examples above have all used count(), but having will work with any aggregate function like sum() and max(). In this exercise, development wants a list of all donors whose lifetime cumulative giving is $10,000 or more. The output should include:
- Donor ID
- Cumulative Lifetime Giving
Write an SQL query to generate this list.
Exercise 3: Advanced having
So far we have seen several examples of having with simple criteria. But because SQL is so flexible, we can easily add more complex criteria using and and or to stack different pieces of logic into our having statement. In this exercise, Development has come back to you and asked for a slightly different list: all donors with $10,000 cumulative lifetime giving whose lifetime average gift is at least $5,000. Output should include:
- Donor ID
- Cumulative Lifetime Giving
- Lifetime Average Gift
Write an SQL query using having to generate this list.
having Without Selected Aggregates
In our section on group by, I mentioned that you can use group by in queries that do not have any aggregate functions in the select clause. While the use cases for this design pattern are rare, it is somewhat common in conjunction with having. Often, you will want to generate some list for which group by and aggregate functions are helpful, but you don't want the aggregated column in your output.
For example, let's go back to our example of identifying names that appear multiple times in our database. In that example, we generated a list that included:
- First Name
- Last Name
- Number of Records
But what if we wanted that same list of duplicated names, but we don't actually care how many times the names appear as long as it's more than once. It could be 2 times or 20 times – we don't care – we just want a bare list of duplicated names. We can do that using having and group by without putting the aggregate function in our select clause, like this:
This query generates our same list of duplicated names and still uses the aggregate function count(*) in the having clause, but does not include count(*) in the select clause. This query shows us that aggregates in the having clause do not need to appear in the select clause.
Exercise 4: Unselected Aggregates
Membership needs a list of all donors who gave 3 or more gifts in 2020. But the system they are importing this data to can only accept data with a single column: account_id. Write an SQL query to generate this list that outputs only the account_id.