24. Having

having, like group by, is not an aggregate function itself but it is used along with aggregate functions to filter your results. Recall from our introduction that aggregates cannot be used in a where clause. But there are a lot of times where it would be useful to be able to use them there. having is a special clause that can be used in queries that contain aggregate functions that act as a post-execution filter for your results. having is the distant cousin of where.

Let's look at an example. One common use case for aggregate functions is identifying duplicates in a table. The query below looks at our accounts table and produces the count of records that share a common first and last name:

No Results

This query shows us that, for example, there is only one record in our system with the name "Aaron Barker", only one "Aaron Cordova", and only one "Aaron Evans". But if we're looking for duplicate records, we don't care about names that appear once; we want to see names that appear more than once. This is where having comes in. We place the having clause after the group by to specify that we only want to see records where count(*) > 1, like this:

No Results

Notice that in these results, every single row has number_of_records greater than one – they are all names that are duplicated in our database.

A note of caution

I said above that having is the distant cousin of where, and it's true, but it's important to know that the database will execute these statements at different times in the query execution process. Your where clause is evaluated immediately after the from clause. It filters records before aggregation takes place. In contrast, the having clause is evaluated only after the group by clause is complete. It filters records after aggregation takes place.

  • where limits the rows that are eligible to be aggregated
  • having limits the output after rows are already aggregated

So think of having as just "making your job easier" in the output but not changing the way the query actually functions. It's a post-query filter, not something that changes the way your database interacts with the underlying data. In our example, having just strips out things we don't want to see (records where the count is 1). It doesn't change the work the database has to do, because it sill needs to count up the number of "Aaron Barker" records in order to conclude that the count is 1.

Exercise 1: Having

Membership wants a report of all states that have 100 or more donors living in them. The report should include:

  • The state abbreviation
  • The number of donors in the state

Write an SQL query using count(*) and having to generate this list.

No Results

Exercise 2: Cumulative Giving

The examples above have all used count(), but having will work with any aggregate function like sum() and max(). In this exercise, development wants a list of all donors whose lifetime cumulative giving is $10,000 or more. The output should include:

  • Donor ID
  • Cumulative Lifetime Giving

Write an SQL query to generate this list.

No Results

Exercise 3: Advanced having

So far we have seen several examples of having with simple criteria. But because SQL is so flexible, we can easily add more complex criteria using and and or to stack different pieces of logic into our having statement. In this exercise, Development has come back to you and asked for a slightly different list: all donors with $10,000 cumulative lifetime giving whose lifetime average gift is at least $5,000. Output should include:

  • Donor ID
  • Cumulative Lifetime Giving
  • Lifetime Average Gift

Write an SQL query using having to generate this list.

No Results

having Without Selected Aggregates

In our section on group by, I mentioned that you can use group by in queries that do not have any aggregate functions in the select clause. While the use cases for this design pattern are rare, it is somewhat common in conjunction with having. Often, you will want to generate some list for which group by and aggregate functions are helpful, but you don't want the aggregated column in your output.

For example, let's go back to our example of identifying names that appear multiple times in our database. In that example, we generated a list that included:

  • First Name
  • Last Name
  • Number of Records

But what if we wanted that same list of duplicated names, but we don't actually care how many times the names appear as long as it's more than once. It could be 2 times or 20 times – we don't care – we just want a bare list of duplicated names. We can do that using having and group by without putting the aggregate function in our select clause, like this:

No Results

This query generates our same list of duplicated names and still uses the aggregate function count(*) in the having clause, but does not include count(*) in the select clause. This query shows us that aggregates in the having clause do not need to appear in the select clause.

Exercise 4: Unselected Aggregates

Membership needs a list of all donors who gave 3 or more gifts in 2020. But the system they are importing this data to can only accept data with a single column: account_id. Write an SQL query to generate this list that outputs only the account_id.

No Results