23. count()

The count() aggregate function does what it sounds like: it counts how many times non-null values appear in a given column or expression. In daily life, we tend to think of counting things as easier than summing, averaging, or determining highest/lowest values. But in SQL, there is more depth to the count() aggregate function than any of these other functions. We're going to cover three distinct uses of count(), each with some subtle nuance:

  1. count(*): "count star", a count of the number of rows in a resultset.
  2. count(some_expression): the typical use of count(), with usage similar to sum() and others.
  3. count() with distinct: counting the number of distinct values in a resultset.

All of these uses of count() have the following in common:

  • count() works with data of any type, including text, numbers, and dates.
  • count() will ignore nulls. If a column contains some NULL values, those values will be completely ignored when counting.
  • count() can take any valid expression as an input, including case statements, column references, or mathematical operations.

count(*)

count(*) ("count star") is a special case of the count() function and the one you will likely use most often. It is used to supply a simple count of rows in a resultset. For example, let's say we wanted to know how many rows in our addresses table from the state of West Virginia. We could get this count as follows:

select
    count(*) as west_virginia_addresses
from addresses
where state = 'WV'

This query tells the database to find all rows from the addresses table where the state is WV, and then count the number of rows. Note that it's not counting any particular column, it's simply saying "this is how many rows are returned by your from and where clauses."

count(*) can also be used with group by to generate row counts broken down by values from other fields in your data. For example, we can get the count of Birds fund and Trees fund transactions like so:

No Results

Exercise 1: Basic count(*)

The outreach team wants to know how many donors we have in Arizona, New Mexico, and Nevada. Write an SQL query using count() to generate this number for them.

No Results

Exercise 2: Advanced count(*)

Outreach has divided up the work of stewarding their Southwest donors by state, and needs your original count updated to be broken out by state. Write an SQL query using count() that does this.

No Results

count() with Expressions

Using count() by passing a column name or expression (as we do with sum() and other aggregate functions) is also common and useful. The primary difference between count(*) and count(some_expression) is that count star provides a count of rows where NULL values in particular columns matter less or not at all. In contrast, count() with an expression (instead of *) evaluates specific values in a column/expression where NULL values matter since any NULL value is not counted. Let's look at an example.

We want to generate a list of each donor ID in the database along with the number of Birds fund transactions they've made and the number of Trees fund transactions they have made. We can leverage the fact that count() ignores NULL values to do this:

No Results

The case statements inside each of our count() function calls check to see if each transaction is to the Birds or Trees funds. When we're counting Birds transactions, we have our case statement return the transaction_id for all Birds transactions (since it's the primary key and guaranteed not to be NULL) while returning NULL for all non-Birds transactions. Our count() function then provides us the count of all non-NULL values, giving the count of Birds gifts. The same is done for the Trees transactions.

Exercise 3: count() with Expressions

Membership wants a breakdown of each donor in the database, listing:

  • account_id
  • number of total gifts
  • number of gifts under $100
  • number of gifts of $100 or more

Write an SQL query using count() to generate this listing.

No Results

count() with distinct

So far, we've learned to how to count the number of rows in a result set using count star, and we've learned how to count non-null values returned by a column reference or expression. But in fundraising, we often need to count unique donors. We can do this using count() with distinct. As a very simple example, let's say we want to know the total number of unique donors who have ever given to Save the Birds and Trees. We can do this as follows:

select
    count(distinct account_id) as unique_donors
from transactions

Insertion of the distinct keyword works just like it does in other contexts. Instead of providing the count of all non-NULL values in our expression, it returns the count of distinct non-NULL values in our expression. We can make this more concrete by looking at the difference between various types of count() usage:

No Results

Notice a few things about these results:

  • The value for the count_star and count_of_accounts is identical. This is because count(*) is counting the number of rows in our transaction table, while count(account_id) is counting the number of rows in our transaction table with non-NULL values for account_id. Since every transaction in the database has a non-NULL account_id, these counts are the same.
  • Our unique_donor_count is much lower than the other two counts. This is because most donors to Save the Birds and Trees have made multiple gifts, but our distinct operator ensures that each unique value of account_id is only counted only once, regardless of how many times it appears in the transactions table.

Exercise 4: count() with distinct

Membership wants to know the number of unique donors who have given in each calendar year. Write a query using count() with distinct to generate these numbers. Use order by to make the results easier to read.

No Results

Exercise 5: Advanced count() with distinct

Membership wants to know, for each calendar year, the number of:

  • Unique total donors
  • Unique donors to the Birds fund
  • Unique donors to the Trees fund

Write an SQL query to generate this listing, and use order by to make the results easier to read.

No Results

Bonus: sum() to Count

Often it can be easier/clearer to "count" things using sum() instead of count(). For example, let's reconsider the situation where we wanted to generate a list of each donor ID in the database along with the number of Birds fund transactions they've made and the number of Trees fund transactions they have made. We can accomplish this task with count() as you have seen, but we can also accomplish it with sum() like this:

select
    account_id
    ,sum(case when fund = 'Birds' then 1 else 0 end) as birds_gifts
    ,sum(case when fund = 'Trees' then 1 else 0 end) as trees_gifts
from transactions
group by account_id

This query looks for transactions to the Birds/Trees funds and if they are of the correct type, the case statement returns the number 1, otherwise the case statement returns the number 0. The sum() function then adds up all these 1s and 0s to get the total number of gifts. There are times when this approach is easier or clearer, and there are times when it is not. Either solution is accurate, and with experience you'll develop a feel for when to use each method.