23. count()
The count() aggregate function does what it sounds like: it counts how many times non-null values appear in a given column or expression. In daily life, we tend to think of counting things as easier than summing, averaging, or determining highest/lowest values. But in SQL, there is more depth to the count() aggregate function than any of these other functions. We're going to cover three distinct uses of count(), each with some subtle nuance:
count(*): "count star", a count of the number of rows in a resultset.count(some_expression): the typical use ofcount(), with usage similar tosum()and others.count()withdistinct: counting the number ofdistinctvalues in a resultset.
All of these uses of count() have the following in common:
count()works with data of any type, including text, numbers, and dates.count()will ignore nulls. If a column contains someNULLvalues, those values will be completely ignored when counting.count()can take any valid expression as an input, including case statements, column references, or mathematical operations.
count(*)
count(*) ("count star") is a special case of the count() function and the one you will likely use most often. It is used to supply a simple count of rows in a resultset. For example, let's say we wanted to know how many rows in our addresses table from the state of West Virginia. We could get this count as follows:
select
count(*) as west_virginia_addresses
from addresses
where state = 'WV'This query tells the database to find all rows from the addresses table where the state is WV, and then count the number of rows. Note that it's not counting any particular column, it's simply saying "this is how many rows are returned by your from and where clauses."
count(*) can also be used with group by to generate row counts broken down by values from other fields in your data. For example, we can get the count of Birds fund and Trees fund transactions like so:
Exercise 1: Basic count(*)
The outreach team wants to know how many donors we have in Arizona, New Mexico, and Nevada. Write an SQL query using count() to generate this number for them.
Exercise 2: Advanced count(*)
Outreach has divided up the work of stewarding their Southwest donors by state, and needs your original count updated to be broken out by state. Write an SQL query using count() that does this.
count() with Expressions
Using count() by passing a column name or expression (as we do with sum() and other aggregate functions) is also common and useful. The primary difference between count(*) and count(some_expression) is that count star provides a count of rows where NULL values in particular columns matter less or not at all. In contrast, count() with an expression (instead of *) evaluates specific values in a column/expression where NULL values matter since any NULL value is not counted. Let's look at an example.
We want to generate a list of each donor ID in the database along with the number of Birds fund transactions they've made and the number of Trees fund transactions they have made. We can leverage the fact that count() ignores NULL values to do this:
The case statements inside each of our count() function calls check to see if each transaction is to the Birds or Trees funds. When we're counting Birds transactions, we have our case statement return the transaction_id for all Birds transactions (since it's the primary key and guaranteed not to be NULL) while returning NULL for all non-Birds transactions. Our count() function then provides us the count of all non-NULL values, giving the count of Birds gifts. The same is done for the Trees transactions.
Exercise 3: count() with Expressions
Membership wants a breakdown of each donor in the database, listing:
- account_id
- number of total gifts
- number of gifts under $100
- number of gifts of $100 or more
Write an SQL query using count() to generate this listing.
count() with distinct
So far, we've learned to how to count the number of rows in a result set using count star, and we've learned how to count non-null values returned by a column reference or expression. But in fundraising, we often need to count unique donors. We can do this using count() with distinct. As a very simple example, let's say we want to know the total number of unique donors who have ever given to Save the Birds and Trees. We can do this as follows:
select
count(distinct account_id) as unique_donors
from transactionsInsertion of the distinct keyword works just like it does in other contexts. Instead of providing the count of all non-NULL values in our expression, it returns the count of distinct non-NULL values in our expression. We can make this more concrete by looking at the difference between various types of count() usage:
Notice a few things about these results:
- The value for the
count_starandcount_of_accountsis identical. This is becausecount(*)is counting the number of rows in our transaction table, whilecount(account_id)is counting the number of rows in our transaction table with non-NULLvalues foraccount_id. Since every transaction in the database has a non-NULLaccount_id, these counts are the same. - Our
unique_donor_countis much lower than the other two counts. This is because most donors to Save the Birds and Trees have made multiple gifts, but ourdistinctoperator ensures that each unique value ofaccount_idis only counted only once, regardless of how many times it appears in the transactions table.
Exercise 4: count() with distinct
Membership wants to know the number of unique donors who have given in each calendar year. Write a query using count() with distinct to generate these numbers. Use order by to make the results easier to read.
Exercise 5: Advanced count() with distinct
Membership wants to know, for each calendar year, the number of:
- Unique total donors
- Unique donors to the Birds fund
- Unique donors to the Trees fund
Write an SQL query to generate this listing, and use order by to make the results easier to read.
Bonus: sum() to Count
Often it can be easier/clearer to "count" things using sum() instead of count(). For example, let's reconsider the situation where we wanted to generate a list of each donor ID in the database along with the number of Birds fund transactions they've made and the number of Trees fund transactions they have made. We can accomplish this task with count() as you have seen, but we can also accomplish it with sum() like this:
select
account_id
,sum(case when fund = 'Birds' then 1 else 0 end) as birds_gifts
,sum(case when fund = 'Trees' then 1 else 0 end) as trees_gifts
from transactions
group by account_idThis query looks for transactions to the Birds/Trees funds and if they are of the correct type, the case statement returns the number 1, otherwise the case statement returns the number 0. The sum() function then adds up all these 1s and 0s to get the total number of gifts. There are times when this approach is easier or clearer, and there are times when it is not. Either solution is accurate, and with experience you'll develop a feel for when to use each method.