23. count()
The count()
aggregate function does what it sounds like: it counts how many times non-null values appear in a given column or expression. In daily life, we tend to think of counting things as easier than summing, averaging, or determining highest/lowest values. But in SQL, there is more depth to the count()
aggregate function than any of these other functions. We're going to cover three distinct uses of count()
, each with some subtle nuance:
count(*)
: "count star", a count of the number of rows in a resultset.count(some_expression)
: the typical use ofcount()
, with usage similar tosum()
and others.count()
withdistinct
: counting the number ofdistinct
values in a resultset.
All of these uses of count()
have the following in common:
count()
works with data of any type, including text, numbers, and dates.count()
will ignore nulls. If a column contains someNULL
values, those values will be completely ignored when counting.count()
can take any valid expression as an input, including case statements, column references, or mathematical operations.
count(*)
count(*)
("count star") is a special case of the count()
function and the one you will likely use most often. It is used to supply a simple count of rows in a resultset. For example, let's say we wanted to know how many rows in our addresses
table from the state of West Virginia. We could get this count as follows:
select
count(*) as west_virginia_addresses
from addresses
where state = 'WV'
This query tells the database to find all rows from the addresses table where the state is WV
, and then count the number of rows. Note that it's not counting any particular column, it's simply saying "this is how many rows are returned by your from
and where
clauses."
count(*)
can also be used with group by
to generate row counts broken down by values from other fields in your data. For example, we can get the count of Birds fund and Trees fund transactions like so:
Exercise 1: Basic count(*)
The outreach team wants to know how many donors we have in Arizona, New Mexico, and Nevada. Write an SQL query using count()
to generate this number for them.
Exercise 2: Advanced count(*)
Outreach has divided up the work of stewarding their Southwest donors by state, and needs your original count updated to be broken out by state. Write an SQL query using count()
that does this.
count()
with Expressions
Using count()
by passing a column name or expression (as we do with sum()
and other aggregate functions) is also common and useful. The primary difference between count(*)
and count(some_expression)
is that count star provides a count of rows where NULL
values in particular columns matter less or not at all. In contrast, count()
with an expression (instead of *
) evaluates specific values in a column/expression where NULL
values matter since any NULL
value is not counted. Let's look at an example.
We want to generate a list of each donor ID in the database along with the number of Birds fund transactions they've made and the number of Trees fund transactions they have made. We can leverage the fact that count()
ignores NULL
values to do this:
The case
statements inside each of our count()
function calls check to see if each transaction is to the Birds or Trees funds. When we're counting Birds transactions, we have our case
statement return the transaction_id
for all Birds transactions (since it's the primary key and guaranteed not to be NULL
) while returning NULL
for all non-Birds transactions. Our count()
function then provides us the count of all non-NULL
values, giving the count of Birds gifts. The same is done for the Trees transactions.
Exercise 3: count()
with Expressions
Membership wants a breakdown of each donor in the database, listing:
- account_id
- number of total gifts
- number of gifts under $100
- number of gifts of $100 or more
Write an SQL query using count()
to generate this listing.
count()
with distinct
So far, we've learned to how to count the number of rows in a result set using count star, and we've learned how to count non-null values returned by a column reference or expression. But in fundraising, we often need to count unique donors. We can do this using count()
with distinct
. As a very simple example, let's say we want to know the total number of unique donors who have ever given to Save the Birds and Trees. We can do this as follows:
select
count(distinct account_id) as unique_donors
from transactions
Insertion of the distinct
keyword works just like it does in other contexts. Instead of providing the count of all non-NULL
values in our expression, it returns the count of distinct non-NULL
values in our expression. We can make this more concrete by looking at the difference between various types of count()
usage:
Notice a few things about these results:
- The value for the
count_star
andcount_of_accounts
is identical. This is becausecount(*)
is counting the number of rows in our transaction table, whilecount(account_id)
is counting the number of rows in our transaction table with non-NULL
values foraccount_id
. Since every transaction in the database has a non-NULL
account_id
, these counts are the same. - Our
unique_donor_count
is much lower than the other two counts. This is because most donors to Save the Birds and Trees have made multiple gifts, but ourdistinct
operator ensures that each unique value ofaccount_id
is only counted only once, regardless of how many times it appears in the transactions table.
Exercise 4: count()
with distinct
Membership wants to know the number of unique donors who have given in each calendar year. Write a query using count()
with distinct
to generate these numbers. Use order by
to make the results easier to read.
Exercise 5: Advanced count()
with distinct
Membership wants to know, for each calendar year, the number of:
- Unique total donors
- Unique donors to the Birds fund
- Unique donors to the Trees fund
Write an SQL query to generate this listing, and use order by
to make the results easier to read.
Bonus: sum()
to Count
Often it can be easier/clearer to "count" things using sum()
instead of count()
. For example, let's reconsider the situation where we wanted to generate a list of each donor ID in the database along with the number of Birds fund transactions they've made and the number of Trees fund transactions they have made. We can accomplish this task with count()
as you have seen, but we can also accomplish it with sum()
like this:
select
account_id
,sum(case when fund = 'Birds' then 1 else 0 end) as birds_gifts
,sum(case when fund = 'Trees' then 1 else 0 end) as trees_gifts
from transactions
group by account_id
This query looks for transactions to the Birds/Trees funds and if they are of the correct type, the case statement returns the number 1, otherwise the case statement returns the number 0. The sum()
function then adds up all these 1s and 0s to get the total number of gifts. There are times when this approach is easier or clearer, and there are times when it is not. Either solution is accurate, and with experience you'll develop a feel for when to use each method.