8. Distinct
The SQL distinct
operator is used to remove duplicate rows from the result of a select
statement. It ensures that only unique values are returned. One way this is helpful is in generating lists of donors from simple queries. For example, let's say we want a list of all donors who gave at least one gift in 2024. We could query our transactions table, like this:
Hm, something about our query isn't quite right. Notice for example that account_id
633 is repeated multiple times, as are many other IDs. This happens because our query is returning one row for for each transaction because we have selected from our transactions table. Donor 633 made multiple gifts in 2024, so they show up multiple times. We can fix this with the distinct
operator. Usage of distinct
is simple: just put it right after the word select
to form select distinct
:
This tells the database, "Find me the account_id that belongs to each transaction and return a list sorted by account_id. Then take that list and remove any duplicate values, leaving only unique values." As you can see, donor 633 is now only listed once, as are all other donors. Note that the order by
clause is not required to use distinct
, I have only used it here to make it easier to compare the outputs of these two queries.
Exercise 1: Select distinct
Your community outreach team is going to be doing some work in the state of Wyoming and needs a list of each city in Wyoming where the organization has at least one donor. Use select distinct
to generate a duplicate-free list of cities in Wyoming from our addresses table.
Distinct with multiple columns
In our first example, we used distinct
in a query where we only selected a single column, account_id
. distinct
also works when selecting multiple columns. When used this way, the database will return every distinct combination of values from your selected columns. Let's look at an example.
This query returns every unique combination of first_name
and last_name
in the accounts table. Notice, however, that the name Aaron is repeated. This happens because the distinct
operator works over the combination of all columns in our select
statement. So while there are many Aarons in our output, there is only one Aaron Barker, only one Aaron Boyd, etc.
Exercise 2: Distinct with multiple columns
The operations department is looking to change credit card processing companies. In order to get a sense of which channels are using which payment methods, they need a list of every payment method and channel combination from the transactions table. Write a query using select distinct
to generate this list.