40. lead() and lag()

The lead() and lag() window functions let you access values from other rows in your resutset without using join. Specifically:

  • lead() returns the value from a following row in the same partition
  • lag() returns the value from a previous row in the same partition

In fundraising, these functions are most helpful when trying to compare information about transactions to transactions that were made earlier by the same donor or later by the same donor. As an example, say we want to build a list of all one-time transactions in our database and we want to include in the output:

  • Donor ID
  • Transaction Date
  • Transaction Amount
  • Amount of the donor's immediately preceding one-time gift

If we think about solving this problem for a given donor, we can first take a quick look at all of their one-time gifts (I'm going to use ID 698975 as an example here):

No Results

This donor has made four gifts lifetime. For each of these gifts, we want to add to our output a column that tells us the amount of the most recent gift the donor had made as of the date of the gift in the current row. So looking at this data, we want to add a column that includes the MRC amount like in the below:

No Results

Notice that to solve our problem, we need the database to first sort the rows by transaction_date and then return the amount from the previous row. The value for MRC Amount in this table is equal to the value from the previous row's amount (MRC Amount for the first row is NULL because there are no previous rows). This is where lead() and lag() come in: they lets us put data from other rows into the current row. Here's how we would use lag() generate the desired output for our sample donor ID. I'll explain the code in detail below.

No Results

Let's break the lag() statement down piece by piece:

  • lag(amount) tells the database that we want the value for amount from the row before the current row (lead() would return the row after the current row).
  • over() works just like it does in other window functions, it provides us a way to define our partition and ordering scheme.
  • partition by account_id tells the database that we want to form groups by account_id.
  • order by transaction_date tells the database that within each group, rows should be sorted by transaction_date. This defines what we mean by before or after (lag() or lead()) - the row before this one means in this case, "the row that is one row earlier when sorted by transaction_date."

Putting this all together, our lag() statement here tells the database, "Put rows that share an account_id into groups. For each row in each group, sort it by transaction_date ascending. Once that's done, for each row in my results, find the amount from the prior row in the group, and bring it into this row. Label the resulting column mrc_date."

This is complicated, I know. Working through some examples and gaining practical experience will help you understand these functions more intuitively.

Exercise 1: lag()

Development is trying to understand cross-channel giving behavior. As part of their work, they've requested a list of every gift in the database that includes:

  • Donor ID
  • Transaction Date
  • Amount
  • Transaction Channel
  • Channel of donor's immediately preceding gift

Write an SQL query using lag() to generate this list.

No Results

lead()

lead() is exactly like lag() except instead of bringing data from a prior row into the current row, it brings data from a following row into the current row. The structure and inner workings are otherwise identical. In my experience, lag() is used significantly more often than lead(). lead() tends to be useful when performing analysis of what donors did next after having done something. That's the use case we have in the exercise below.

Exercise 2: lead()

The digital team is performing an analysis of upgrades and downgrades for donors making multiple gifts via the website channel. They would like a list of all gifts in the database from the Website channel that includes the following:

  • Donor ID
  • Transaction Date
  • Amount
  • Fund
  • The amount of the donor's next Website gift, if any

Write an SQL query using lead() to generate this list.

No Results

Offset

lead() and lag() also take an optional parameter, the offset. When there is no offset specified (as in all the examples we have reviewed so far), the database uses a default offset of 1. The offset tells the database how many rows backward/forward to retrieve data from. In fundraising, we are usually concerned with the immediately preceding or immediately following row, as we have seen in the examples above. But sometimes, we may want to get data from 2 rows back, or 3 rows back, or 10 rows forward. We can accomplish this by specifying an offset, like this:

lag(amount,3) over(partition by account_id order by transaction_date)

The offset is supplied as second argument to the lag() and lead() function, in this case the value 3. This example would tell the database to get the amount value from three rows back in the partition instead of one row back. If we go back to our example from the start of this chapter where we focused on one specific donor's one-time giving, we can see the difference between using various offsets:

No Results

The two_gifts_ago and three_gifts_ago columns show the amount value for the 2nd and 3rd most recent gifts for each row. In practice, you will rarely need this functionality. But it's important to know that it's possible.

Exercise 3: Offset

The sustainer team is troubleshooting a payment processing error. They have discovered that some donors are being charged the incorrect amount and that the trouble started 3 months ago. In order to look into this more deeply, they have requested a list of all sustainer program transactions where the donor's sustainer payment amount 2 payments ago was of a lower amount. Output should include:

  • Donor ID
  • Transaction Date
  • Amount
  • Amount of gift 2 payments ago

The output should only include rows where the amount of the gift 2 payments ago was smaller than the transaction amount and where the amount of the gift 2 payments ago was not NULL or zero. Write an SQL query using lag() with offset to generate this list.

No Results