Keyset Pagination with Cursors

SQL queries involving pagination often use `OFFSET` to progress through a series of results, and pair that with `COUNT(*)` to show how far through the pages the user has progressed. This is slow, and gets progressively slower the further in it gets, particularly once sorting is involved.

One solution is to use *Keyset Pagination*.

## What is Keyset Pagination?

Keyset Pagination is a database pagination method that makes use of a _Cursor_ to specify where in the total set of results to fetch next.

This is done by including a `WHERE` clause that limits results to a set that would appear after the previous page's results, taking advantage of indexes to narrow down the search space.

The simplest example would be a list ordered by `id`. If fetching a page of 20 results with `id` 1 through 20, the next page's results would be fetched with:

```sql
SELECT <columns>
FROM <table>
WHERE id > 20
ORDER BY id
LIMIT 20;
```

If that returns results with `id` between 21 and 40, the next would be fetched using `id > 40`.

This should have a predictable cost whether it's the second page of results or the 200th.

## Pros and Cons to Keyset Pagination

**Pros:**

* Faster pagination at stable costs.
* Pagination continues from where you left off, even if data is added/deleted on a "page".

**Cons:**

* No ability to jump to a particular page.
* No way to know what page you're on (you have to think in terms of "first/next/previous" and maybe "last" only).
* If referenced column data disappears or changes, it may impact where the cursor is positioned, causing skipped rows, duplicates relative to what the user has already seen, or shifting boundaries for pages (but offset-based pagination has the same problems).

## What is a Cursor?

We've modeled how this works under the hood, but the client needs to know how to communicate what the previous or next page is.

For this, a _Cursor_ is used. This is a value that can represent this state. An example may be a base64-encoded JSON payload.
For example:

```json
{
    "aad": {...},
    "data": {
        "k": {
            "id": 20,
            "last_updated": "2026-03-15 12:21:42"
        },
        "v": 1
    },
    "sort_fields": [
        ["last_updated", true],
        ["pk", false]
    ],
    "v": 1
}
```

Here we're storing:

* `aad`: Any [[AAD|Additional Authenticated Data]] we want to help validate the cursor, which should include filter criteria
* `data`: A data payload (containing `k` cursor key values and a `v` version field)
* `sort_fields`: The list of pairs of `column, ascending_flag` sort criteria
* `v`: An outer version field.

The resulting base64 value could then be used directly, or signed with [[HMAC]] so that tampering can be detected. The backend would then verify/deserialize this and plug the values into the query.

## Keyset SQL

Let's dig into how we generate SQL for keyset pagination.

### Using One or More Columns

When paginating, we need to include at least one sortable column, but we can factor in multiple columns.

No matter what we choose, we must always include a [[PK Tie-Breaker]] as the final sort column. This is typically the unique auto-incrementing ID of the row. This ensures consistent ordering of results in the event that two rows contain all the same sortable column data.

For example, if sorting with a `last_updated` column, we'd include the `id` as follows:

```sql
SELECT <columns>
FROM <table>
WHERE last_updated < '2026-03-15 12:21:42'
   OR (
       last_updated = '2026-03-15 12:21:42'
       AND id < 20
   )
ORDER BY last_updated DESC, id DESC
LIMIT 20;
```

This would choose the next 20 results that are older than `2026-03-15 12:21:42`, using the `id` as a [[PK Tie-Breaker]] in case values match.

> [!important] Seriously, don't forget the tie-breaker!
> You'll always want `id` (or whatever the field name is for the table) as a tie-breaker in order to ensure stable results. For instance, if two results had the same timestamp, sort order would be inconsistent and could lead to missing or duplicating a result when moving across page boundaries. A stable tie-breaker avoids this.

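
To tie the cursor payload back to the query above, here is a minimal Python sketch of packing the `k` values into a signed, base64-encoded token and verifying it before the values are bound to the query. The names `SECRET_KEY`, `encode_cursor`, and `decode_cursor`, the `body.signature` token layout, and the choice of HMAC-SHA256 are all assumptions of this sketch, not a prescribed format.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical secret; a real deployment would load this from configuration.
SECRET_KEY = b"example-secret-key"


def encode_cursor(payload: dict) -> str:
    """Serialize the payload to JSON, sign it, and return a URL-safe token."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    signature = hmac.new(SECRET_KEY, body, hashlib.sha256).digest()
    # Token layout (an assumption of this sketch): base64(body).base64(signature)
    return ".".join(
        base64.urlsafe_b64encode(part).decode() for part in (body, signature)
    )


def decode_cursor(token: str) -> dict:
    """Verify the signature and return the payload, or raise ValueError."""
    try:
        body_b64, sig_b64 = token.split(".")
        body = base64.urlsafe_b64decode(body_b64)
        signature = base64.urlsafe_b64decode(sig_b64)
    except ValueError as exc:  # wrong number of parts or invalid base64
        raise ValueError("malformed cursor") from exc
    expected = hmac.new(SECRET_KEY, body, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature, expected):
        raise ValueError("cursor signature mismatch")
    return json.loads(body)


payload = {
    "data": {"k": {"id": 20, "last_updated": "2026-03-15 12:21:42"}, "v": 1},
    "sort_fields": [["last_updated", False], ["id", False]],
    "v": 1,
}
token = encode_cursor(payload)
keys = decode_cursor(token)["data"]["k"]
# Parameters for the two-column query above: timestamp, timestamp, id.
params = (keys["last_updated"], keys["last_updated"], keys["id"])
```

If verification fails, the backend can either reject the request or fall back to the first page; either way the unverified values never reach the query.
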

We can include multiple columns, but the query becomes a bit more complex:

```sql
SELECT <columns>
FROM <table>
WHERE last_updated < '2026-03-15 12:21:42'
   OR (
       last_updated = '2026-03-15 12:21:42'
       AND author_id > 1234
   )
   OR (
       last_updated = '2026-03-15 12:21:42'
       AND author_id = 1234
       AND id < 20
   )
ORDER BY last_updated DESC, author_id ASC, id DESC
LIMIT 20;
```

> [!important] Make sure your operators and `ASC`/`DESC` match!
> The example above sorts `author_id` in ascending order, so we use `>` in the comparison. If it were sorted `DESC`, we'd use `<`.

> [!attention] Ordering *must* be stable!
> The final sorting must guarantee uniqueness of results, or you can get duplicates, skipped rows, and bad page boundaries.

As the number of columns increases, the complexity of the conditions increases. It's also pretty important to make sure these are indexed columns.

You may also need to handle `NULL` specially. For this, you need to decide how a `NULL` should sort in relation to other values.

### Paginating Backwards

If you need to move to a previous page, you have to reverse both the comparison operators and the sort order. The above examples would become:

```sql
SELECT <columns>
FROM <table>
WHERE last_updated > '2026-03-15 12:21:42'
   OR (
       last_updated = '2026-03-15 12:21:42'
       AND id > 20
   )
ORDER BY last_updated ASC, id ASC
LIMIT 20;
```

We reversed the operators and switched from `DESC` to `ASC`. The cursor values now come from the first row of the current page, since that is the boundary we're moving back across.

For the more complex example:

```sql
SELECT <columns>
FROM <table>
WHERE last_updated > '2026-03-15 12:21:42'
   OR (
       last_updated = '2026-03-15 12:21:42'
       AND author_id < 1234
   )
   OR (
       last_updated = '2026-03-15 12:21:42'
       AND author_id = 1234
       AND id > 20
   )
ORDER BY last_updated ASC, author_id DESC, id ASC
LIMIT 20;
```

Same deal: We flipped the operators and `ASC`/`DESC`.

This will give us rows in reverse order, though, so we need to take the results we get and reverse them back.
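
The operator and direction rules above are mechanical enough to generate. Here is a minimal Python sketch, assuming non-NULL columns, `?`-style placeholders, and column names taken from a trusted allow-list (they are interpolated into the SQL text). The function name `keyset_where` and its signature are hypothetical.

```python
def keyset_where(sort_fields, cursor, backwards=False):
    """Build (where_sql, order_sql, params) for one keyset page.

    sort_fields: list of (column, ascending) pairs, ending with the unique
                 tie-breaker column. Column names must be trusted values.
    cursor:      dict mapping each sort column to its value in the boundary row.
    backwards:   True to fetch the previous page (flips operators and order).
    """
    clauses, params, order = [], [], []
    for i, (column, ascending) in enumerate(sort_fields):
        if backwards:
            ascending = not ascending
        earlier = [name for name, _ in sort_fields[:i]]
        # Equality on every earlier column, then a strict comparison on this one.
        parts = [f"{name} = ?" for name in earlier]
        parts.append(f"{column} {'>' if ascending else '<'} ?")
        clauses.append("(" + " AND ".join(parts) + ")")
        params.extend(cursor[name] for name in earlier)
        params.append(cursor[column])
        order.append(f"{column} {'ASC' if ascending else 'DESC'}")
    return " OR ".join(clauses), ", ".join(order), params


sort_fields = [("last_updated", False), ("author_id", True), ("id", False)]
cursor = {"last_updated": "2026-03-15 12:21:42", "author_id": 1234, "id": 20}

where_sql, order_sql, params = keyset_where(sort_fields, cursor)
prev_where, prev_order, prev_params = keyset_where(
    sort_fields, cursor, backwards=True
)
```

Calling it with `backwards=True` yields the flipped operators and sort order from this section; the caller still has to reverse the fetched rows before displaying them.
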

### Navigating to the Last Page

While Keyset Pagination naturally supports the first, previous, and next pages, it does not naturally support navigating to the last page.

Supporting navigating to the last page of results requires:

1. Reversing the sort conditions.
2. Fetching the _first_ page of those results.
3. Reversing the results in the application code.

As an example, let's look at how we might get the _first_ page of results:

```sql
SELECT <columns>
FROM <table>
ORDER BY last_updated ASC, author_id DESC, id ASC
LIMIT 20;
```

To get the last, we'll reverse that:

```sql
SELECT <columns>
FROM <table>
ORDER BY last_updated DESC, author_id ASC, id DESC
LIMIT 20;
```

We simply flipped the `ASC`/`DESC` and avoided any cursor-related `WHERE` clauses.

> [!NOTE] You'll need to reverse the resulting rows.
> The order of rows will be in inverse order, just like when navigating to a previous page. You'll need to reverse this when processing the results.

### NULL Handling

NULLs are a bit special. Different databases order NULLs differently. Here's a quick comparison table:

| Database       | NULLs Sort (Asc) | NULLs Sort (Desc) | Supports `NULLS FIRST`/`NULLS LAST` |
| -------------- | ---------------- | ----------------- | ----------------------------------- |
| [[MariaDB]]    | ⬅️ First         | ➡️ Last           | ❌ No                               |
| [[MySQL]]      | ⬅️ First         | ➡️ Last           | ❌ No                               |
| [[PostgreSQL]] | ➡️ Last          | ⬅️ First          | ✅ Yes                              |
| [[SQLite]]     | ⬅️ First         | ➡️ Last           | ✅ Yes                              |

When comparing a value to `NULL`, it's important to use `IS NULL`/`IS NOT NULL` syntax. `<`, `>`, and `=` are unsafe, because any comparison against `NULL` evaluates to `NULL` rather than true or false.

This means we need to factor this into any columns where values may be `NULL`.

Unless you're targeting a specific database (and maybe even if you are), it can be necessary to add explicit NULL handling so you don't risk comparison issues.

Your SQL for a comparison will change depending on whether you want NULLs first or last, and whether you're sorting in ascending or descending order.
These produce equivalent relative orderings:

* NULLs First (Ascending); NULLs Last (Descending)
* NULLs First (Descending); NULLs Last (Ascending)

We'll call these: _Ordering NULLs First_ and _Ordering NULLs Last_.

#### Beware of Ordering Differences

Before we go deep into this, it's important to note that your NULL behavior _must_ match your `ORDER BY`'s behavior.

If your database supports `NULLS FIRST` or `NULLS LAST`, you can choose one and use the appropriate logic below. Easy.

However, if your database does _not_ (I'm looking at you, MySQL and MariaDB), then you need to use the version appropriate for your database, _or_ use a workaround:

```sql
-- Choose NULLs First
ORDER BY (col IS NULL) DESC, col ASC

-- Choose NULLs Last
ORDER BY (col IS NULL) ASC, col ASC
```

This orders by an `(is_null, value)` pair. The first expression evaluates to `0` or `1`.

A couple of important notes if you choose to go that route:

1. This may have an impact on your index usage and therefore query performance.
2. Each column in a multi-column sort may have its own `NULL` ordering, so you'd need to be explicit for each column that's affected.

If you support multiple databases, and don't care about whether `NULL` values are ordered first or last, then you may want to consider generating your SQL based on that database's behavior, as long as you're consistent.

#### Ordering NULLs First

When ordering NULLs first, your data will look like:

```
NULL, NULL, 0, 1, 2, 3
```

For any NULLable column, we need to replace the bare `<` or `>` comparison with a NULL-safe comparison. How we do this depends on the operator.

##### col > :cursor_col (NULLs First)

NULLs sort before everything, so if the cursor is `NULL`, every non-`NULL` value comes after it.

```sql
-- Before:
(
    ...
    AND col > :cursor_col
)

-- After:
(
    (
        col IS NOT NULL
        AND :cursor_col IS NULL
    )
    OR (
        col IS NOT NULL
        AND :cursor_col IS NOT NULL
        AND col > :cursor_col
    )
)
```

If you know whether `:cursor_col` is or is not `NULL`, you can simplify this:

```sql
-- Cursor is NULL:
col IS NOT NULL

-- Cursor is not NULL:
(
    col IS NOT NULL
    AND col > :cursor_col
)
```

##### col < :cursor_col (NULLs First)

If the cursor is a non-`NULL` value, both `NULL`s and smaller values will come before it. If the cursor is `NULL`, nothing will come before it.

```sql
-- Before:
(
    ...
    AND col < :cursor_col
)

-- After:
(
    (
        col IS NULL
        AND :cursor_col IS NOT NULL
    )
    OR (
        col IS NOT NULL
        AND :cursor_col IS NOT NULL
        AND col < :cursor_col
    )
)
```

If you know whether `:cursor_col` is or is not `NULL`, you can simplify this:

```sql
-- Cursor is NULL:
FALSE  -- This will never evaluate to TRUE.

-- Cursor is not NULL:
(
    col IS NULL
    OR col < :cursor_col
)
```

##### Our Example with NULLs First

Plugging into our example above (with NULLs sorted first, and `last_updated ASC, author_id DESC, id ASC`), we get:

```sql
SELECT <columns>
FROM <table>
WHERE (
    (
        -- This differs from NULLs Last.
        last_updated IS NOT NULL
        AND '2026-03-15 12:21:42' IS NULL
    )
    OR (
        last_updated IS NOT NULL
        AND '2026-03-15 12:21:42' IS NOT NULL
        AND last_updated > '2026-03-15 12:21:42'
    )
)
OR (
    last_updated = '2026-03-15 12:21:42'
    AND (
        (
            -- This differs from NULLs Last.
            author_id IS NULL
            AND 1234 IS NOT NULL
        )
        OR (
            author_id IS NOT NULL
            AND 1234 IS NOT NULL
            AND author_id < 1234
        )
    )
)
OR (
    last_updated = '2026-03-15 12:21:42'
    AND author_id = 1234
    AND id > 20
)
ORDER BY last_updated ASC, author_id DESC, id ASC
LIMIT 20;
```

That example's for demonstrative purposes.
We of course don't _need_ conditions like `1234 IS NOT NULL`, but in reality we're using safe SQL placeholders for populating values, _right?!_

#### Ordering NULLs Last

For a NULLs Last approach, data is ordered in ascending order like so:

```
0, 1, 2, 3, NULL, NULL
```

For any NULLable column, we need to replace the bare `<` or `>` comparison with a NULL-safe comparison. How we do this depends on the operator.

##### col > :cursor_col (NULLs Last)

When ordering NULLs last, NULLs will be sorted after all non-`NULL` values. If the cursor is `NULL`, nothing will come after it.

```sql
-- Before:
(
    ...
    AND col > :cursor_col
)

-- After:
(
    (
        col IS NULL
        AND :cursor_col IS NOT NULL
    )
    OR (
        col IS NOT NULL
        AND :cursor_col IS NOT NULL  -- Check/optimize this in query building
        AND col > :cursor_col
    )
)
```

If you know whether `:cursor_col` is or is not `NULL`, you can simplify this:

```sql
-- Cursor is NULL:
FALSE  -- This will never evaluate to TRUE.

-- Cursor is not NULL:
(
    col IS NULL
    OR col > :cursor_col
)
```

##### col < :cursor_col (NULLs Last)

If the cursor is `NULL`, every non-`NULL` value will come before it. If it's non-`NULL`, only smaller values will come before it.

```sql
-- Before:
(
    ...
    AND col < :cursor_col
)

-- After:
(
    (
        col IS NOT NULL
        AND :cursor_col IS NULL
    )
    OR (
        col IS NOT NULL
        AND :cursor_col IS NOT NULL
        AND col < :cursor_col
    )
)
```

If you know whether `:cursor_col` is or is not `NULL`, you can simplify this:

```sql
-- Cursor is NULL:
col IS NOT NULL

-- Cursor is not NULL:
(
    col IS NOT NULL
    AND col < :cursor_col
)
```

##### Our Example with NULLs Last

Plugging into our example above (with NULLs sorted last, and `last_updated ASC, author_id DESC, id ASC`), we get:

```sql
SELECT <columns>
FROM <table>
WHERE (
    (
        -- This differs from NULLs First.
        last_updated IS NULL
        AND '2026-03-15 12:21:42' IS NOT NULL
    )
    OR (
        last_updated IS NOT NULL
        AND '2026-03-15 12:21:42' IS NOT NULL
        AND last_updated > '2026-03-15 12:21:42'
    )
)
OR (
    last_updated = '2026-03-15 12:21:42'
    AND (
        (
            -- This differs from NULLs First.
            author_id IS NOT NULL
            AND 1234 IS NULL
        )
        OR (
            author_id IS NOT NULL
            AND 1234 IS NOT NULL
            AND author_id < 1234
        )
    )
)
OR (
    last_updated = '2026-03-15 12:21:42'
    AND author_id = 1234
    AND id > 20
)
ORDER BY last_updated ASC, author_id DESC, id ASC
LIMIT 20;
```

### To COUNT(\*) or Not to COUNT(\*)

Offset-based pagination usually performs a `COUNT(*)` to determine how many results, and therefore pages, are left, so users can jump around to later pages without just clicking Next a bunch.

When using a _Cursor_, this isn't needed, because we're not dealing with page counts. We're dealing with relative values in queries. Removing `COUNT(*)` can therefore be an optimization.

That said, it can still be necessary to communicate this information to the user or to an API. While keyset pagination does not require a `COUNT(*)`, applications can still include it, but if it's an expensive calculation then they'll still have to deal with that cost.

## Keeping Page-Based Navigation

Keyset pagination is really about Previous/Next. It's great for [[Infinite Scrolling]], or patterns like that.

But sometimes you still want to let users navigate to other pages. Pages get more expensive the further you go, but there may be some value in showing a number of pages early in the list.

A hybrid approach can be employed.

### Offset-based pagination for the first N pages

One approach is to show up to, say, 10 pages in the UI, and use offsets if jumping to those. After that, avoid page-based navigation and only allow Previous/Next.

### Cache boundaries as you navigate

Assuming data is relatively stable, you can cache cursors for page boundaries as the user navigates.
For example, if they've clicked Next until reaching page 9, the last row of that page gives us the cursor for page 10. That can go in cache, so the next navigation to page 10 becomes cheap.

This can be combined with offset-based navigation, preferring the cursor if known.

**Note:** If the state the cursor is based on changes, it may no longer point to the right place in the results.

### Precompute cursors

If it's inexpensive to do so, you can fetch a larger number of partial results up-front (just enough to build the cursors) and then generate cursors from them.

If your page size is `P`, and you want to precompute cursors for `N` pages, you'd need to fetch $P \times (N - 1)$ rows.

This gives you enough information to compute each boundary's cursor.

Let's take `P = 10` and `N = 4`:

* Page 1 (rows 1-10) needs no cursor
* Page 2 (rows 11-20) needs the last row of page 1 (`P`)
* Page 3 (rows 21-30) needs the last row of page 2 (`2P`)
* Page 4 (rows 31-40) needs the last row of page 3 (`3P`)

You only need to fetch up to 30 rows (`10 * (4 - 1)`) to know the cursors for pages 2-4.

This, again, only works well if results are stable. Otherwise you wouldn't want to cache long, since the meaning of page 5 (for example) might change.

You could just pre-compute when reaching certain page counts (e.g., page 1, page 10, page 20, etc.), combining this with [[#Cache boundaries as you navigate]].

> [!tip] The fewer pages you need to show, the cheaper this is
> If you only need to know 5 pages ahead, that's not a lot of results to fetch.

### Hybrid approach: Offsets and Cursors

Offset-based pagination gets more expensive as you go, but it's fairly cheap early in the results. So one approach is to allow usage of offsets for the first N pages but require cursor-based navigation and pre-computing after.
For example:

* Pages 1-5 might allow an Offset
* Pages 6+ might require cursors
* Precomputing can be used at regular page boundaries (say, every 5 pages) to precompute cursors for the next 5 pages
* Large jumps (say, 20, 50, 100 pages in) are either disallowed or rate-limited

## Simulating Page Numbers

Keyset pagination doesn't require throwing out page numbers. You can still represent them by simulating them based on knowledge of the cursor position.

This assumes:

1. You know what page the user is currently on (tracking as you paginate).
2. You know the cursors for nearby pages (using one of the techniques above).
3. You can label those positions relative to the current page number.

If you know the user is on page 4, you can show:

* Page 2
* Page 3
* **Page 4**
* Page 5
* Page 6

Navigating to pages 3 or 5 would use standard Cursor-based navigation. Pages 2 and 6 might use cached cursors, from earlier navigation or from looking ahead.
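
The precomputation described under [[#Precompute cursors]] and the page labels above can be sketched together in Python. This is only an illustration under stated assumptions: rows arrive as dicts already in final sort order, and the helper names `boundary_cursors` and `page_labels` are hypothetical.

```python
def boundary_cursors(rows, page_size, key_columns):
    """Map page number -> cursor key values needed to fetch that page.

    rows:        the first page_size * (N - 1) rows in final sort order,
                 as dicts (only the key columns need to be present).
    key_columns: the sort columns, ending with the unique tie-breaker.
    """
    cursors = {}
    # The last row of page p is the cursor for page p + 1.
    for end in range(page_size, len(rows) + 1, page_size):
        last_row = rows[end - 1]
        cursors[end // page_size + 1] = {col: last_row[col] for col in key_columns}
    return cursors


def page_labels(current_page, known_pages, radius=2):
    """Page numbers to show around current_page, limited to reachable ones."""
    window = range(max(1, current_page - radius), current_page + radius + 1)
    # Page 1 never needs a cursor; other pages need a known cursor.
    return [p for p in window if p == 1 or p == current_page or p in known_pages]


# P = 10, N = 4: fetch 30 rows to learn the cursors for pages 2-4.
rows = [{"id": i} for i in range(1, 31)]
cursors = boundary_cursors(rows, page_size=10, key_columns=["id"])
labels = page_labels(4, {2, 3, 5, 6})
```

One caveat of this sketch: if the result set ends exactly on a page boundary, the final cursor leads to an empty page, so a real implementation would likely fetch one extra row to check whether a next page exists.
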