As organizations operationalize their mail targeting, there’s a tendency to build up a few big monolithic queries that define key user segments based on a mix of different signals.
For example, you might define a pool of “core mailable users” that includes 90-day clickers, plus 60-day openers, plus 3-year donors — and as time goes on you might extend that to also include all-time high-value donors — but skip over users who are new in the last three days and so will be getting a different mail stream — and then subtract out a segment of users who’re enrolled in a longitudinal A/B test… and eventually your query is hundreds of lines of hand-coded SQL, which is confusing and difficult to maintain.
I wondered if we could find a way to compose those big monolithic queries by combining a bunch of smaller query reports. And it turns out that the answer is… we can.
Common Table Expressions (SQL CTEs)
In the context of SQL queries, CTEs allow you to preface your main SELECT statement with one or more additional SELECT statements that should be run first.
This can be used to combine the results of multiple queries:
with user_pool_1 as ( 
  select id from core_user where ...
), 
user_pool_2 as (
  select id from core_user where ...
)
select * from user_pool_1 
union
select * from user_pool_2Each of the CTEs is queried first, then the final query combines them.
Accessing Other Query Reports
In order to assemble our overall composite query using the SQL from several separate query reports, we need two additional tools: a stripped-down query template and a special SQL query.
Required Query Template: Single Raw Value

Required Query Report: SQL from Query Report

This query report looks takes a single parameter — the name of a different query report — and uses that to look up the requested report and output its SQL as a raw value.
Combining Report SQL with CTEs
Now let’s look at an example of using those two features together.
Imagine you have two existing targeting queries — “recent users from NY” and “recent users from CA” — and you want to create a new targeting query that returns the users matched by either of those queries. (This isn’t how we would actually target users from two states, but it’ll serve as a simple example.)
We could have copied and pasted the SQL from those two targeting queries into one file, joining them together into a single larger query — but instead we’re going to use the above techniques to combine them dynamically:

When you run this query, ActionKit first goes and gets the SQL from both of the named query reports, and then puts it together into one large block of SQL before sending it off to the database server.
The way the two subqueries are combined depends on what kind of logic you put in the final SELECT of your query.
Adding Two Results Together (A plus B)
The SQL UNION keyword  will let us combine two sets:
with ny_users as (
  {% report "query_sql" with "recent_users_from_ny" as name %}
), ca_users as (
  {% report "query_sql" with "recent_users_from_ca" as name %}
)
select * from ny_users
union
select * from ca_usersThis report will run the SQL from the recent_users_from_ny report, then the SQL from the recent_users_from_ca report, then combine all of the rows into one big set and return all of them.
Intersecting Results (A if also B)
Imagine that we also have an existing targeting query that finds all of the users whose email address ends in @example.com, ActionKit’s standard placeholder for unknown addresses.
We can combine this with one of our state queries from the previous example:
with ny_users as (
  {% report "query_sql" with "recent_users_from_ny" as name %}
), example_users as (
  {% report "query_sql" with "recent_users_with_example_emails" as name %}
)
select ny_users.* from ny_users
join example_users on ny_users.user_id = example_users.user_idThis will run the two named queries and then return only the rows where the same user ID appears in both — New Yorkers who have an @example.com address.
Subtracting Results (A if not B)
By changing just the last couple of lines of the query, we can invert the logic:
with ny_users as (
  {% report "query_sql" with "recent_users_from_ny" as name %}
), example_users as (
  {% report "query_sql" with "recent_users_with_example_emails" as name %}
)
select ny_users.* from ny_users
left join example_users on ny_users.user_id = example_users.user_id
where example_users.user_id is nullThis will run the two named queries and then return all of the users from New York — after excluding any users that had been matched by the example domain query.
Chaining Multiple Operations
Let’s find the users from New York, and then add the users from California, then subtract the users with example.com addresses:
with ny_users as (
  {% report "query_sql" with "recent_users_from_ny" as name %}
), ca_users as (
  {% report "query_sql" with "recent_users_from_ca" as name %}
), example_users as (
  {% report "query_sql" with "recent_users_with_example_emails" as name %}
), ny_ca_users as (
  select * from ny_users
  union
  select * from ca_users
)
select ny_ca_users.* from ny_ca_users
left join example_users on ny_ca_users.user_id = example_users.user_id
where example_users.user_id is nullThis will run all three named queries, then combine the two states results to form an overall set, then remove the rows that were found in the domain query.
Random Sampling
Let’s choose a subset of users from two different pools in defined proportions:
with ny_users as (
  {% report "query_sql" with "recent_users_from_ny" as name %}
), ca_users as (
  {% report "query_sql" with "recent_users_from_ca" as name %}
)
( select * from ny_users order by rand() limit 5 )
union
( select * from ca_users order by rand() limit 3 )This will run the two named queries, and then take five random rows from one set and three from the other.
Uses Beyond Targeting
Although targeting queries are often the most complicated, there are other specialized roles that query reports can play in ActionKit, including:
- engagement queries determine which users are considered active, and
- updater queries automatically calculate new values for custom user fields.
Depending on your requirements, the composite query technique might also be useful in those settings.
Features of this Approach
- The underlying queries can either be coded by hand or with the query builder.
- Because they’re independent queries, we can run each of the pieces separately to confirm their results look like we expect.
- The composite query doesn’t have to repeat the SQL from the independent queries, so we can focus on the way the results are combined.
- Any changes we make to an individual query automatically get picked up by the composite queries that include it.
Limitations
- This technique shown above does not yet support passing parameters between the queries.
- If you change one of the individual queries, mailings which use the composite query for targeting will not notice the change and invalidate their previous recipient list; you will need to manually force a targeting rebuild if you want the changes to take effect.
- Queries with CTEs often perform well, but there will be cases where this type of composite query will not run as fast as a hand-optimized version that put all of the logic into a single big SELECTstatement.
Conclusion
The combination of SQL Common Table Expressions and ActionKit’s support for Django template code in query reports lets us write several separate queries, and then invoke all of them from within a composite query that combines their results.

Leave a Reply