## Setup: what "10B events" actually means here
The numbers in this post come from a representative customer — a B2C product with:
- ~10B events spanning ~30 months of history
- ~12M monthly active users
- Snowflake account on AWS, dedicated `XSMALL` warehouse for Klaritics with auto-suspend at 60s
- Event table with `event_time` (TIMESTAMP_NTZ), `user_id` (VARCHAR), `event_name` (VARCHAR), `properties` (VARIANT)
- Clustering on `event_time` (the most important detail in this entire post)
The benchmarks below were measured on real customer dashboards over a 30-day window, p50 and p95 captured by the Klaritics query layer's own instrumentation.
| Query type | p50 | p95 | Rows scanned (typical) |
|---|---|---|---|
| 30-day DAU/WAU/MAU | 0.8s | 2.1s | ~400M |
| 7-day signup→activation funnel | 1.4s | 3.2s | ~80M |
| 30-day retention curve, 8 cohorts | 1.7s | 4.1s | ~600M |
| Top 20 events by volume, 7 days | 0.6s | 1.5s | ~250M |
None of these queries touch all 10B rows. That's the entire point. Here's how.
## 1. Micro-partition pruning: the lever that does most of the work
Snowflake stores data in 50–500MB micro-partitions, each with rich metadata about the column values it contains. When a query has predicates the planner can use, it skips entire micro-partitions without reading them.
Pruning works best when:
- The data is physically sorted by the partition-pruning column (or close to it)
- The query has predicates on that column
- The predicates are range bounds (`>=`, `<`, `BETWEEN`), not just equality
For event tables, the pruning column is almost always `event_time`. Events arrive roughly in time order, so partitions naturally cluster on time. With an explicit `CLUSTER BY (event_time)` on the table, Snowflake maintains this ordering even as data drifts.
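As a sketch, setting this up looks like the following (the table and column definitions match the schema described in the setup section; everything else is illustrative):

```sql
-- Canonical long event table, clustered on event time
CREATE TABLE IF NOT EXISTS events (
  event_time TIMESTAMP_NTZ,
  user_id    VARCHAR,
  event_name VARCHAR,
  properties VARIANT
)
CLUSTER BY (event_time);

-- Inspect how well-clustered the table actually is
SELECT SYSTEM$CLUSTERING_INFORMATION('events', '(event_time)');
```

In the JSON that `SYSTEM$CLUSTERING_INFORMATION` returns, a low `average_depth` means micro-partitions overlap little on `event_time`, which is exactly what makes pruning effective.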
Klaritics' query layer treats time-bounded predicates as the highest-priority part of every generated query. A funnel query for "the last 7 days" isn't just a `WHERE event_time >= DATEADD('day', -7, CURRENT_TIMESTAMP)` — it's that predicate pushed down into every CTE, every subquery, every JOIN.
A bad funnel query (Klaritics doesn't write this):

```sql
WITH all_signups AS (
  SELECT user_id, event_time AS signup_time
  FROM events WHERE event_name = 'signup'
),
all_activations AS (
  SELECT user_id, event_time AS activated_time
  FROM events WHERE event_name = 'activated'
)
SELECT COUNT(DISTINCT s.user_id), COUNT(DISTINCT a.user_id)
FROM all_signups s
LEFT JOIN all_activations a
  ON s.user_id = a.user_id
  AND a.activated_time BETWEEN s.signup_time AND DATEADD('day', 7, s.signup_time)
WHERE s.signup_time >= DATEADD('day', -30, CURRENT_TIMESTAMP);
```

The same query, written for pruning:
```sql
WITH signups AS (
  SELECT user_id, event_time AS signup_time
  FROM events
  WHERE event_name = 'signup'
    AND event_time >= DATEADD('day', -30, CURRENT_TIMESTAMP)
),
activations AS (
  SELECT user_id, event_time AS activated_time
  FROM events
  WHERE event_name = 'activated'
    AND event_time >= DATEADD('day', -30, CURRENT_TIMESTAMP)
)
SELECT
  COUNT(DISTINCT s.user_id) AS signed_up,
  COUNT(DISTINCT a.user_id) AS activated
FROM signups s
LEFT JOIN activations a
  ON s.user_id = a.user_id
  AND a.activated_time BETWEEN s.signup_time AND DATEADD('day', 7, s.signup_time);
```

The "redundant" bound on `activations.event_time` is intentional. The JOIN condition already guarantees that every counted activation falls inside the window, but the planner can't prune micro-partitions from a JOIN condition; the explicit bound on the CTE gives it a hard range it can use. We're trading a tiny bit of query length for a 5–20× reduction in rows scanned.
## 2. Choose the right SQL pattern per warehouse
Funnels are computationally interesting because there are at least three valid SQL shapes for them:
- Shape A — self-join chain. Easiest to read, common in tutorials, scales badly.
- Shape B — window-function flatten. Use `LEAD()` over an ordered partition to get each user's "next" event. Faster than self-joins when the funnel is short.
- Shape C — conditional aggregation per user. Build a per-user bitmap of which steps they hit, then aggregate. Best for fixed-step funnels.
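For a fixed two-step funnel, Shape C can be sketched like this (illustrative only; the step names and 30-day window are assumptions, and Klaritics generates these queries itself):

```sql
-- Shape C: one pass over events, per-user conditional aggregation
SELECT
  COUNT_IF(signup_time IS NOT NULL) AS signed_up,
  COUNT_IF(activated_time IS NOT NULL
           AND activated_time >= signup_time
           AND activated_time <= DATEADD('day', 7, signup_time)) AS activated
FROM (
  -- Collapse each user's history to their first occurrence of each step
  SELECT
    user_id,
    MIN(IFF(event_name = 'signup',    event_time, NULL)) AS signup_time,
    MIN(IFF(event_name = 'activated', event_time, NULL)) AS activated_time
  FROM events
  WHERE event_time >= DATEADD('day', -30, CURRENT_TIMESTAMP)
  GROUP BY user_id
);
```

For funnels with more steps, the inner query grows one conditional `MIN` per step — the "bitmap" in the description above — while the scan count stays at one.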
The right shape depends on the warehouse and the funnel definition. ClickHouse has `windowFunnel()` — purpose-built. Snowflake doesn't, but it has a fast `LEAD`/`LAG` implementation and excellent micro-partition pruning, so Shape B with aggressive predicate pushdown wins for most cases.
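A minimal Shape B sketch for the same two-step funnel (again, event names and window are assumptions, not Klaritics' actual generated SQL):

```sql
-- Shape B: order each user's funnel events, then look ahead with LEAD()
WITH ordered AS (
  SELECT
    user_id,
    event_name,
    event_time,
    LEAD(event_name) OVER (PARTITION BY user_id ORDER BY event_time) AS next_event,
    LEAD(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS next_time
  FROM events
  -- Restricting to the funnel's own events keeps LEAD(1) meaningful
  WHERE event_name IN ('signup', 'activated')
    AND event_time >= DATEADD('day', -30, CURRENT_TIMESTAMP)
)
SELECT
  COUNT(DISTINCT user_id) AS signed_up,
  COUNT(DISTINCT IFF(next_event = 'activated'
                     AND next_time <= DATEADD('day', 7, event_time),
                     user_id, NULL)) AS activated
FROM ordered
WHERE event_name = 'signup';
```

Note the `IN` filter in the CTE: `LEAD()` only sees the immediately following row, so unrelated events between the two steps must be filtered out before the window function runs.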
## 3. Aggressive caching at the query layer, never at the data layer
Klaritics caches in two places:
- Query result cache. When the same query (post-normalization) runs twice within a TTL, the second call returns from cache without hitting Snowflake.
- Aggregate cache. Common dashboard widgets are pre-computed on a schedule and served from cache, with a fallback to live query when stale.
What we do not cache:
- Raw event rows. Ever.
- User-level data outside your warehouse.
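If you want a similar effect to the aggregate cache without Klaritics, one option is a scheduled rollup inside your own warehouse — this is a sketch, not how Klaritics' cache is implemented, and the task name, warehouse name, window, and schedule are all illustrative:

```sql
-- Pre-aggregated daily active users, refreshed hourly by a Snowflake task
CREATE TASK IF NOT EXISTS refresh_daily_active_users
  WAREHOUSE = klaritics_wh
  SCHEDULE = '60 MINUTE'
AS
  CREATE OR REPLACE TABLE daily_active_users AS
  SELECT
    DATE_TRUNC('day', event_time)::DATE AS day,
    COUNT(DISTINCT user_id) AS dau
  FROM events
  WHERE event_time >= DATEADD('day', -30, CURRENT_TIMESTAMP)
  GROUP BY 1;

-- Tasks are created suspended; start it explicitly
ALTER TASK refresh_daily_active_users RESUME;
```

Dashboard widgets then read the tiny rollup table instead of scanning the event table on every load.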
## 4. Right-size the warehouse for the workload
Snowflake's virtual warehouses are independent compute clusters that don't share resources. The cheapest way to make Klaritics fast is to give it its own warehouse and right-size it.
For exploratory analytics workloads, `XSMALL` with `AUTO_SUSPEND = 60` and `AUTO_RESUME = TRUE` is the right starting point. For dashboard-heavy workloads, bump to `SMALL` or `MEDIUM`, or enable multi-cluster scaling during peaks.
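The starting configuration described above, as DDL (the warehouse name is an assumption):

```sql
-- Dedicated XSMALL warehouse for analytics; suspends after 60s idle
CREATE WAREHOUSE IF NOT EXISTS klaritics_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;

-- For dashboard-heavy peaks: scale out instead of up
-- (multi-cluster warehouses require Snowflake Enterprise edition)
ALTER WAREHOUSE klaritics_wh SET
  MIN_CLUSTER_COUNT = 1,
  MAX_CLUSTER_COUNT = 3;
```

Keeping analytics on its own warehouse also means a runaway dashboard query can never contend with your production ELT jobs for compute.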
## What we're not telling you
1. The first query of the day is slower. Cold compilations and cold caches still apply.
2. Custom SQL escape hatches lose guarantees. Free-form SQL can bypass pruning best practices.
3. Schema choices matter a lot. A canonical long event table outperforms a wide pivoted table for these workloads.
## The 30-second summary
To query 10B events on Snowflake in under 2 seconds:
- Cluster event tables on `event_time`.
- Push time predicates into every CTE, subquery, and JOIN.
- Choose SQL shape based on warehouse and funnel definition.
- Cache aggregated results, never raw data.
- Give analytics its own virtual warehouse.
## Try it
If you have a Snowflake account and an event table, you can have Klaritics querying it in under 30 minutes. Deploy Klaritics →
