Engineering

How we query 10B events on Snowflake in under 2 seconds

Performance patterns for interactive product analytics at warehouse scale.

Klaritics Engineering · 2026-04-16 · 13 min

A flat-design engineering diagram of a faceted warehouse with an orange query line entering and a fast result line exiting.

TL;DR

Making queries over 10 billion events on Snowflake fast isn't about a magic feature. It's about respecting Snowflake's architecture: separated storage and compute, automatically managed micro-partitions, and a query planner that prunes aggressively when you give it the right SQL shape.

In this post we walk through the four things Klaritics' query layer does to keep funnel and retention queries at sub-2-second median latency against 10B-row event tables, and what we do not do (no caching of raw data, no shadow copies).

Setup: what "10B events" actually means here

The numbers in this post come from a representative customer — a B2C product with:

  • ~10B events spanning ~30 months of history
  • ~12M monthly active users
  • Snowflake account on AWS, dedicated XSMALL warehouse for Klaritics with auto-suspend at 60s
  • Event table with event_time (TIMESTAMP_NTZ), user_id (VARCHAR), event_name (VARCHAR), properties (VARIANT)
  • Clustering on event_time (the most important detail in this entire post)

The benchmarks below were measured on real customer dashboards over a 30-day window, p50 and p95 captured by the Klaritics query layer's own instrumentation.

Query type                           p50     p95     Rows scanned (typical)
30-day DAU/WAU/MAU                   0.8s    2.1s    ~400M
7-day signup→activation funnel       1.4s    3.2s    ~80M
30-day retention curve, 8 cohorts    1.7s    4.1s    ~600M
Top 20 events by volume, 7 days      0.6s    1.5s    ~250M

None of these queries touch all 10B rows. That's the entire point. Let's get into how.

1. Micro-partition pruning: the lever that does most of the work

Snowflake stores data in micro-partitions of 50–500MB of uncompressed data, each carrying rich metadata about the column values it contains. When a query has predicates the planner can use, it skips entire micro-partitions without reading them.
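
To make the mechanism concrete, here's a toy model of that pruning step. The partition ids and value ranges are invented for illustration; Snowflake tracks this kind of min/max metadata per column inside each micro-partition automatically.

```python
from datetime import datetime

# Toy model of micro-partition pruning: each "partition" carries min/max
# metadata for event_time, so a range predicate can skip whole partitions
# without reading a single row.
def prune(partitions, lo, hi):
    """Keep only partitions whose [min, max] value range overlaps [lo, hi)."""
    return [p for p in partitions if p["max"] >= lo and p["min"] < hi]

def day(d):
    return datetime(2026, 4, d)

# Because events arrive roughly in time order, partitions end up with
# narrow, mostly non-overlapping event_time ranges.
partitions = [
    {"id": 1, "min": day(1), "max": day(5)},
    {"id": 2, "min": day(5), "max": day(10)},
    {"id": 3, "min": day(10), "max": day(15)},
]

# A query bounded to April 11-14 only needs partition 3; the rest are skipped.
kept = prune(partitions, lo=day(11), hi=day(14))
print([p["id"] for p in kept])  # [3]
```

The point of the sketch: the planner never inspects a row to discard partitions 1 and 2, which is why a tight time predicate matters more than anything else in the query.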

Pruning works best when:

  1. The data is physically sorted by the partition-pruning column (or close to it)
  2. The query has predicates on that column
  3. The predicates are inequality bounds (>=, <, BETWEEN), not just equality

For event tables, the pruning column is almost always event_time. Events arrive roughly in time order, so partitions naturally cluster on time. With explicit CLUSTER BY (event_time) on the table, Snowflake maintains this ordering even as data drifts.

Klaritics' query layer treats time-bounded predicates as the highest-priority part of every generated query. A funnel query for "the last 7 days" isn't just a WHERE event_time >= DATEADD('day', -7, CURRENT_TIMESTAMP) — it's that predicate pushed down into every CTE, every subquery, every JOIN.

A bad funnel query (Klaritics doesn't write this):

WITH all_signups AS (
  SELECT user_id, event_time AS signup_time
  FROM events WHERE event_name = 'signup'
),
all_activations AS (
  SELECT user_id, event_time AS activated_time
  FROM events WHERE event_name = 'activated'
)
SELECT COUNT(DISTINCT s.user_id), COUNT(DISTINCT a.user_id)
FROM all_signups s
LEFT JOIN all_activations a
  ON s.user_id = a.user_id
  AND a.activated_time BETWEEN s.signup_time AND DATEADD('day', 7, s.signup_time)
WHERE s.signup_time >= DATEADD('day', -30, CURRENT_TIMESTAMP);

The same query, written for pruning:

WITH signups AS (
  SELECT user_id, event_time AS signup_time
  FROM events
  WHERE event_name = 'signup'
    AND event_time >= DATEADD('day', -30, CURRENT_TIMESTAMP)
),
activations AS (
  SELECT user_id, event_time AS activated_time
  FROM events
  WHERE event_name = 'activated'
    AND event_time >= DATEADD('day', -30, CURRENT_TIMESTAMP)
    AND event_time <= DATEADD('day', 7, CURRENT_TIMESTAMP)
)
SELECT
  COUNT(DISTINCT s.user_id) AS signed_up,
  COUNT(DISTINCT a.user_id) AS activated
FROM signups s
LEFT JOIN activations a
  ON s.user_id = a.user_id
  AND a.activated_time BETWEEN s.signup_time AND DATEADD('day', 7, s.signup_time);

The upper bound on activations.event_time looks redundant, since the join condition already caps each activation at 7 days after its signup. Spelling it out as a constant, though, hands the planner a hard bound it can prune against, and the pushed-down lower bound keeps the activations scan bounded to the funnel window instead of the full 30 months of history. We're trading a tiny bit of query length for a 5–20× reduction in rows scanned.

2. Choose the right SQL pattern per warehouse

Funnels are computationally interesting because there are at least three valid SQL shapes for them:

  • Shape A — Self-join chain. Easiest to read, common in tutorials, scales badly.
  • Shape B — Window-function flatten. Use LEAD() over an ordered partition to get each user's "next" event. Faster than self-joins when the funnel is short.
  • Shape C — Conditional aggregation per user. Build a per-user bitmap of which steps they hit, aggregate. Best for fixed-step funnels.

The right shape depends on the warehouse and the funnel definition. ClickHouse has windowFunnel() — purpose-built. Snowflake doesn't, but it has a fast LEAD/LAG implementation and excellent micro-partition pruning, so Shape B with aggressive predicate pushdown wins for most cases.
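
To make Shape B concrete, here's a minimal sketch of the window-function flatten, using SQLite (via Python's sqlite3) as a stand-in for the warehouse. The table layout and event names are invented for the example; the Snowflake version Klaritics generates uses the same LEAD() over PARTITION BY user_id ORDER BY event_time shape.

```python
import sqlite3

# Shape B sketch: LEAD() over each user's ordered event stream gives us the
# event that immediately follows a signup, so a two-step funnel becomes a
# single scan plus a window function -- no self-join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (user_id TEXT, event_name TEXT, event_time INTEGER);
INSERT INTO events VALUES
  ('u1', 'signup', 1), ('u1', 'activated', 2),
  ('u2', 'signup', 1), ('u2', 'page_view', 2),
  ('u3', 'signup', 5), ('u3', 'activated', 9);
""")

row = conn.execute("""
WITH flattened AS (
  SELECT
    user_id,
    event_name,
    LEAD(event_name) OVER (
      PARTITION BY user_id ORDER BY event_time
    ) AS next_event
  FROM events
)
SELECT
  COUNT(*) AS signed_up,
  SUM(CASE WHEN next_event = 'activated' THEN 1 ELSE 0 END) AS activated
FROM flattened
WHERE event_name = 'signup'
""").fetchone()
print(row)  # (3, 2): three signups, two of them activated next
```

The real generated SQL also carries the time-window predicates from section 1 into the flattened CTE; they're omitted here to keep the shape visible.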

3. Aggressive caching at the query layer, never at the data layer

Klaritics caches in two places:

  • Query result cache. When the same query (post-normalization) runs twice within a TTL, the second call returns from cache without hitting Snowflake.
  • Aggregate cache. Common dashboard widgets are pre-computed on a schedule and served from cache, with a fallback to live query when stale.
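
As a sketch of the first of those caches: normalize the SQL text, hash it, and serve repeats within the TTL from an in-process store. The normalize step and class here are hypothetical simplifications; Klaritics' real normalization also canonicalizes literals and parameters before hashing.

```python
import hashlib
import re
import time

def normalize(sql: str) -> str:
    """Collapse whitespace and lowercase so textually-equivalent queries match."""
    return re.sub(r"\s+", " ", sql.strip()).lower()

class ResultCache:
    """Query-layer result cache: key on normalized SQL, expire after a TTL."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (expires_at, result)

    def get_or_run(self, sql, run):
        key = hashlib.sha256(normalize(sql).encode()).hexdigest()
        hit = self.store.get(key)
        if hit and hit[0] > time.monotonic():
            return hit[1]                        # cache hit: skip the warehouse
        result = run(sql)                        # cache miss: execute for real
        self.store[key] = (time.monotonic() + self.ttl, result)
        return result

calls = []
def fake_warehouse(sql):
    calls.append(sql)
    return [("dau", 12345)]

cache = ResultCache(ttl_seconds=60)
cache.get_or_run("SELECT 1  FROM events", fake_warehouse)
cache.get_or_run("select 1 from events", fake_warehouse)  # same key post-normalization
print(len(calls))  # 1 -- the second call never reached the warehouse
```

Note that only the result rows are cached, which is what keeps this consistent with the "no raw data" rule below.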

What we do not cache:

  • Raw event rows. Ever.
  • User-level data outside your warehouse.

4. Right-size the warehouse for the workload

Snowflake's virtual warehouses are independent compute clusters that don't share resources. The cheapest way to make Klaritics fast is to give it its own warehouse and right-size it.

For exploratory analytics workloads, XSMALL with AUTO_SUSPEND = 60 and AUTO_RESUME = TRUE is the right starting point. For dashboard-heavy workloads, bump to SMALL or MEDIUM, or enable multi-cluster scaling during peaks.

What we're not telling you

1. The first query of the day is slower. Cold compilations and cold caches still apply.

2. Custom SQL escape hatches lose guarantees. Free-form SQL can bypass pruning best practices.

3. Schema choices matter a lot. A canonical long event table outperforms a wide pivoted table for these workloads.

The 30-second summary

To query 10B events on Snowflake in under 2 seconds:

  1. Cluster event tables on event_time.
  2. Push time predicates into every CTE, subquery, and JOIN.
  3. Choose SQL shape based on warehouse and funnel definition.
  4. Cache aggregated results, never raw data.
  5. Give analytics its own virtual warehouse.

Try it

If you have a Snowflake account and an event table, you can have Klaritics querying it in under 30 minutes. Deploy Klaritics →

About the author

The Klaritics engineering team writes about query planning, performance, and architecture in warehouse-native analytics systems.

Related posts

Identity resolution in a warehouse-native world

How to model anonymous-to-authenticated identity transitions without introducing metric drift.

Read article →

The hidden cost of reverse ETL

Where reverse ETL adds leverage, and where it quietly increases analytical drag.

Read article →

Mixpanel vs. Amplitude vs. Klaritics: a fair comparison

An honest 2026 comparison of Mixpanel, Amplitude, and Klaritics. Pricing, architecture, governance, scale.

Read article →
