We rebuilt the audience CDP in a weekend. Here's why we had to.
Our segments were a static list with extra steps. We tore the whole thing out and started again with a real event stream.
When we launched Mailapp's audience in 2024, we did what every email tool does: we shipped a 'lists' tab. Static membership, manual refresh, the works. It got us through the seed round and into the early users.
By the time we hit 80 customers, the seams were showing. Customers were asking us segment questions we couldn't answer in the product. 'Why is user 4218 in this segment?' 'What did they do to get added?' 'When will they fall out?' Every answer was a query into our internal database or, more often, a deployment.
We spent four months arguing about how to fix this: patch the existing system, swap in a stream, or build a denormalised store. In the end, the answer was the boring one: a real event-based CDP, sitting at the heart of the product, with everything else (campaigns, automations, analytics) consuming from it.
This post walks through the architecture, the tradeoffs, and a few of the production incidents that taught us what not to do.
A short history of bad segmentation
Every email tool ships with the same broken model. You build a list. The list is a snapshot. You send to it. By the time you send, the snapshot is already wrong.
Vendors fudge this by giving you 'auto-updating' lists. Those are usually a query on a denormalised user table that re-evaluates every few hours. The cost: lag. The benefit: cheap. The result: a tool that's wrong every Friday afternoon.
What we wanted was a real CDP. Events streaming in, traits updating, computed properties evaluating, segments recomputing their membership, all sub-second.
The architecture we landed on
Three layers, no compromises:
Events. An append-only log per workspace. Every track/identify/group call lands here within milliseconds. Partitioned by user_id for read parallelism. Retained for 18 months hot, then cold-archived to S3.
Properties. Materialised views over the event log. Both 'classical' columns (last_seen, plan, country) and computed properties (events_in_last_7d, sum_revenue_l30d, distinct_devices_alltime). Every materialised view is incrementally maintained.
Segments. Boolean expressions over properties. Stored as ASTs, evaluated with a small interpreter, cached at the workspace level. Subscribers (automations, campaigns) get notified on membership changes via a per-segment topic.
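To make the Segments layer concrete, here is a minimal sketch of a boolean expression stored as an AST and evaluated by a small interpreter. The node types (`Cmp`, `And`, `Or`, `Not`), the operator set, and the rule that a missing property never matches are assumptions for illustration; the post does not describe the real node shapes or the production interpreter.

```python
import operator
from dataclasses import dataclass
from typing import Any, Mapping, Tuple, Union


# Hypothetical AST node types for a segment definition.
@dataclass(frozen=True)
class Cmp:
    prop: str   # property name, e.g. "events_in_last_7d"
    op: str     # one of the keys of _OPS below
    value: Any  # literal to compare against


@dataclass(frozen=True)
class And:
    children: Tuple["Node", ...]


@dataclass(frozen=True)
class Or:
    children: Tuple["Node", ...]


@dataclass(frozen=True)
class Not:
    child: "Node"


Node = Union[Cmp, And, Or, Not]

_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def evaluate(node: Node, props: Mapping[str, Any]) -> bool:
    """Return True if a user with these properties belongs to the segment."""
    if isinstance(node, Cmp):
        actual = props.get(node.prop)
        if actual is None:
            # Assumed policy: a comparison on a missing property is False.
            return False
        return bool(_OPS[node.op](actual, node.value))
    if isinstance(node, And):
        return all(evaluate(child, props) for child in node.children)
    if isinstance(node, Or):
        return any(evaluate(child, props) for child in node.children)
    if isinstance(node, Not):
        return not evaluate(node.child, props)
    raise TypeError(f"unknown node type: {type(node).__name__}")


# "Pro plan AND (active this week OR based in Germany)"
segment = And((
    Cmp("plan", "==", "pro"),
    Or((Cmp("events_in_last_7d", ">=", 5), Cmp("country", "==", "DE"))),
))
```

Calling `evaluate(segment, props)` before and after a property update and comparing the two results is one simple way to detect a membership change worth publishing to subscribers.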
We considered Redshift, Snowflake, ClickHouse, Pinot. We landed on ClickHouse for events and properties, with a custom subscription layer in Rust on top. p99 segment evaluation is 35ms across 100M contacts; p99 membership notification to downstream is 80ms.
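For readers unfamiliar with incrementally maintained properties, the sketch below shows the idea for a computed property like events_in_last_7d: each new event updates a per-user window instead of triggering a rescan of the log. This is an in-memory illustration only, not how ClickHouse materialised views work internally; the class name and the assumption that events arrive in timestamp order per user are ours.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 7 * 24 * 3600  # seven days


class EventsInLast7d:
    """Sliding-window event count per user, updated one event at a time.

    Assumes timestamps (in seconds) arrive in non-decreasing order per user.
    """

    def __init__(self) -> None:
        self._events = defaultdict(deque)  # user_id -> timestamps in window

    def record(self, user_id: str, ts: float) -> None:
        """Apply one new event: append it and drop anything now too old."""
        window = self._events[user_id]
        window.append(ts)
        self._evict(window, ts)

    def value(self, user_id: str, now: float) -> int:
        """Current property value as of `now`."""
        window = self._events.get(user_id)
        if window is None:
            return 0
        self._evict(window, now)
        return len(window)

    @staticmethod
    def _evict(window: deque, now: float) -> None:
        # Remove events that are a full window (or more) older than `now`.
        while window and window[0] <= now - WINDOW_SECONDS:
            window.popleft()
```

The cost of each update is proportional to the number of events that expire, not to the size of the user's history, which is what makes sub-second property updates plausible.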
Three incidents we won't forget
On day one of the rollout, a customer's segment grew from 12k to 4.2M in the space of an hour. We hadn't capped the membership-change topic, and the burst pinned the consumer that fans out to automations. Two automations got stuck for seven minutes. The customer didn't notice; we did.
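The post does not say how the cap was eventually implemented. One common approach, sketched here under that caveat, is a bounded buffer: publishers are told when the topic is full so they can back off or spill, and the consumer drains in fixed-size batches. `CappedTopic`, `max_pending`, and `max_batch` are hypothetical names.

```python
from collections import deque
from typing import Any, List


class CappedTopic:
    """Membership-change buffer that refuses new items once it is full."""

    def __init__(self, max_pending: int) -> None:
        if max_pending <= 0:
            raise ValueError("max_pending must be positive")
        self._max_pending = max_pending
        self._queue: deque = deque()

    def publish(self, change: Any) -> bool:
        """Enqueue a change. Returns False if the cap is reached, so the
        caller can retry later or spill to slower storage."""
        if len(self._queue) >= self._max_pending:
            return False
        self._queue.append(change)
        return True

    def drain(self, max_batch: int) -> List[Any]:
        """Hand the consumer at most `max_batch` changes, oldest first."""
        batch = []
        while self._queue and len(batch) < max_batch:
            batch.append(self._queue.popleft())
        return batch

    def pending(self) -> int:
        return len(self._queue)
```

With a cap in place, a segment that jumps from 12k to 4.2M members applies backpressure to the producer rather than pinning the fan-out consumer.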
On day three, a different customer started ingesting events with a trait shaped like `country: "DE,FR"`. Our property writer assumed it was scalar. The materialised view for that property exploded. We added a schema-on-write layer to the ingestion path the same day.
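A schema-on-write check can be as small as a table of per-trait validators that runs before anything reaches the property writer. The sketch below is illustrative: the schema contents, the two-letter country rule, and the choice to reject unknown traits are assumptions, not a description of the layer we shipped.

```python
import re
from typing import Any, Callable, Dict, List, Tuple

_COUNTRY_CODE = re.compile(r"[A-Z]{2}")

# Hypothetical schema: trait name -> predicate that accepts valid values.
SCHEMA: Dict[str, Callable[[Any], bool]] = {
    "country": lambda v: isinstance(v, str) and _COUNTRY_CODE.fullmatch(v) is not None,
    "plan": lambda v: isinstance(v, str) and v != "",
    # bool is a subclass of int in Python, so exclude it explicitly.
    "seats": lambda v: isinstance(v, int) and not isinstance(v, bool) and v >= 0,
}


def validate_traits(
    traits: Dict[str, Any],
) -> Tuple[Dict[str, Any], List[Tuple[str, str]]]:
    """Split incoming traits into accepted values and (name, reason) rejections."""
    accepted: Dict[str, Any] = {}
    rejected: List[Tuple[str, str]] = []
    for name, value in traits.items():
        check = SCHEMA.get(name)
        if check is None:
            rejected.append((name, "unknown trait"))
        elif not check(value):
            rejected.append((name, f"invalid value {value!r}"))
        else:
            accepted[name] = value
    return accepted, rejected
```

Under this schema, a payload carrying `country: "DE,FR"` is rejected at ingestion with a reason attached, and the remaining valid traits still go through.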
On day eleven, the subscription system thundering-herded the database when an automation re-deployed and triggered a cold-start scan of memberships. Fix: warm the segment cache on automation deploy.
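The fix amounts to loading each segment's membership once, before any worker starts asking for it. Here is a single-threaded sketch of that ordering; `SegmentCache`, `deploy_automation`, and the `loader` callback are hypothetical names, and a concurrent version would also need a per-segment lock so that simultaneous misses do not each hit the database.

```python
from typing import Callable, Dict, FrozenSet, Iterable


class SegmentCache:
    """Caches segment membership so repeated reads do not hit the database."""

    def __init__(self, loader: Callable[[str], Iterable[str]]) -> None:
        self._loader = loader  # performs the expensive membership scan
        self._members: Dict[str, FrozenSet[str]] = {}

    def warm(self, segment_ids: Iterable[str]) -> None:
        """Load each segment at most once, ahead of any reader."""
        for segment_id in segment_ids:
            if segment_id not in self._members:
                self._members[segment_id] = frozenset(self._loader(segment_id))

    def members(self, segment_id: str) -> FrozenSet[str]:
        """Return cached membership, loading on a miss as a fallback."""
        if segment_id not in self._members:
            self._members[segment_id] = frozenset(self._loader(segment_id))
        return self._members[segment_id]


def deploy_automation(
    cache: SegmentCache,
    segment_ids: Iterable[str],
    start_workers: Callable[[], None],
) -> None:
    """Warm the cache first, then let the automation's workers start."""
    cache.warm(segment_ids)
    start_workers()
```

The important part is the ordering in `deploy_automation`: by the time workers start, every membership read is a cache hit, so the number of scans is bounded by the number of segments rather than the number of workers.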
We're proud of how few of these the customers saw, but every one of them taught us something we'd missed.
What we'd do differently
We over-invested in optimising single-segment latency at the expense of multi-segment scans. Most of our analytics queries hit dozens of segments at once. We're rebuilding the scan path as a result.
We under-invested in the audit trail. Customers want to know not just 'is this user in this segment now' but 'when did they join, when did they leave, what events caused it.' We're building this in 8.5.
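The audit trail is not built yet, so the following is only a sketch of the kind of record that would answer those three questions: an append-only log of membership changes, each tagged with the event that caused it. All names here (`MembershipChange`, `AuditTrail`, `cause_event_id`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MembershipChange:
    segment_id: str
    user_id: str
    action: str          # "joined" or "left"
    at: float            # timestamp of the change, in seconds
    cause_event_id: str  # id of the event that flipped membership


class AuditTrail:
    """Append-only record of segment membership changes."""

    def __init__(self) -> None:
        self._log: List[MembershipChange] = []

    def record(self, change: MembershipChange) -> None:
        if change.action not in ("joined", "left"):
            raise ValueError(f"unknown action: {change.action}")
        self._log.append(change)

    def history(self, segment_id: str, user_id: str) -> List[MembershipChange]:
        """Every join and leave for this user in this segment, oldest first."""
        return [
            c for c in self._log
            if c.segment_id == segment_id and c.user_id == user_id
        ]

    def is_member(self, segment_id: str, user_id: str) -> bool:
        """Current membership, derived from the most recent change."""
        changes = self.history(segment_id, user_id)
        return bool(changes) and changes[-1].action == "joined"
```

Deriving current membership from the log keeps the "is this user in the segment now" answer consistent with the "when and why" answers, because both come from the same records.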
If you're building a CDP, the meta-lesson is: it's never about the database. It's about the contract between the database and everything that consumes it. Get the contracts right early, and you can swap the storage layer once a year if you have to.