Why This Post Exists
Event sourcing is well covered in conference talks and blog posts, usually in the context of introducing the pattern. What's less documented is what happens after you've been running an event-sourced system in production for a year or two. This post covers the operational lessons we've learned.
The Good Parts (Briefly)
Event sourcing gives you:
- Complete audit trail for free — every state change is a recorded event
- Temporal queries — reconstruct the state of any entity at any point in time
- Decoupled read models — build multiple projections from the same event stream
- Debugging superpower — replay events to reproduce any bug
These benefits are real and significant. They're also why we continue to use the pattern despite the operational costs described below.
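To make the temporal-query benefit concrete, here is a minimal sketch of reconstructing an order's status as of a given moment by folding only the events recorded up to that moment. The event shapes, the `recordedAt` field, and the `statusAt` helper are hypothetical names chosen for illustration, and the sketch assumes events arrive in stream order with UTC timestamps in one consistent ISO-8601 format.

```typescript
// Hypothetical event shapes for illustration only.
type OrderEvent =
  | { type: "OrderPlaced"; orderId: string; recordedAt: string }
  | { type: "OrderShipped"; orderId: string; recordedAt: string }
  | { type: "OrderCancelled"; orderId: string; recordedAt: string };

type OrderStatus = "none" | "placed" | "shipped" | "cancelled";

// Pure transition function: current status + event -> next status.
function applyEvent(status: OrderStatus, event: OrderEvent): OrderStatus {
  if (status === "cancelled") return status; // cancelled is terminal in this sketch
  switch (event.type) {
    case "OrderPlaced":
      return "placed";
    case "OrderShipped":
      return "shipped";
    case "OrderCancelled":
      return "cancelled";
  }
}

// Fold only the events recorded at or before `asOf`. ISO-8601 UTC strings in
// the same format compare correctly as plain strings.
function statusAt(events: OrderEvent[], asOf: string): OrderStatus {
  return events
    .filter(e => e.recordedAt <= asOf)
    .reduce(applyEvent, "none" as OrderStatus);
}

const orderHistory: OrderEvent[] = [
  { type: "OrderPlaced", orderId: "o-1", recordedAt: "2024-01-01T10:00:00Z" },
  { type: "OrderShipped", orderId: "o-1", recordedAt: "2024-01-03T09:30:00Z" },
];
const statusOnJan2 = statusAt(orderHistory, "2024-01-02T00:00:00Z"); // "placed"
```

The same fold with a later cutoff yields the current state, which is why "state at time T" costs nothing extra to support once the fold exists.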
Schema Evolution
Events are immutable. Once an event is persisted, its structure cannot change. But your domain model will evolve. This creates a tension that must be managed explicitly.
Approach: Event Upcasting
We use upcasters — functions that transform old event versions into the current version at read time:
// Event versions
interface OrderPlacedV1 {
  type: "OrderPlaced";
  version: 1;
  orderId: string;
  items: string[]; // Just item IDs in v1
}

// Structured line item introduced in v2
interface OrderLineItem {
  itemId: string;
  quantity: number;
  price: number;
}

interface OrderPlacedV2 {
  type: "OrderPlaced";
  version: 2;
  orderId: string;
  items: OrderLineItem[]; // Structured items in v2
  currency: string; // Added in v2
}

// Upcaster: pure transformation from v1 to v2, no lookups
function upcastOrderPlaced(event: OrderPlacedV1): OrderPlacedV2 {
  return {
    type: "OrderPlaced",
    version: 2,
    orderId: event.orderId,
    items: event.items.map(id => ({
      itemId: id,
      quantity: 1, // v1 did not record quantities
      price: 0, // Placeholder: unknown in v1; consumers resolve it from the catalog, not the upcaster
    })),
    currency: "USD", // Default for pre-v2 orders
  };
}
The rule: upcasters must be pure functions with no external dependencies. If an upcaster needs to look up data, your design has a problem.
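Once an event type has gone through more than one revision, applying upcasters at read time usually means chaining them one version at a time until the event reaches the current shape. The following is a minimal sketch of that idea, not our production code: the `StoredEvent` envelope, the `upcasters` registry keyed by `"<type>@<version>"`, and `upcastToCurrent` are all hypothetical names.

```typescript
// Hypothetical envelope: every stored event carries a type and a version.
interface StoredEvent {
  type: string;
  version: number;
  [field: string]: unknown;
}

type Upcaster = (event: StoredEvent) => StoredEvent;

// Registry keyed by "<type>@<fromVersion>". Each entry lifts an event exactly
// one version, so a future v1 -> v3 migration is just two registered steps.
const upcasters: Record<string, Upcaster> = {
  "OrderPlaced@1": event => ({
    ...event,
    version: 2,
    items: (event["items"] as string[]).map(id => ({ itemId: id, quantity: 1, price: 0 })),
    currency: "USD",
  }),
};

// Apply registered upcasters until none matches the event's current version.
function upcastToCurrent(event: StoredEvent): StoredEvent {
  let current = event;
  while (true) {
    const key = `${current.type}@${current.version}`;
    const step = upcasters[key];
    if (step === undefined) return current;
    const next = step(current);
    // Guard against a misbehaving upcaster that would loop forever.
    if (next.version <= current.version) {
      throw new Error(`Upcaster ${key} did not advance the version`);
    }
    current = next;
  }
}

const storedEvent: StoredEvent = {
  type: "OrderPlaced",
  version: 1,
  orderId: "o-1",
  items: ["sku-1"],
};
const upcastedEvent = upcastToCurrent(storedEvent); // version 2, currency "USD"
```

Because each step is a pure function of its input, the whole chain stays pure, and events already at the current version pass through untouched.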
Snapshot Strategy
Rebuilding an entity from thousands of events is slow. Snapshots solve this, but introduce their own complexity.
What We Learned
- Snapshot every N events, not on every write: We snapshot every 100 events. For our workload, more frequent snapshots waste storage, and less frequent ones push rebuild time past what we can tolerate.
- Snapshots are disposable: They're a performance optimization, not a source of truth. You must be able to delete all snapshots and rebuild from events.
- Version your snapshots: When your entity model changes, old snapshots become invalid. Treat snapshot deserialization failure as "rebuild from events," not as an error.
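The three rules above can be sketched together: a load path that treats a missing, outdated, or unreadable snapshot as a cache miss and replays from the first event, plus a policy check that fires every N events. Everything here is a hypothetical illustration. The cart counter stands in for a real entity, and `Snapshot`, `SNAPSHOT_SCHEMA_VERSION`, `loadCart`, and `shouldSnapshot` are names invented for this sketch.

```typescript
// All names are hypothetical; a cart item counter stands in for a real entity.
type CartEvent = { type: "ItemAdded" } | { type: "ItemRemoved" };

interface CartState {
  itemCount: number;
}

interface Snapshot {
  schemaVersion: number; // version of the CartState shape that was serialized
  lastEventIndex: number; // index of the last event folded into `payload`
  payload: string; // serialized CartState
}

const SNAPSHOT_SCHEMA_VERSION = 2;
const SNAPSHOT_INTERVAL = 100; // the N in "every N events"

function applyCartEvent(state: CartState, event: CartEvent): CartState {
  const delta = event.type === "ItemAdded" ? 1 : -1;
  return { itemCount: state.itemCount + delta };
}

// Returns the snapshot's state, or null if it is missing, outdated, or unreadable.
function readSnapshot(snapshot: Snapshot | null): CartState | null {
  if (snapshot === null || snapshot.schemaVersion !== SNAPSHOT_SCHEMA_VERSION) return null;
  try {
    const parsed = JSON.parse(snapshot.payload);
    return typeof parsed.itemCount === "number" ? { itemCount: parsed.itemCount } : null;
  } catch {
    return null; // corrupt snapshot: fall back to a full replay, do not fail the load
  }
}

function loadCart(events: CartEvent[], snapshot: Snapshot | null): CartState {
  const restored = readSnapshot(snapshot);
  // With a usable snapshot, replay only the events after it; otherwise replay all.
  const start = restored !== null && snapshot !== null ? snapshot.lastEventIndex + 1 : 0;
  const initial: CartState = restored !== null ? restored : { itemCount: 0 };
  return events.slice(start).reduce(applyCartEvent, initial);
}

// Snapshot only when the stream length reaches a multiple of the interval.
function shouldSnapshot(eventCount: number): boolean {
  return eventCount > 0 && eventCount % SNAPSHOT_INTERVAL === 0;
}
```

Note that `loadCart` returns the same state whether or not a snapshot is supplied, which is exactly the property that makes snapshots safe to delete.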
Projection Rebuilds
Read models built from event projections will occasionally need to be rebuilt — because of bugs, schema changes, or new query requirements.
Operational Requirements
- Projections must be rebuildable from zero: If you can't rebuild a projection from the event stream, it's not really a projection — it's become a primary data store.
- Rebuild must be non-disruptive: Run the rebuild alongside the live projection, and swap only once the rebuild has caught up.
- Track projection position: Each projection maintains a checkpoint (last processed event position) so it can resume after interruption.
Event Stream:       [e1] [e2] [e3] [e4] [e5] [e6] ...
                                         ▲
Live Projection:    ─────────────────────┘  (position: 5)
Rebuild Projection: ───────────▶  (position: 3, catching up)
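The checkpoint and side-by-side rebuild can be sketched as follows. This is a simplified, synchronous illustration under stated assumptions: an in-memory array stands in for the event store, and `OpenOrdersProjection`, `catchUp`, and `rebuildAndSwap` are hypothetical names. A real system would also persist the checkpoint together with the read model so the two cannot drift apart.

```typescript
// Hypothetical names throughout; an in-memory array stands in for the event store.
interface OrderEvent {
  type: "OrderPlaced" | "OrderCancelled";
  orderId: string;
}

class OpenOrdersProjection {
  position = 0; // checkpoint: number of events already processed
  openOrders: Record<string, boolean> = {};

  // Process events after the checkpoint, at most `batchSize` per call, so an
  // interrupted run resumes where it stopped instead of starting over.
  catchUp(stream: OrderEvent[], batchSize: number): void {
    const end = Math.min(stream.length, this.position + batchSize);
    for (let i = this.position; i < end; i++) {
      const event = stream[i];
      if (event.type === "OrderPlaced") {
        this.openOrders[event.orderId] = true;
      } else {
        delete this.openOrders[event.orderId];
      }
      this.position = i + 1; // advance the checkpoint after each event
    }
  }
}

// Build a fresh projection next to the live one; swap only once it has caught up.
function rebuildAndSwap(
  stream: OrderEvent[],
  live: { current: OpenOrdersProjection },
  batchSize: number
): void {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  const rebuilt = new OpenOrdersProjection(); // starts from position 0
  while (rebuilt.position < stream.length) {
    rebuilt.catchUp(stream, batchSize); // readers keep using live.current meanwhile
  }
  live.current = rebuilt; // a single reference swap makes the rebuild visible
}

const eventStream: OrderEvent[] = [
  { type: "OrderPlaced", orderId: "o-1" },
  { type: "OrderPlaced", orderId: "o-2" },
  { type: "OrderCancelled", orderId: "o-1" },
];
const liveProjection = { current: new OpenOrdersProjection() };
liveProjection.current.catchUp(eventStream, 2); // live checkpoint is now 2
rebuildAndSwap(eventStream, liveProjection, 2); // swapped-in projection is at 3
```

Keeping the live projection behind a single reference is one simple way to make the swap atomic from the reader's point of view; with a database-backed read model, the equivalent is typically a table or alias rename.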
When Not to Use Event Sourcing
After working with this pattern extensively, here's when we'd recommend against it:
- Simple CRUD applications: The overhead isn't justified
- Systems where "current state" is the only query: If you never ask "what happened?", events add complexity without benefit
- Teams without operational experience: Event-sourced systems require specific operational practices — if your team can't invest in learning them, use a simpler architecture
Summary
Event sourcing is a powerful pattern with real operational costs. The key is treating it as an infrastructure investment: build the tooling (upcasters, snapshot management, projection rebuilds) before you need it, and maintain it as carefully as you maintain the domain logic it supports.