From 94 Bytes to 36: Compressing a JSON Event with smoltext
A step-by-step walkthrough of how smoltext's semantic JSON codec shrinks a typical analytics event to a little over a third of its size.
A Realistic Event
Here is an analytics event you might emit on every page view:
{"user_id":48213,"event":"page_view","ts":1718553600,"path":"/pricing","ref":"google"}
Serialized with a space after each colon and comma, that is about 94 bytes (the compact form shown above is 86). Multiply by a few hundred million events a month and the byte count becomes a real line item.
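The exact byte count depends on how your serializer spaces its separators, so it is worth checking on your own events. A minimal sketch using only Python's standard json module; the helper name serialized_size is made up for this example:

```python
import json

event = {
    "user_id": 48213,
    "event": "page_view",
    "ts": 1718553600,
    "path": "/pricing",
    "ref": "google",
}

def serialized_size(obj, compact=False):
    """Return the UTF-8 byte length of obj serialized as JSON."""
    # compact=True drops the spaces json.dumps puts after "," and ":" by default.
    separators = (",", ":") if compact else None
    return len(json.dumps(obj, separators=separators).encode("utf-8"))

compact_bytes = serialized_size(event, compact=True)
spaced_bytes = serialized_size(event)
print("compact:", compact_bytes, "bytes; default spacing:", spaced_bytes, "bytes")
```

Whichever form your pipeline actually emits is the baseline to compare against.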
Step One: Structure Awareness
A general compressor sees 94 opaque bytes. The smoltext semantic JSON codec sees structure: an object with five keys, two integer values, and three short string values. It encodes the shape separately from the data.
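To make "shape separately from the data" concrete, here is a small illustration. It is not smoltext's internal representation, which this article does not specify; split_shape is a hypothetical helper that only shows why two events of the same type can share one shape description.

```python
def split_shape(event):
    """Split a flat JSON object into (shape, values).

    shape is a tuple of (key, type name) pairs; values holds the data
    in the same order, with no keys attached.
    """
    shape = tuple((key, type(value).__name__) for key, value in event.items())
    values = list(event.values())
    return shape, values

a = {"user_id": 48213, "event": "page_view", "ts": 1718553600,
     "path": "/pricing", "ref": "google"}
b = {"user_id": 7, "event": "page_view", "ts": 1718553660,
     "path": "/docs", "ref": "direct"}

shape_a, values_a = split_shape(a)
shape_b, values_b = split_shape(b)

# Both events have the same shape, so it only needs describing once.
print(shape_a == shape_b)
print(values_a)
```

Only the values differ from one event to the next, which is what the later layers go on to shrink.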
Keys like user_id, event, ts, path, and ref are extremely common across events. The codec replaces each with a short code drawn from a trained codebook, so "user_id": costs a byte or two instead of ten.
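A rough way to see what key coding buys you is to count the bytes spent on keys before and after. The codebook below is hypothetical (the real trained codebook is not shown in this article), and one byte per key is an assumption on the optimistic end of "a byte or two".

```python
# Hypothetical codebook: each known key maps to a one-byte code.
KEY_CODEBOOK = {"user_id": 0x01, "event": 0x02, "ts": 0x03, "path": 0x04, "ref": 0x05}

def json_key_bytes(keys):
    """Bytes spent on keys in compact JSON: two quotes, the key, and a colon."""
    return sum(len(key.encode("utf-8")) + 3 for key in keys)

def coded_key_bytes(keys, codebook):
    """Bytes spent when every key is replaced by its one-byte code."""
    return len(bytes(codebook[key] for key in keys))

keys = ["user_id", "event", "ts", "path", "ref"]
print("as JSON text:", json_key_bytes(keys), "bytes")
print("as codes:", coded_key_bytes(keys, KEY_CODEBOOK), "bytes")
```

A key missing from the codebook raises KeyError here; a real codec would need an escape path for unknown keys.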
Step Two: Value Typing
Integers are stored as varints rather than ASCII digits. The timestamp 1718553600 takes ten ASCII bytes but only five as a standard 7-bits-per-byte varint (a fixed-width 32-bit integer would take four). The user_id 48213 shrinks from five bytes to three. This typing alone recovers a meaningful chunk before any entropy coding runs.
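The article does not say which varint format the codec uses. Assuming the common unsigned LEB128 layout (7 payload bits per byte, high bit set when more bytes follow), a sketch looks like this:

```python
def encode_varint(n):
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if n < 0:
        raise ValueError("unsigned varint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = n & 0x7F          # low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data):
    """Decode an unsigned LEB128 varint from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")

for value in (48213, 1718553600):
    encoded = encode_varint(value)
    print(value, "->", len(str(value)), "ASCII bytes,", len(encoded), "varint bytes")
```

Under this layout the timestamp needs 31 bits, so it spills into a fifth byte; smaller values such as counters and enum codes benefit the most.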
Step Three: Dictionary Deflate
Short string values like page_view and google appear constantly. Dictionary deflate references them from a shared dictionary, so a repeated value costs a back-reference rather than its full length.
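The same idea can be tried with plain zlib, which accepts a preset dictionary through its zdict parameter. This is a stand-in for illustration, not smoltext's dictionary deflate, and SHARED_DICT is a made-up dictionary built from one representative event.

```python
import zlib

# Hypothetical shared dictionary: a representative event with placeholder values.
SHARED_DICT = b'{"user_id":0,"event":"page_view","ts":0,"path":"/","ref":"google"}'

def deflate(data, zdict=None):
    """Compress with zlib, optionally primed with a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=zdict) if zdict else zlib.compressobj(level=9)
    return comp.compress(data) + comp.flush()

def inflate(blob, zdict=None):
    """Decompress; the same dictionary must be supplied if one was used."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()

message = (b'{"user_id":48213,"event":"page_view","ts":1718553600,'
           b'"path":"/pricing","ref":"google"}')

plain = deflate(message)
primed = deflate(message, SHARED_DICT)
print("no dictionary:", len(plain), "bytes; with dictionary:", len(primed), "bytes")
```

Both sides must hold the identical dictionary, which is why it has to be shared ahead of time rather than shipped with each message.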
The Result
After all three layers, the 94-byte event lands around 36 bytes — roughly 60% smaller. Crucially, no per-message table is stored, because the dictionary and codebook live server-side at the edge.
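The arithmetic behind those figures is easy to rerun with your own numbers. The monthly volume below is a hypothetical 300 million events, chosen only to put a scale on the saving.

```python
def saving_fraction(original_bytes, compressed_bytes):
    """Fraction of the original size that compression removes."""
    return 1 - compressed_bytes / original_bytes

def monthly_bytes_saved(original_bytes, compressed_bytes, events_per_month):
    """Total bytes saved per month at a given event volume."""
    return (original_bytes - compressed_bytes) * events_per_month

EVENTS_PER_MONTH = 300_000_000  # hypothetical volume, not a measured figure

print(f"saving per event: {saving_fraction(94, 36):.1%}")
saved = monthly_bytes_saved(94, 36, EVENTS_PER_MONTH)
print(f"saved per month: {saved / 1e9:.1f} GB")
```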
Why It Holds Up at Scale
The savings are not a one-off fluke for this single event. Because events of the same type share keys and value patterns, the codebook keeps paying off across every message. The more uniform your event schema, the better smoltext performs.
What Does Not Compress Well
Be realistic. High-entropy fields — UUIDs, random tokens, already-encoded base64 blobs — barely shrink, because there is no redundancy to exploit. If most of your payload is random bytes, no compressor will help. smoltext wins on the structured, repetitive parts.
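You can see the effect with any general-purpose compressor. The sketch below uses zlib as a stand-in (not smoltext) and compares pseudo-random bytes against a highly repetitive payload; both payloads are invented for the demonstration.

```python
import random
import zlib

def compressed_fraction(data):
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)

# Seeded pseudo-random bytes stand in for UUIDs, tokens, and similar fields.
rng = random.Random(0)
random_payload = bytes(rng.getrandbits(8) for _ in range(4096))

# The same small fragment repeated stands in for uniform event data.
repetitive_payload = b'{"event":"page_view","ref":"google"}' * 100

print(f"random:     {compressed_fraction(random_payload):.2f}")
print(f"repetitive: {compressed_fraction(repetitive_payload):.2f}")
```

A fraction at or above 1.0 means the output is no smaller than the input once framing overhead is counted.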
Integrating It
The call is a single POST:
POST https://api.smoltext.sprapp.com/v1/compress
Send the JSON, store the returned compressed bytes, and decompress on read. For event pipelines, compress at ingestion and keep the data small all the way through storage.
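A minimal client sketch using only the Python standard library. The endpoint URL comes from this article; everything else is an assumption to check against the API documentation: that the body is the raw JSON, that the Content-Type is application/json, that the response body is the compressed bytes, and that no authentication header is needed (one almost certainly is in practice).

```python
import json
import urllib.request

COMPRESS_URL = "https://api.smoltext.sprapp.com/v1/compress"

def build_compress_request(event):
    """Build (but do not send) a POST request carrying the event as JSON."""
    body = json.dumps(event, separators=(",", ":")).encode("utf-8")
    return urllib.request.Request(
        COMPRESS_URL,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed request format
        method="POST",
    )

def compress_event(event, timeout=10):
    """Send the event and return the raw response body.

    Assumes the response body is the compressed payload; adjust if the
    API wraps it in an envelope.
    """
    request = build_compress_request(event)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling compress_event({"user_id": 48213, "event": "page_view"}) would perform the network request; building the request separately keeps the payload easy to inspect and test offline.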
Measure Your Own Data
Your numbers will differ from this example. Run a sample of real events through the API and measure the ratio before committing. smoltext shines when your events are small and structurally similar — which is exactly what analytics and telemetry pipelines produce.
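One way to structure that measurement is a small harness that takes any compress function and reports the aggregate ratio over a sample. measure_ratio is a hypothetical helper; pass it whatever function wraps your call to the API. zlib.compress appears below only as a local placeholder so the sketch runs offline.

```python
import json
import zlib

def measure_ratio(events, compress_fn):
    """Return total compressed bytes divided by total original bytes."""
    original = 0
    compressed = 0
    for event in events:
        raw = json.dumps(event, separators=(",", ":")).encode("utf-8")
        original += len(raw)
        compressed += len(compress_fn(raw))
    if original == 0:
        raise ValueError("no events to measure")
    return compressed / original

sample = [
    {"user_id": 48213, "event": "page_view", "ts": 1718553600,
     "path": "/pricing", "ref": "google"},
    {"user_id": 7, "event": "page_view", "ts": 1718553660,
     "path": "/docs", "ref": "direct"},
]

# Placeholder compressor; swap in a function that calls the smoltext API.
print(f"ratio: {measure_ratio(sample, zlib.compress):.2f}")
```

Use a sample large enough to reflect your real mix of event types, since a handful of events can give a misleading ratio.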