Why Gzip and Zstd Fail on Small Payloads (And What to Do Instead)
General-purpose compressors carry fixed overhead that wipes out savings on strings under 1KB. Here is why smoltext exists.
The Small-Payload Problem
Every general-purpose compressor was designed for files: documents, logs, archives, tarballs. On a 10MB payload, gzip and zstd are excellent. On a 94-byte JSON event, they often make the data larger.
The reason is fixed overhead. A gzip stream carries a 10-byte header plus an 8-byte trailer (a CRC-32 of the uncompressed data and its length). That is 18 bytes spent before a single byte of your data is encoded. zstd's frame format is leaner but still spends bytes on a 4-byte magic number, a frame header descriptor, optional window and content-size fields, and a 3-byte header for every block.
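The 18 bytes of gzip framing are easy to see directly. This sketch uses only Python's standard library to compress the example event from the next section and unpack the header and trailer fields defined in RFC 1952. It does not cover zstd, which needs a third-party binding.

```python
import gzip
import struct
import zlib

event = b'{"u":12841,"e":"click","t":1718000000,"p":"/checkout"}'
blob = gzip.compress(event)

# RFC 1952 fixed header: ID1 ID2 (magic), CM, FLG, MTIME (4 bytes), XFL, OS = 10 bytes
magic, method, flags, mtime, xfl, os_byte = struct.unpack("<2sBBIBB", blob[:10])

# Trailer: CRC-32 of the uncompressed data, then its length mod 2**32, both little-endian
crc, isize = struct.unpack("<II", blob[-8:])

print("header :", blob[:10].hex(" "))
print("magic  :", magic.hex(), "method:", method, "flags:", flags)
print("trailer: crc32", hex(crc), "size", isize)
print("framing:", 10 + 8, "of", len(blob), "bytes total for a", len(event), "byte input")
```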
The Math
Consider a typical event:
{"u":12841,"e":"click","t":1718000000,"p":"/checkout"}
That is exactly 54 bytes. Run it through gzip and the output typically lands around 70 bytes, larger than what you started with. The compressor never sees enough repetition to earn back the 18 bytes of framing.
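To check this on your own machine, the sketch below compresses the same 54-byte event with Python's `zlib` and `gzip` modules at level 9 and prints the size of each wrapper: a bare DEFLATE stream, the zlib format (2-byte header plus 4-byte Adler-32), and gzip (10-byte header plus 8-byte trailer). Exact byte counts can vary slightly with the zlib version linked into Python, so treat the printed numbers as indicative.

```python
import gzip
import zlib

event = b'{"u":12841,"e":"click","t":1718000000,"p":"/checkout"}'


def raw_deflate(data: bytes, level: int = 9) -> bytes:
    # Negative wbits: bare DEFLATE stream with no header and no checksum
    c = zlib.compressobj(level, zlib.DEFLATED, -15)
    return c.compress(data) + c.flush()


raw_len = len(raw_deflate(event))
zlib_len = len(zlib.compress(event, 9))                # adds 2-byte header + 4-byte Adler-32
gzip_len = len(gzip.compress(event, compresslevel=9))  # adds 10-byte header + 8-byte trailer

print("input      :", len(event))
print("raw deflate:", raw_len)
print("zlib       :", zlib_len, f"({zlib_len - len(event):+d} vs input)")
print("gzip       :", gzip_len, f"({gzip_len - len(event):+d} vs input)")
```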
Why DEFLATE Needs Volume
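DEFLATE (the algorithm behind gzip and zlib) can build Huffman codes from observed symbol frequencies, but a dynamic-Huffman block has to transmit a description of those codes before any data. On short input there is not enough signal to pay for that description, so the encoder typically falls back to the fixed Huffman code defined by the specification, which spends a full 8 bits on every printable ASCII character, or stores the block raw. Either way, you pay structure cost for compression you never receive.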
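The fallback costs can be computed straight from RFC 1951. The sketch below estimates the size of a single fixed-Huffman block that codes every byte as a literal (ignoring LZ77 back-references, so it is an upper bound) and of a single stored block. Because literals 0-143 get 8-bit codes, printable ASCII costs exactly one byte per input byte in a fixed block: the 54-byte event needs 442 bits (56 bytes) as fixed Huffman and 59 bytes stored, before any gzip framing is added.

```python
event = b'{"u":12841,"e":"click","t":1718000000,"p":"/checkout"}'


def fixed_huffman_bits(data: bytes) -> int:
    """Bits for one final fixed-Huffman block coding every byte as a literal
    (RFC 1951, section 3.2.6). LZ77 matches are ignored, so this is an upper bound."""
    bits = 3  # BFINAL (1 bit) + BTYPE (2 bits)
    for b in data:
        bits += 8 if b <= 143 else 9  # literals 0-143 use 8-bit codes, 144-255 use 9-bit codes
    return bits + 7  # end-of-block symbol 256 has a 7-bit code


def stored_block_bytes(data: bytes) -> int:
    """Bytes for one stored (raw) block of up to 65,535 bytes (RFC 1951, section 3.2.4)."""
    if len(data) > 0xFFFF:
        raise ValueError("a single stored block holds at most 65,535 bytes")
    # 3 header bits padded to a byte boundary, LEN + NLEN (2 bytes each), then the raw bytes
    return 1 + 4 + len(data)


fixed_bytes = (fixed_huffman_bits(event) + 7) // 8  # round bits up to whole bytes
print("input:", len(event), "fixed Huffman:", fixed_bytes, "stored:", stored_block_bytes(event))
```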
What smoltext Does Differently
smoltext is built specifically for the sub-1KB range. It uses three layers that avoid per-message overhead:
- A semantic JSON codec that tokenizes structure (keys, punctuation, value types) instead of treating bytes as an opaque stream.
- Dictionary deflate seeded with a shared, pre-trained dictionary, so common keys and patterns cost almost nothing.
- A trained codebook that maps frequent substrings to short codes without storing a table inside each message.
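To make the first layer concrete, here is a toy structural encoder in Python. It is a hypothetical illustration of tokenizing JSON structure, not smoltext's wire format, which this post does not describe: keys are replaced by indexes into an assumed shared key table (`KEYS`), non-negative integers become varints, and strings become length-prefixed UTF-8, so quotes, colons, and commas disappear. It applies no dictionary or entropy coding and handles only flat objects.

```python
from typing import Dict, Tuple, Union

# Hypothetical shared key table: encoder and decoder must hold the same list.
KEYS = ["u", "e", "t", "p"]
KEY_ID = {key: i for i, key in enumerate(KEYS)}
TAG_INT, TAG_STR = 0, 1  # stored in the low bit of each field header byte


def write_varint(n: int) -> bytes:
    """Unsigned LEB128-style varint: 7 payload bits per byte, high bit means 'more'."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at buf[pos]; return (value, next position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def encode_event(obj: Dict[str, Union[int, str]]) -> bytes:
    """One header byte per field (key id << 1 | type tag), then the value."""
    out = bytearray()
    for key, value in obj.items():
        head = KEY_ID[key] << 1  # a small integer replaces the quoted key and the colon
        if isinstance(value, int) and not isinstance(value, bool) and value >= 0:
            out.append(head | TAG_INT)
            out += write_varint(value)  # decimal digits become binary
        elif isinstance(value, str):
            raw = value.encode("utf-8")
            out.append(head | TAG_STR)
            out += write_varint(len(raw)) + raw  # length prefix replaces the quotes
        else:
            raise TypeError(f"unsupported value for {key!r}: {value!r}")
    return bytes(out)


def decode_event(buf: bytes) -> Dict[str, Union[int, str]]:
    obj: Dict[str, Union[int, str]] = {}
    pos = 0
    while pos < len(buf):
        head = buf[pos]
        pos += 1
        key = KEYS[head >> 1]
        if (head & 1) == TAG_INT:
            obj[key], pos = read_varint(buf, pos)
        else:
            length, pos = read_varint(buf, pos)
            obj[key] = buf[pos:pos + length].decode("utf-8")
            pos += length
    return obj


event = {"u": 12841, "e": "click", "t": 1718000000, "p": "/checkout"}
packed = encode_event(event)
print(len(packed), "bytes")  # 27 bytes, versus 54 for the JSON text
assert decode_event(packed) == event
```

On the example event this toy produces 27 bytes, half the JSON text, before any dictionary or codebook layer runs.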
Because the dictionary and codebook live on the server, individual messages carry no table overhead. That 54-byte event compresses to roughly 22-26 bytes — a real, repeatable saving.
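The dictionary layer builds on a standard DEFLATE feature: a preset dictionary that both sides load before processing a message, so back-references can point into shared bytes that never travel on the wire. Python exposes it as the `zdict` argument of `zlib.compressobj` and `zlib.decompressobj`. The dictionary contents below are hypothetical and hand-written; smoltext's trained dictionary and codebook are not shown here.

```python
import zlib
from typing import Optional

event = b'{"u":12841,"e":"click","t":1718000000,"p":"/checkout"}'

# Hypothetical shared dictionary built by hand from two sample events. A real one would
# be trained on representative traffic and shipped to every encoder and decoder.
SHARED_DICT = (
    b'{"u":1,"e":"view","t":1718000000,"p":"/cart"}'
    b'{"u":2,"e":"click","t":1718000000,"p":"/checkout"}'
)


def compress_raw(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Raw DEFLATE (wbits=-15): no header or checksum travels with the message."""
    if zdict is None:
        c = zlib.compressobj(9, zlib.DEFLATED, -15)
    else:
        # Preset dictionary: matches may reference zdict, which is never transmitted
        c = zlib.compressobj(9, zlib.DEFLATED, -15, zdict=zdict)
    return c.compress(data) + c.flush()


def decompress_raw(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    d = zlib.decompressobj(-15) if zdict is None else zlib.decompressobj(-15, zdict=zdict)
    return d.decompress(blob) + d.flush()


without_dict = compress_raw(event)
with_dict = compress_raw(event, SHARED_DICT)
print("no dictionary    :", len(without_dict), "bytes")
print("shared dictionary:", len(with_dict), "bytes")
assert decompress_raw(with_dict, SHARED_DICT) == event
```

Both sides must hold byte-identical dictionaries; how one would version or distribute them is outside this sketch.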
The Honest Tradeoff
smoltext is not a gzip replacement. On large payloads — anything above a few kilobytes — general-purpose compressors win, because their adaptive modeling pays for itself and their fixed overhead becomes negligible. We say this plainly: if your payloads are big, use zstd.
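To see the direction of this tradeoff, the sketch below compresses batches of synthetic events shaped like the example above and prints the output-to-input ratio as the payload grows. It uses `zlib` from Python's standard library as a stand-in for general-purpose compressors (zstd is not in the standard library). The generated events are hypothetical and likely more repetitive than real traffic, so they flatter the large-payload case; the point is the trend, not the exact crossover.

```python
import json
import zlib


def make_events(n: int) -> bytes:
    """n synthetic events shaped like the example event, one compact JSON object per line."""
    lines = [
        json.dumps(
            {"u": 12841 + i, "e": "click", "t": 1718000000 + i, "p": "/checkout"},
            separators=(",", ":"),
        )
        for i in range(n)
    ]
    return "\n".join(lines).encode("utf-8")


def ratio(payload: bytes) -> float:
    """zlib output size over input size; above 1.0 means compression made it bigger."""
    return len(zlib.compress(payload)) / len(payload)


for n in (1, 10, 100, 1000):
    blob = make_events(n)
    print(f"{len(blob):7d} bytes in -> ratio {ratio(blob):.2f}")
```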
smoltext targets the band where general compressors lose money: short strings, JSON events, log lines, KV records, prompt fragments.
When the Band Matters
The small-payload band shows up everywhere in modern systems: edge KV stores, message queue records, per-request logs, analytics events, feature flags, session tokens. Each item is tiny, but you have billions of them, and storage plus egress is billed per byte.
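A back-of-the-envelope version of that argument, using the 54-byte event and the upper end of the 22-26 byte range quoted for smoltext, at a hypothetical volume of one billion events per day (decimal gigabytes):

```latex
\begin{aligned}
\text{raw JSON:} &\quad 10^{9} \times 54\ \text{B} = 5.4 \times 10^{10}\ \text{B} \approx 54\ \text{GB/day} \\
\text{smoltext:} &\quad 10^{9} \times 26\ \text{B} = 2.6 \times 10^{10}\ \text{B} \approx 26\ \text{GB/day} \\
\text{saved:} &\quad 10^{9} \times (54 - 26)\ \text{B} = 2.8 \times 10^{10}\ \text{B} \approx 28\ \text{GB/day}
\end{aligned}
```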
Try It
Send a payload to the API and compare the result with gzip on the same string:
curl -X POST https://api.smoltext.sprapp.com/v1/compress \
-H "Content-Type: application/json" \
-d '{"data":"{\"u\":12841,\"e\":\"click\"}"}'
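The `data` field carries the record as a JSON string, which is why the curl example needs the `\"` escapes. Building the body programmatically avoids hand-escaping. The sketch below uses only Python's standard library and reproduces the same request body; the endpoint URL comes from the example above, while the helper name `build_request` is hypothetical. The response format is not documented here, so the sketch stops short of sending the request.

```python
import json
import urllib.request

API_URL = "https://api.smoltext.sprapp.com/v1/compress"


def build_request(record: dict) -> urllib.request.Request:
    """Hypothetical helper: wrap a record as a JSON string inside {"data": ...}."""
    inner = json.dumps(record, separators=(",", ":"))  # compact JSON, as in the curl example
    body = json.dumps({"data": inner}, separators=(",", ":")).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request({"u": 12841, "e": "click"})
print(req.data.decode("utf-8"))
# Sending is one call, urllib.request.urlopen(req); it is left out because the
# response format is not documented here.
```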
If your data lives below 1KB, smoltext is built for exactly that.