Lossless Guarantees: Why smoltext Never Approximates Your Data
Compression for records and logs must be byte-exact. Here is how smoltext stays fully lossless across every transformation layer.
Lossless Is Non-Negotiable
For structured data — records, logs, queue messages — there is no acceptable approximation. A timestamp that comes back off by one, or a user ID with a flipped digit, corrupts everything downstream. smoltext is fully lossless: decompression always reproduces the exact input bytes.
Reversibility at Every Layer
smoltext's pipeline has three transformation layers, and each is individually reversible:
- Semantic JSON codec: key tokenization, value typing, and punctuation reconstruction all decode deterministically back to the original serialization.
- Dictionary deflate: DEFLATE is inherently lossless; the shared dictionary only changes which back-references are available, not correctness, provided the decoder uses the same dictionary as the encoder.
- Trained codebook: every code maps to exactly one substring, so decoding is unambiguous.
Because each layer is reversible and decompression undoes the layers in reverse order, the whole chain is reversible: the output of https://api.smoltext.sprapp.com/v1/decompress is byte-identical to the input you compressed.
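The dictionary-deflate layer can be illustrated with Python's standard zlib module, which supports preset dictionaries. This is only a sketch of the general technique, not smoltext's implementation: the dictionary contents and the function names deflate_with_dict and inflate_with_dict are assumptions made for illustration.

```python
import zlib

# Hypothetical shared dictionary: byte strings that recur across records.
SHARED_DICT = b'{"user_id": "timestamp": "level": "info" "message": '


def deflate_with_dict(data: bytes, zdict: bytes = SHARED_DICT) -> bytes:
    """Compress with DEFLATE, priming the window with a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()


def inflate_with_dict(blob: bytes, zdict: bytes = SHARED_DICT) -> bytes:
    """Decompress; the decoder must be given the same dictionary."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


record = b'{"user_id": 42, "timestamp": "2024-01-01T00:00:00Z", "level": "info"}'
restored = inflate_with_dict(deflate_with_dict(record))
assert restored == record  # byte-identical, whatever the dictionary contains
```

The dictionary affects how small the output is, not what comes back: any dictionary round-trips correctly as long as both sides agree on it.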
Preserving Serialization Quirks
A subtle point: there are many valid ways to serialize the same JSON value. smoltext preserves the exact input serialization rather than normalizing it, so whitespace, key order, and number formatting come back unchanged. This matters when downstream systems hash or sign the raw bytes.
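To see why normalization would be a problem, the sketch below parses and re-serializes a payload with Python's standard json module. The sample payload is made up for illustration; the point is only that the JSON value survives while the bytes, and therefore any hash or signature over them, do not.

```python
import hashlib
import json

# Made-up payload with unsorted keys, a double space, and exponent notation.
raw = b'{"b": 1.0,  "a": 2e3}'

# Parsing and re-serializing keeps the JSON value but not the original bytes.
normalized = json.dumps(json.loads(raw), sort_keys=True).encode("utf-8")

same_value = json.loads(raw) == json.loads(normalized)
same_bytes = raw == normalized
same_digest = hashlib.sha256(raw).digest() == hashlib.sha256(normalized).digest()

assert same_value        # semantically equal
assert not same_bytes    # but serialized differently
assert not same_digest   # so a hash over the raw bytes no longer matches
```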
Round-Trip Verification
A good practice during rollout is to verify the round trip on a sample:
assert all(decompress(compress(x)) == x for x in sample)
Run this over a representative slice of your real data before you trust compression in production. It is cheap insurance and confirms the lossless guarantee on your specific payloads.
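A slightly fuller version of the check above collects the failing payloads instead of stopping at the first one. This is a minimal sketch: find_round_trip_failures is a hypothetical helper, and zlib stands in for whatever compress and decompress functions your smoltext client exposes.

```python
import zlib
from typing import Callable, Iterable, List


def find_round_trip_failures(
    sample: Iterable[bytes],
    compress: Callable[[bytes], bytes],
    decompress: Callable[[bytes], bytes],
) -> List[bytes]:
    """Return every payload whose round trip does not reproduce the input."""
    failures = []
    for payload in sample:
        try:
            ok = decompress(compress(payload)) == payload
        except Exception:
            # A codec that raises on a payload has also failed the round trip.
            ok = False
        if not ok:
            failures.append(payload)
    return failures


# Include awkward cases: odd whitespace, empty input, invalid UTF-8.
sample = [b'{"id": 1}', b'{ "id" : 1 }', b"", b"not json \xff\xfe"]
failures = find_round_trip_failures(sample, zlib.compress, zlib.decompress)
assert failures == []
```

Compare raw bytes rather than parsed values; comparing parsed JSON would hide whitespace or key-order changes.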
Handling Non-JSON Input
When you send non-JSON or malformed input, smoltext does not try to be clever and risk corruption. It falls back to treating the data as an opaque string and compressing it losslessly. You get smaller savings, but never wrong bytes.
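One common way to build this kind of fallback is to tag each frame with the path that was taken, so the decoder never has to guess. The sketch below is an assumption about the general pattern, not a description of smoltext's wire format: the tags, encode, and decode are hypothetical, and zlib stands in for both compression paths.

```python
import json
import zlib

TAG_JSON = b"J"    # payload parsed as JSON; a structured codec could apply
TAG_OPAQUE = b"O"  # payload did not parse; treated as opaque bytes


def is_json(data: bytes) -> bool:
    """Report whether the payload parses as JSON, without modifying it."""
    try:
        json.loads(data)
        return True
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return False


def encode(data: bytes) -> bytes:
    tag = TAG_JSON if is_json(data) else TAG_OPAQUE
    # Stand-in: both paths use zlib here; a real structured path would differ.
    return tag + zlib.compress(data)


def decode(blob: bytes) -> bytes:
    tag, body = blob[:1], blob[1:]
    if tag not in (TAG_JSON, TAG_OPAQUE):
        raise ValueError("unknown frame tag")
    return zlib.decompress(body)


truncated = b'{"user_id": 42, "level": '   # malformed JSON
assert encode(truncated)[:1] == TAG_OPAQUE
assert decode(encode(truncated)) == truncated
```

The parse is used only to choose a path. The original bytes are what get compressed, so a wrong guess costs compression ratio, never correctness.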
Why Not Lossy?
Some compression domains — images, audio — accept lossy schemes because human perception tolerates approximation. Structured records have no such tolerance. There is no perceptual slack in a JSON event. So smoltext does not offer a lossy mode for this data; correctness is the product.
Verifying After Decompression
For the highest-assurance pipelines, store a short checksum of the original bytes alongside the compressed bytes, then recompute and compare it after decompression. Combined with smoltext's deterministic encoding, this gives end-to-end integrity from write to read.
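A minimal sketch of that pattern follows, assuming an 8-byte BLAKE2b digest as the "short checksum" and zlib as a stand-in for the smoltext compress and decompress calls. The pack and unpack helpers and the storage layout (digest first, then compressed bytes) are hypothetical.

```python
import hashlib
import zlib

DIGEST_SIZE = 8  # bytes; an assumed size for a "short" checksum


def pack(original: bytes) -> bytes:
    """Store a digest of the original bytes in front of the compressed bytes."""
    digest = hashlib.blake2b(original, digest_size=DIGEST_SIZE).digest()
    return digest + zlib.compress(original)


def unpack(stored: bytes) -> bytes:
    """Decompress, then verify the result against the stored digest."""
    digest, blob = stored[:DIGEST_SIZE], stored[DIGEST_SIZE:]
    restored = zlib.decompress(blob)
    actual = hashlib.blake2b(restored, digest_size=DIGEST_SIZE).digest()
    if actual != digest:
        raise ValueError("checksum mismatch after decompression")
    return restored


event = b'{"event": "login", "user_id": 42}'
assert unpack(pack(event)) == event
```

Computing the digest over the original bytes, rather than the compressed ones, is what makes the check end to end: it covers compression, storage, and decompression in one comparison.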
The Bottom Line
smoltext trades nothing for its savings. The bytes you put in are the bytes you get out. The compression comes entirely from removing redundancy and structural overhead — never from discarding information.