Load Once, Run Forever: How TinyLM Caches Models in IndexedDB
A look at the caching layer that lets TinyLM download a model a single time and then start instantly and offline on every later visit.
The Caching Problem
A browser-based model has to come from somewhere on first load, but re-downloading the model file on every visit would be slow and would defeat the offline promise. The solution is to download it once and persist it locally. TinyLM uses IndexedDB, the browser's built-in client-side database, to do exactly that.
Why IndexedDB
IndexedDB is the right tool because it is designed to store large binary data (Blobs and ArrayBuffers) persistently in the browser, survives page reloads and browser restarts, and is available across modern browsers on both desktop and mobile. It comfortably holds a model file like eeny 2.0 (about 1.76 MB) or meeny 2.0, and it normally keeps the data until the user clears site data or the browser evicts it under storage pressure.
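For a concrete picture, here is a minimal sketch of writing and reading a binary blob with the standard IndexedDB API. The database name `tinylm-cache`, the store name `models`, and the helper names are assumptions made for illustration, not TinyLM's actual identifiers.

```typescript
// Hypothetical names: TinyLM's real database and store names may differ.
const DB_NAME = "tinylm-cache";
const STORE_NAME = "models";

/** True when the IndexedDB global exists (it does not in plain Node.js). */
function hasIndexedDb(): boolean {
  return typeof indexedDB !== "undefined";
}

/** Open the database, creating the object store the first time. */
function openModelDb(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    if (!hasIndexedDb()) {
      reject(new Error("IndexedDB is not available in this environment"));
      return;
    }
    const request = indexedDB.open(DB_NAME, 1);
    // Runs only when the database is new or its schema version increases.
    request.onupgradeneeded = () => {
      request.result.createObjectStore(STORE_NAME);
    };
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

/** Write raw model bytes under a string key. */
async function putModel(key: string, bytes: ArrayBuffer): Promise<void> {
  const db = await openModelDb();
  try {
    await new Promise<void>((resolve, reject) => {
      const tx = db.transaction(STORE_NAME, "readwrite");
      tx.objectStore(STORE_NAME).put(bytes, key);
      tx.oncomplete = () => resolve(); // the write is committed at this point
      tx.onerror = () => reject(tx.error);
      tx.onabort = () => reject(tx.error);
    });
  } finally {
    db.close();
  }
}

/** Read model bytes back; resolves to undefined when the key is absent. */
async function getModel(key: string): Promise<ArrayBuffer | undefined> {
  const db = await openModelDb();
  try {
    return await new Promise<ArrayBuffer | undefined>((resolve, reject) => {
      const store = db.transaction(STORE_NAME, "readonly").objectStore(STORE_NAME);
      const request = store.get(key);
      request.onsuccess = () => resolve(request.result as ArrayBuffer | undefined);
      request.onerror = () => reject(request.error);
    });
  } finally {
    db.close();
  }
}
```

In a page, `await putModel("eeny-2.0", bytes)` followed later by `await getModel("eeny-2.0")` would round-trip the file; the key string here is likewise only an example.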
The First-Load Flow
On a user's first visit, the engine fetches the model file over the network, hands it to the inference runtime, and writes it into IndexedDB. You typically show a brief one-time loading indicator during this step. It happens once per browser on each device, and after that the network is not involved in loading the model until the cache is cleared or a new model version ships.
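To drive that loading indicator, the download can be read as a stream so the page knows how many bytes have arrived. The sketch below uses the standard `fetch` streaming interface; `readWithProgress` and its callback shape are hypothetical names, not part of TinyLM. The `Content-Length` header can be missing, and for compressed responses it may not match the decoded byte count, so the total is treated as optional.

```typescript
/** Called after each chunk; totalBytes is undefined when no Content-Length is sent. */
type ProgressFn = (loadedBytes: number, totalBytes: number | undefined) => void;

/** Hypothetical helper: read a response body chunk by chunk, reporting progress. */
async function readWithProgress(
  response: Response,
  onProgress: ProgressFn,
): Promise<Uint8Array> {
  if (!response.ok) throw new Error(`Download failed: HTTP ${response.status}`);
  if (!response.body) throw new Error("Response has no readable body");

  const header = response.headers.get("Content-Length");
  const total = header !== null ? Number(header) : undefined;

  const reader = response.body.getReader();
  const chunks: Uint8Array[] = [];
  let loaded = 0;
  for (;;) {
    const result = await reader.read();
    if (result.done) break;
    chunks.push(result.value);
    loaded += result.value.byteLength;
    onProgress(loaded, total);
  }

  // Join the chunks into one contiguous buffer for the runtime and the cache.
  const bytes = new Uint8Array(loaded);
  let offset = 0;
  for (const chunk of chunks) {
    bytes.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return bytes;
}
```

A caller would pass the result of `fetch(modelUrl)` and update a progress bar inside the callback, falling back to an indeterminate spinner when the total is undefined.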
The Returning-Visit Flow
On every later visit, the engine checks IndexedDB first. The model is already there, so it loads straight from IndexedDB and is ready almost immediately: no download, no spinner, no network. This is what makes TinyLM feel instant on repeat use and what makes it work with no connection at all. Try it at https://ai.sprapp.com: load once, then reload offline.
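The first-load and returning-visit flows can be sketched as a single cache-first function. This is an illustration under stated assumptions, not TinyLM's actual code: `ModelStore`, `MemoryStore`, `loadModel`, and `fetchOverNetwork` are hypothetical names, and the in-memory store stands in for an IndexedDB-backed one so the example runs anywhere.

```typescript
/** Minimal async key-value store; in the browser this would wrap IndexedDB. */
interface ModelStore {
  get(key: string): Promise<ArrayBuffer | undefined>;
  put(key: string, bytes: ArrayBuffer): Promise<void>;
}

/** In-memory stand-in for illustration only; it does not persist anything. */
class MemoryStore implements ModelStore {
  private entries = new Map<string, ArrayBuffer>();

  async get(key: string): Promise<ArrayBuffer | undefined> {
    return this.entries.get(key);
  }

  async put(key: string, bytes: ArrayBuffer): Promise<void> {
    this.entries.set(key, bytes);
  }
}

type Fetcher = (url: string) => Promise<ArrayBuffer>;

interface LoadResult {
  bytes: ArrayBuffer;
  source: "cache" | "network";
}

/** Check the store first; fetch and persist only on a miss. */
async function loadModel(
  url: string,
  store: ModelStore,
  fetchBytes: Fetcher,
): Promise<LoadResult> {
  const cached = await store.get(url);
  if (cached !== undefined) {
    return { bytes: cached, source: "cache" }; // returning visit: no network
  }
  const bytes = await fetchBytes(url); // first visit: one download
  await store.put(url, bytes);
  return { bytes, source: "network" };
}

/** Default fetcher built on the standard fetch API. */
const fetchOverNetwork: Fetcher = async (url) => {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Download failed: HTTP ${response.status}`);
  return response.arrayBuffer();
};
```

Calling `loadModel(url, store, fetchOverNetwork)` twice with the same store reports `"network"` the first time and `"cache"` the second, which is the behaviour the two flows describe.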
Versioning and Updates
When a new model version ships, the cache needs a way to notice. Storing the model under a key that includes its version lets the engine detect that the cached entry is outdated, fetch the new file, and then replace the old entry. This keeps users current without forcing a re-download on every visit, balancing freshness against the load-once promise.
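One way to picture the versioned key is the sketch below. The `name@version` key format and the function names are assumptions for illustration, as is the idea that the page already knows the current version (for example, from a constant shipped with the app). A `Map` stands in for IndexedDB so the logic is easy to follow.

```typescript
/** Build a cache key that embeds the version, e.g. "eeny@2.0". */
function modelKey(name: string, version: string): string {
  return `${name}@${version}`;
}

/** Keys belonging to `name` that are not the current version. */
function staleKeys(allKeys: string[], name: string, currentVersion: string): string[] {
  const current = modelKey(name, currentVersion);
  return allKeys.filter((key) => key.startsWith(`${name}@`) && key !== current);
}

/**
 * Return the current model, fetching only on a version miss,
 * then drop entries for older versions of the same model.
 */
async function ensureCurrent(
  cache: Map<string, ArrayBuffer>,
  name: string,
  currentVersion: string,
  fetchBytes: (name: string, version: string) => Promise<ArrayBuffer>,
): Promise<ArrayBuffer> {
  const key = modelKey(name, currentVersion);
  let bytes = cache.get(key);
  if (bytes === undefined) {
    // Cache is empty or holds only an older version: download the new one.
    bytes = await fetchBytes(name, currentVersion);
    cache.set(key, bytes);
  }
  // Remove superseded versions only after the current one is stored.
  for (const old of staleKeys(Array.from(cache.keys()), name, currentVersion)) {
    cache.delete(old);
  }
  return bytes;
}
```

Deleting the old entry only after the new one is written means a failed download leaves the previous version usable rather than leaving the cache empty.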
Storage Is Not Infinite
Honesty point: browser storage has limits and users can clear it. If a user wipes site data, the model is gone and will re-download on the next visit. This is rare and the re-download is small, but it is worth designing for — treat the cache as a fast path, not a guarantee, and always be able to re-fetch.
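Treating the cache as a fast path can be sketched as a loader that tolerates both read and write failures. `loadWithFallback` and its three callback parameters are hypothetical names for illustration; the point is only that a broken or empty cache degrades to a normal download instead of an error.

```typescript
/** Hypothetical helper: try the cache, but never let a cache failure block loading. */
async function loadWithFallback(
  readCache: () => Promise<ArrayBuffer | undefined>,
  writeCache: (bytes: ArrayBuffer) => Promise<void>,
  fetchBytes: () => Promise<ArrayBuffer>,
): Promise<ArrayBuffer> {
  try {
    const cached = await readCache();
    if (cached !== undefined) return cached; // fast path
  } catch {
    // Storage unavailable or unreadable: treat it as a cache miss.
  }

  const bytes = await fetchBytes();

  try {
    await writeCache(bytes);
  } catch {
    // Quota exceeded or writes blocked: the model still works for this session.
  }
  return bytes;
}
```

Only a failed download surfaces as an error here, since without either the cache or the network there is genuinely nothing to load.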
Privacy of the Cache
The cached model lives in the browser's own storage, sandboxed to the origin that stored it. It is just the model weights, not your data — your text is never written to the cache. The caching layer is purely about making the model available fast and offline; it does not change the private-by-physics property of inference.
The Payoff
IndexedDB caching is the unglamorous plumbing that makes the whole TinyLM experience work. Download once, and from then on the model starts instantly and runs offline on every visit. It is the bridge between "a model you fetched from the web" and "a model that lives on your device" — and it is what turns a web page into a self-contained, offline AI tool.