Tutorial: Adding an Offline TinyLM Model to Your Web App
A step-by-step walkthrough of loading a TinyLM model in the browser, caching it for offline use, and wiring it into a simple text feature.
What You Will Build
In this tutorial we will add a tiny, offline language model to a web page. By the end you will have a model that loads once, caches itself, and powers a small text feature with no server and no network calls after the first visit. The goal is to show how little work on-device AI actually requires.
Step 1: Start From the Demo
Before writing code, spend two minutes at https://ai.sprapp.com. Watch the model load, then go offline and see it keep working. This gives you a mental model of the lifecycle you are about to build: fetch once, cache, run locally.
Step 2: Choose Your Model
Pick the smallest model that fits your task. For lightweight autocomplete or classification, eeny 2.0 (999K params, ~1.76MB) is a good fit. If you need a bit more range, step up to meeny 2.0 (6.2M ternary params). Prefer the smaller model when both work: it downloads faster and uses less memory.
Step 3: Load the WASM Engine
The TinyLM inference engine ships as a WebAssembly module plus a small JavaScript loader. You initialize the engine, point it at the model file, and it handles the SIMD-accelerated inference. Conceptually:
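// Initialize the WASM inference engine (top-level await needs an ES module).
const engine = await TinyLM.init();
// Load the model by name: fetched on first run, read from cache afterward.
const model = await engine.loadModel("eeny-2.0");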
The loader fetches the model on first run and reads it from cache afterward.
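If several parts of the page need the model, it can help to make sure the load happens only once per page. The sketch below memoizes the load promise. `createModelLoader` and its injected `initEngine` argument are hypothetical names used for illustration, and the code assumes only the `init()` / `loadModel()` shape shown above, not a confirmed TinyLM API.

```javascript
// Sketch: share one engine/model load across the whole page.
// `initEngine` is injected so this does not depend on the real TinyLM API;
// in the page it might be () => TinyLM.init() (assumption).
function createModelLoader(initEngine, modelName) {
  let modelPromise = null;

  return function getModel() {
    if (modelPromise === null) {
      modelPromise = Promise.resolve()
        .then(() => initEngine())
        .then((engine) => engine.loadModel(modelName))
        .catch((err) => {
          // Forget the failed attempt so a later call can retry.
          modelPromise = null;
          throw err;
        });
    }
    return modelPromise;
  };
}
```

Every caller then awaits `getModel()`. The first call starts the load and later calls reuse the same promise.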
Step 4: Cache in IndexedDB
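You do not need to manage caching by hand: the engine stores the model in IndexedDB after the first download. The practical thing to do is show a one-time loading indicator on first visit, then skip it on later visits because the model is already local. To test, load the page once, then disable the network in your browser's developer tools and reload. Note that IndexedDB covers only the model. For a full offline reload, the page's own HTML, JavaScript, and WASM files must also be available locally, for example through a service worker.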
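The tutorial does not say how the engine reports whether the model is already cached. One way to get the "show once, then skip" behavior without inspecting IndexedDB is to show the indicator only when loading takes longer than a short delay. This is a sketch under that assumption: `withDelayedIndicator`, `showIndicator`, `hideIndicator`, and the 200 ms default are all illustrative names and values, not part of TinyLM.

```javascript
// Sketch: show a loading indicator only if the task is slow.
// A cached model should resolve before `delayMs` elapses, so the indicator
// never appears; a first-visit download takes longer and triggers it.
async function withDelayedIndicator(task, showIndicator, hideIndicator, delayMs = 200) {
  let shown = false;
  const timer = setTimeout(() => {
    shown = true;
    showIndicator();
  }, delayMs);

  try {
    return await task();
  } finally {
    clearTimeout(timer);
    if (shown) {
      hideIndicator();
    }
  }
}
```

In the page, `task` would be the function that loads the model, and the two callbacks would toggle a spinner element.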
Step 5: Run Inference
With the model loaded, running it is a single function call. For a classification or completion task:
const result = await model.run(userText);
Because everything is local, the call returns quickly and the user's text never leaves the page.
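When inference is triggered on every keystroke, an earlier call can still finish after a later one. A minimal sketch of one way to handle that is to deliver only the result for the most recent input. `createLatestOnlyRunner` and `onResult` are hypothetical names, and the code assumes `model.run(text)` returns a promise, as the snippet above suggests.

```javascript
// Sketch: deliver only the result for the most recent input.
// Assumes `model.run(text)` returns a promise (assumption based on the
// `await model.run(userText)` call in this tutorial).
function createLatestOnlyRunner(model, onResult) {
  let latestId = 0;

  return async function run(text) {
    const id = ++latestId;
    const result = await model.run(text);
    const isLatest = id === latestId;
    if (isLatest) {
      onResult(result, text);
    }
    // false means a newer call superseded this one and its result was dropped.
    return isLatest;
  };
}
```

In a page this could be wired to an input's `input` event, with `onResult` updating the suggestion shown to the user.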
Step 6: Handle the Honest Edge Cases
Tiny models are narrow, so design around their limits. Constrain the input to the task the model was trained for, validate outputs before acting on them, and provide a graceful fallback for inputs outside scope. Do not present a tiny model as an all-purpose assistant; present it as the focused feature it is.
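The three guards above (constrain input, validate output, fall back) can be sketched as a thin wrapper around the model. This assumes a classifier-style model whose `run` resolves to a string label, which the tutorial does not confirm. `classifyWithGuards`, `allowedLabels`, `maxChars`, and `fallback` are hypothetical names chosen for the example.

```javascript
// Sketch: wrap a classifier-style model with input and output checks.
// Assumes `model.run(text)` resolves to a string label (assumption; the
// real return type is not specified in this tutorial).
async function classifyWithGuards(model, text, options) {
  const { allowedLabels, maxChars = 280, fallback = null } = options;

  const input = String(text).trim();
  // Out-of-scope input: empty, or longer than the feature is designed for.
  if (input.length === 0 || input.length > maxChars) {
    return { ok: false, label: fallback };
  }

  let output;
  try {
    output = await model.run(input);
  } catch (err) {
    // Inference failed; fall back instead of surfacing an error to the user.
    return { ok: false, label: fallback };
  }

  // Only act on outputs that belong to the known label set.
  if (typeof output !== "string" || !allowedLabels.includes(output)) {
    return { ok: false, label: fallback };
  }
  return { ok: true, label: output };
}
```

Callers check `ok` before acting on `label`, so an unexpected output degrades to the fallback rather than driving the UI.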
Step 7: Specialize With Fine-Tuning
For better results on your specific data, fine-tune the model with LoRA, either on-device or in the cloud, then ship the adapted weights. A model tuned to your vocabulary will usually outperform the generic one on your task while staying just as small.
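For reference, the standard LoRA formulation shows why an adapted model can stay the same size. The symbols below follow the usual LoRA notation for a dense weight matrix and are not TinyLM-specific. Whether TinyLM merges adapters into its weights, in particular ternary ones, is not stated in this tutorial.

```latex
W' = W + \frac{\alpha}{r}\, B A,
\qquad W \in \mathbb{R}^{d \times k},\quad
B \in \mathbb{R}^{d \times r},\quad
A \in \mathbb{R}^{r \times k},\quad
r \ll \min(d, k)
```

Only $A$ and $B$ are trained, which is $r(d + k)$ parameters instead of $dk$. If the update is merged, $W'$ has the same shape as $W$, so the shipped model is no larger.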
Wrapping Up
That is the whole loop: pick a tiny model, load the WASM engine, let it cache to IndexedDB, run inference locally, and respect the model's limits. The result is a feature that is private by physics, works offline, and adds no per-request inference cost on your servers. For commercial deployment, TinyLM licenses run $19 to $99.