Tutorial | 2025-10-10 | 7 min read

Tutorial: Adding an Offline TinyLM Model to Your Web App

A step-by-step walkthrough of loading a TinyLM model in the browser, caching it for offline use, and wiring it into a simple text feature.

Tags: TinyLM tutorial, browser AI, on-device AI, IndexedDB, web app

What You Will Build

In this tutorial we will add a tiny, offline language model to a web page. By the end you will have a model that loads once, caches itself, and powers a small text feature with no server and no network calls after the first visit. The goal is to show how little work on-device AI actually requires.

Step 1: Start From the Demo

Before writing code, spend two minutes at https://ai.sprapp.com. Watch the model load, then go offline and see it keep working. This gives you a mental model of the lifecycle you are about to build: fetch once, cache, run locally.

Step 2: Choose Your Model

Pick the smallest model that fits your task. For lightweight autocomplete or classification, eeny 2.0 (999K params, ~1.76 MB) is ideal. If you need a bit more range, step up to meeny 2.0 (6.2M ternary params). Prefer the smaller model whenever it is accurate enough: it downloads faster on the first visit and uses less memory at runtime.
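
To make the size trade-off concrete, the figures above can be turned into rough storage arithmetic. This is a back-of-the-envelope sketch, not a description of the TinyLM file format: it assumes 1 MB means 1,000,000 bytes and that the file is dominated by weights, and the function names are hypothetical.

```javascript
// Rough storage arithmetic for the model sizes quoted above.
// Assumption: 1 MB = 1,000,000 bytes and the file is mostly weights.

// Average number of bits stored per parameter for a given file size.
function bitsPerParam(fileBytes, paramCount) {
  return (fileBytes * 8) / paramCount;
}

// Information-theoretic lower bound for ternary weights:
// each weight takes one of three values, so it needs log2(3) bits.
function ternaryLowerBoundBytes(paramCount) {
  return (paramCount * Math.log2(3)) / 8;
}

const eenyBits = bitsPerParam(1.76e6, 999e3);
const meenyMinBytes = ternaryLowerBoundBytes(6.2e6);

console.log(`eeny 2.0: about ${eenyBits.toFixed(1)} bits per parameter`);
console.log(`meeny 2.0: at least ${(meenyMinBytes / 1e6).toFixed(2)} MB of ternary weights`);
```

Under these assumptions the numbers come out to roughly 14 bits per parameter for eeny 2.0 and a floor of roughly 1.2 MB for the meeny 2.0 weights. A real file adds metadata and packing overhead on top of that floor.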

Step 3: Load the WASM Engine

The TinyLM inference engine ships as a WebAssembly module plus a small JavaScript loader. You initialize the engine, point it at the model file, and it handles the SIMD-accelerated inference. Conceptually:

const engine = await TinyLM.init();
const model = await engine.loadModel("eeny-2.0");

The loader fetches the model on first run and reads it from cache afterward.
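
The fetch-once, read-from-cache behavior described above is a common cache-first pattern. The sketch below is not TinyLM's loader. It is a minimal illustration that assumes a hypothetical `store` object with async `get` and `put` methods (an in-memory `Map` stands in for IndexedDB) and a hypothetical `fetchBytes` callback that downloads the model.

```javascript
// Cache-first loading: try the local store, fall back to the network once.
// `store` and `fetchBytes` are hypothetical stand-ins, not TinyLM APIs.
async function loadCacheFirst(key, store, fetchBytes) {
  const cached = await store.get(key);
  if (cached !== undefined) {
    return { bytes: cached, fromCache: true };
  }
  const bytes = await fetchBytes(key); // only reached when nothing is cached
  await store.put(key, bytes);
  return { bytes, fromCache: false };
}

// In-memory stand-in for a persistent store such as IndexedDB.
function createMemoryStore() {
  const map = new Map();
  return {
    get: async (key) => map.get(key),
    put: async (key, value) => { map.set(key, value); },
  };
}
```

Calling `loadCacheFirst` twice with the same key and store should trigger `fetchBytes` only once; the second call is served from the store.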

Step 4: Cache in IndexedDB

You do not need to manage caching by hand: the engine stores the model in IndexedDB after the first download. Your job is to show a one-time loading indicator on the first visit, then skip it on later visits because the model is already local. To test, load the page once, disable the network, and reload; the feature should still work.
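
One way to show the indicator only on the first visit is to keep a small flag next to the cached model. The sketch below is an assumption-heavy illustration: `loadModel`, `ui`, and the flag key are hypothetical, and `flags` is any object with `getItem` and `setItem` methods, such as `window.localStorage` in a browser.

```javascript
// Show a loading indicator only on the first visit.
// `loadModel`, `ui`, and READY_FLAG are hypothetical; `flags` is any object
// with getItem/setItem, such as window.localStorage in a browser.
const READY_FLAG = "tinylm-model-ready";

async function loadWithFirstVisitIndicator(loadModel, ui, flags) {
  const firstVisit = flags.getItem(READY_FLAG) !== "1";
  if (firstVisit) ui.show();
  try {
    const model = await loadModel();
    flags.setItem(READY_FLAG, "1"); // remember that the model is now local
    return model;
  } finally {
    if (firstVisit) ui.hide(); // hide the indicator even if loading failed
  }
}
```

The flag is only a hint. If the cached model is evicted while the flag survives, the download happens again without an indicator, so a timeout-based indicator may be a reasonable addition.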

Step 5: Run Inference

With the model loaded, running it takes a single function call. For a classification or completion task:

const result = await model.run(userText);

Because everything is local, the call returns quickly and the user's text never leaves the page.
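
If inference runs on every keystroke, an earlier call can still finish after a newer one and overwrite fresher output. A minimal latest-wins wrapper is sketched below. It assumes only that the model object exposes the async `run` method shown above; `createLatestRunner` is a hypothetical helper name.

```javascript
// Latest-wins wrapper: results from superseded calls are flagged as stale.
// Assumes only that `model.run(text)` returns a promise, as shown above.
function createLatestRunner(model) {
  let latest = 0;
  return async function runLatest(text) {
    const ticket = ++latest;
    const result = await model.run(text);
    // A newer call started while this one was running: report it as stale.
    return ticket === latest ? { stale: false, result } : { stale: true };
  };
}
```

The caller updates the page only when `stale` is `false`, so the text on screen always corresponds to the most recent input.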

Step 6: Handle the Honest Edge Cases

Tiny models are narrow, so design around their limits. Constrain the input to the task the model was trained for, validate outputs before acting on them, and provide a graceful fallback for inputs outside scope. Do not present a tiny model as an all-purpose assistant; present it as the focused feature it is.
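
The three guards above (constrain input, validate output, fall back) can be sketched as one wrapper. This is an illustration under stated assumptions: it assumes `model.run` resolves to a label string, which the tutorial does not specify, and the label set, length limit, and fallback value are all hypothetical.

```javascript
// Guarded classification: constrain input, validate output, fall back.
// Assumptions: model.run resolves to a label string; the label set,
// length limit, and fallback value below are hypothetical.
const ALLOWED_LABELS = new Set(["billing", "support", "sales"]);
const MAX_INPUT_CHARS = 500;
const FALLBACK = "unknown";

async function classifySafely(model, rawText) {
  const text = String(rawText ?? "").trim();
  if (text.length === 0 || text.length > MAX_INPUT_CHARS) {
    return FALLBACK; // outside the scope the model is meant to handle
  }
  try {
    const output = await model.run(text);
    const label = String(output).trim().toLowerCase();
    return ALLOWED_LABELS.has(label) ? label : FALLBACK;
  } catch (err) {
    return FALLBACK; // inference failed; degrade gracefully
  }
}
```

The calling code then only ever sees a known label or the fallback, so unexpected model output cannot trigger an action it was not designed for.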

Step 7: Specialize With Fine-Tuning

For better results on your specific data, fine-tune the model with LoRA, either on-device or in the cloud, then ship the adapted weights. A model tuned to your vocabulary will outperform the generic one on your task while staying just as small.
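
To see why a LoRA-tuned model can stay the same size, it helps to look at the standard LoRA formulation, where a low-rank update is added to an existing weight matrix: W' = W + (alpha / r) * B * A. The sketch below uses plain arrays to stay self-contained. It says nothing about how TinyLM stores or merges weights, and the function names are hypothetical.

```javascript
// Merge a LoRA update into a weight matrix: W' = W + (alpha / r) * B * A.
// W is d x k, B is d x r, A is r x k. Plain arrays keep the sketch
// self-contained; this is not a description of TinyLM's internals.
function mergeLora(W, B, A, alpha) {
  const r = A.length;
  const scale = alpha / r;
  return W.map((row, i) =>
    row.map((w, j) => {
      let delta = 0;
      for (let t = 0; t < r; t++) delta += B[i][t] * A[t][j];
      return w + scale * delta;
    })
  );
}

// Trainable parameters in the adapter (B and A) for a d x k matrix.
function loraParamCount(d, k, r) {
  return r * (d + k);
}
```

The merged matrix has the same shape as the original, so the parameter count is unchanged, while the adapter trained along the way holds only `r * (d + k)` values instead of `d * k`.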

Wrapping Up

That is the whole loop: pick a tiny model, load the WASM engine, let it cache to IndexedDB, run inference locally, and respect the model's limits. The result is a feature that is private by physics, works offline, and costs nothing to serve. For commercial deployment, TinyLM licenses run $19 to $99.

Written by TinyLM Team

