Technical Deep Dive · 2025-07-18 · 6 min read

Running Language Models in the Browser with Rust and WebAssembly

How TinyLM uses a Rust-to-WebAssembly inference engine to run real language models inside any phone browser: no install, no plugin, no server.

WebAssembly, Rust, browser inference, TinyLM, WASM SIMD

The Browser Is the Runtime

Until recently, running a language model meant a server or at least a native app. TinyLM treats the browser itself as the runtime. The model and the inference engine both live in a web page, which means there is nothing to install and nothing to trust beyond the page you already opened.

Why Rust to WebAssembly

The TinyLM inference engine is written in Rust and compiled to WebAssembly (WASM). Rust gives us tight control over memory and predictable performance; WASM gives us a portable, sandboxed target that runs in every modern browser. The combination means one engine runs identically on Android Chrome, iOS Safari, and a desktop browser.

SIMD for Speed

Scalar WASM is fast, but phone CPUs have vector units that can do many operations at once. The engine uses WASM SIMD, whose 128-bit vectors hold sixteen int8 values each, to process int8 weights in parallel. On a mid-range phone that is the difference between sluggish and responsive. SIMD is what makes a several-million-parameter model usable without a GPU.
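To make the idea concrete, here is a minimal sketch of an int8 dot product, the inner loop of most matrix multiplies. It is not TinyLM's actual kernel: the function names `dot_i8` and `dot_i8_scalar` are made up for illustration, and the SIMD path assumes a wasm32 build with `-C target-feature=+simd128`. On any other target the same function falls back to the scalar loop.

```rust
/// Scalar reference: multiply int8 pairs and accumulate in i32.
pub fn dot_i8_scalar(a: &[i8], b: &[i8]) -> i32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(&x, &y)| x as i32 * y as i32).sum()
}

/// WASM SIMD path (hypothetical kernel): 16 int8 lanes per 128-bit vector.
/// Only compiled for wasm32 builds with the simd128 feature enabled.
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
pub fn dot_i8(a: &[i8], b: &[i8]) -> i32 {
    use core::arch::wasm32::*;
    assert_eq!(a.len(), b.len());
    let full = a.len() / 16 * 16; // elements covered by whole vectors
    let mut acc = i32x4_splat(0);
    for i in (0..full).step_by(16) {
        // SAFETY: i + 16 <= full <= len, and v128_load accepts unaligned pointers.
        let (va, vb) = unsafe {
            (
                v128_load(a.as_ptr().add(i) as *const v128),
                v128_load(b.as_ptr().add(i) as *const v128),
            )
        };
        // Widen i8 -> i16, then multiply and pairwise-add into four i32 lanes.
        let lo = i32x4_dot_i16x8(i16x8_extend_low_i8x16(va), i16x8_extend_low_i8x16(vb));
        let hi = i32x4_dot_i16x8(i16x8_extend_high_i8x16(va), i16x8_extend_high_i8x16(vb));
        acc = i32x4_add(acc, i32x4_add(lo, hi));
    }
    // Horizontal sum of the four lanes, plus the scalar tail.
    let lanes = i32x4_extract_lane::<0>(acc)
        + i32x4_extract_lane::<1>(acc)
        + i32x4_extract_lane::<2>(acc)
        + i32x4_extract_lane::<3>(acc);
    lanes + dot_i8_scalar(&a[full..], &b[full..])
}

/// Fallback for every other target: plain scalar loop.
#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
pub fn dot_i8(a: &[i8], b: &[i8]) -> i32 {
    dot_i8_scalar(a, b)
}
```

Each loop iteration handles sixteen weight and activation pairs, where the scalar loop handles one. Accumulating in i32 keeps the sum from overflowing for realistic vector lengths.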

Ternary Weights, Smaller Footprint

The 2.0 models, such as meeny, use 1.58-bit ternary weights, where each weight is one of three values (conventionally -1, 0, or +1). This shrinks the model dramatically and turns many multiplications into additions and subtractions, which are cheaper on a CPU. The eeny 2.0 model fits in about 1.76 MB, smaller than many web images.
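The "1.58 bits" figure is log2(3) ≈ 1.585, the information content of a three-valued weight. The sketch below shows both effects under stated assumptions: weights are taken to be -1, 0, or +1, and five of them are packed into one byte (3^5 = 243 fits in 256, so 1.6 bits per weight). The packing scheme and the function names are illustrative only and are not claimed to be TinyLM's on-disk format.

```rust
/// Dot product of ternary weights (-1, 0, +1) with int8 activations.
/// No multiplication: each weight adds, subtracts, or skips its activation.
pub fn ternary_dot(weights: &[i8], activations: &[i8]) -> i32 {
    assert_eq!(weights.len(), activations.len());
    let mut acc = 0i32;
    for (&w, &x) in weights.iter().zip(activations) {
        match w {
            1 => acc += x as i32,
            -1 => acc -= x as i32,
            _ => {} // zero weight contributes nothing
        }
    }
    acc
}

/// Pack five ternary weights per byte as base-3 digits (assumed layout).
pub fn pack_trits(weights: &[i8]) -> Vec<u8> {
    weights
        .chunks(5)
        .map(|chunk| {
            // Map -1/0/+1 to digits 0/1/2; first weight is the lowest digit.
            chunk.iter().rev().fold(0u8, |byte, &w| {
                debug_assert!((-1..=1).contains(&w));
                byte * 3 + (w + 1) as u8
            })
        })
        .collect()
}

/// Inverse of `pack_trits`; `count` is the original number of weights.
pub fn unpack_trits(packed: &[u8], count: usize) -> Vec<i8> {
    let mut out = Vec::with_capacity(count);
    for &byte in packed {
        let mut b = byte;
        for _ in 0..5 {
            if out.len() == count {
                break;
            }
            out.push((b % 3) as i8 - 1);
            b /= 3;
        }
    }
    out
}
```

With this layout, one million weights would occupy 200,000 bytes, compared with 1,000,000 bytes at int8.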

Caching in IndexedDB

The first load fetches the model file over the network. After that, TinyLM stores it in IndexedDB, the browser's built-in database. On every subsequent visit the engine reads the model from that local IndexedDB copy instead of the network, so startup is fast and works with no connection. You can see this yourself at https://ai.sprapp.com: load once, then reload offline.
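The cache-first flow described above can be sketched in a few lines. This is an illustration, not TinyLM's code: `ModelStore`, `MemoryStore`, `load_model`, and the key `"eeny-2.0"` are hypothetical names, and an in-memory map stands in for IndexedDB so the example runs outside a browser. A real IndexedDB wrapper would be asynchronous.

```rust
use std::collections::HashMap;

/// Hypothetical storage interface; in a browser this would wrap IndexedDB.
pub trait ModelStore {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn put(&mut self, key: &str, bytes: &[u8]);
}

/// In-memory stand-in for IndexedDB so the sketch runs anywhere.
#[derive(Default)]
pub struct MemoryStore(HashMap<String, Vec<u8>>);

impl ModelStore for MemoryStore {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.0.get(key).cloned()
    }
    fn put(&mut self, key: &str, bytes: &[u8]) {
        self.0.insert(key.to_string(), bytes.to_vec());
    }
}

/// Cache-first load: return the stored copy if present,
/// otherwise call `fetch` (the network) and store the result.
pub fn load_model<S, F>(store: &mut S, key: &str, fetch: F) -> Result<Vec<u8>, String>
where
    S: ModelStore,
    F: FnOnce() -> Result<Vec<u8>, String>,
{
    if let Some(bytes) = store.get(key) {
        return Ok(bytes); // cache hit: no network needed
    }
    let bytes = fetch()?; // cache miss: a failure here propagates to the caller
    store.put(key, &bytes);
    Ok(bytes)
}
```

After one successful load, a later call succeeds even when `fetch` would fail, which is the offline-reload behavior the section describes.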

The Sandbox Is a Feature

Because everything runs in the browser sandbox, the engine cannot read your files or access other tabs. Because inference happens on the device, it also has no reason to phone home: your text never needs to leave the page. The same isolation that protects you from malicious web pages protects your data from the model. There is no API endpoint to leak to because there is no API endpoint.

Honest Limits of the Browser

Browsers are not infinite. Memory is constrained, WASM has overhead compared to native code, and very large models simply will not fit. This is exactly why TinyLM is tiny — the constraints of the browser shaped the model family, not the other way around. We optimized the model to fit the runtime instead of fighting it.

A Practical Takeaway

If you are building a web app that needs instant, private text intelligence, the browser-plus-WASM approach removes whole categories of problems: no backend to scale, no per-request cost, no privacy review for data in transit. TinyLM packages that approach so you can drop a tiny model into a page and have it just work.

Written by SPRAPP Research