How-To · 2026-04-08 · 6 min read

How-To: Route Queries Between TinyLM and SPRAPP Panel

Build a router that handles easy queries on-device with TinyLM and escalates hard ones to SPRAPP Panel.

TinyLM · SPRAPP Panel · query routing · how-to · hybrid AI

The Pattern

This how-to builds a router that keeps routine queries on-device with TinyLM and escalates only the hard or high-stakes ones to SPRAPP Panel in the cloud. Done well, it preserves most of TinyLM's privacy and cost benefits while keeping Panel's capability available for the cases that truly need it.

Why Route at All

Sending everything to a cloud panel is expensive and exposes data that never needed to leave the device; running everything on a small local model caps your capability on hard problems. Routing lets each tool do what it does best. The challenge is deciding, per query, which path to take.

Step 1: Define "Easy" for Your Domain

Start by characterizing the queries TinyLM handles well in your application — short, constrained, classification or extraction style, low stakes. Those stay on-device. Anything outside that envelope is a candidate for escalation. Be concrete; a vague definition produces a bad router.
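One way to make the envelope concrete is to write it as a predicate. The sketch below assumes each query arrives tagged with a task type and a high-stakes flag. The names `Query`, `EASY_TASKS`, `MAX_EASY_CHARS`, and `is_easy`, along with the 500-character cap, are hypothetical placeholders for your own domain rules. They are not part of any SPRAPP API.

```python
from dataclasses import dataclass

# Hypothetical domain rules: task types TinyLM handles well in *your* app.
EASY_TASKS = frozenset({"classify", "extract", "short_answer"})
MAX_EASY_CHARS = 500  # hypothetical input-length cap; tune per domain


@dataclass(frozen=True)
class Query:
    text: str
    task: str                  # e.g. "classify", "extract", "summarize", "plan"
    high_stakes: bool = False  # set by the caller for sensitive flows


def is_easy(q: Query) -> bool:
    """True if the query sits inside the on-device 'easy' envelope."""
    return (
        q.task in EASY_TASKS
        and len(q.text) <= MAX_EASY_CHARS
        and not q.high_stakes
    )


print(is_easy(Query("Is this message spam?", task="classify")))     # True
print(is_easy(Query("Plan a data-center migration", task="plan")))  # False
```

Because the envelope is just data plus one function, it is easy to review and easy to adjust when your logs show mis-routes.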

Step 2: Try TinyLM First

For most queries, run TinyLM on-device first. It's private, has no per-call cost, and works offline. Capture its answer, and also capture any available confidence signal and whether the task fit its strengths. The docs at https://doc.sprapp.com describe how to obtain these signals.
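Here is a minimal sketch of the on-device-first call. It assumes TinyLM can be wrapped as a function that returns an answer plus an optional confidence score. The real TinyLM interface, and the signals it exposes, are described at https://doc.sprapp.com. `LocalModel`, `LocalResult`, `try_local`, and `fake_tinylm` are hypothetical names, and `fake_tinylm` is only a stand-in so the example runs.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical wrapper shape: text in, (answer, confidence or None) out.
LocalModel = Callable[[str], Tuple[str, Optional[float]]]


@dataclass
class LocalResult:
    answer: str
    confidence: Optional[float]  # None if TinyLM gives no usable signal
    in_envelope: bool            # did the query fit the "easy" definition?


def try_local(text: str, in_envelope: bool, model: LocalModel) -> LocalResult:
    """Run the on-device model first and keep its signals with the answer."""
    answer, confidence = model(text)
    return LocalResult(answer, confidence, in_envelope)


def fake_tinylm(text: str) -> Tuple[str, Optional[float]]:
    """Stand-in for TinyLM so the sketch runs without the real runtime."""
    if "prize" in text.lower():
        return "spam", 0.92
    return "not spam", 0.55


result = try_local("You won a PRIZE!", in_envelope=True, model=fake_tinylm)
print(result)  # LocalResult(answer='spam', confidence=0.92, in_envelope=True)
```

Keeping `confidence` optional is deliberate. If TinyLM exposes no usable signal for a task, the router sees that explicitly rather than a made-up number.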

Step 3: Decide Whether to Escalate

Escalate to SPRAPP Panel when the query is flagged high-stakes, when TinyLM's output looks low-confidence, or when the task plainly exceeds small-model capability. Keep this rule simple and inspectable — a router you can't reason about is a router you can't tune.
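The three escalation triggers can be written as an ordered rule. It returns a reason string instead of a bare boolean, so every decision stays inspectable in logs. This is a sketch under stated assumptions. `Signals`, `escalation_reason`, and the 0.7 confidence floor are hypothetical. Treating a missing confidence signal as low confidence is one conservative choice among several.

```python
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_FLOOR = 0.7  # hypothetical threshold; tune on real traffic


@dataclass
class Signals:
    high_stakes: bool
    in_envelope: bool            # task fits the small model's strengths
    confidence: Optional[float]  # None when no confidence signal exists


def escalation_reason(s: Signals, floor: float = CONFIDENCE_FLOOR) -> Optional[str]:
    """Return why a query should escalate, or None to keep it on-device.

    Checks run in a fixed order so each decision is easy to explain.
    """
    if s.high_stakes:
        return "high_stakes"
    if not s.in_envelope:
        return "exceeds_small_model"
    if s.confidence is None or s.confidence < floor:
        return "low_confidence"  # missing signal is treated conservatively
    return None


print(escalation_reason(Signals(False, True, 0.55)))  # low_confidence
print(escalation_reason(Signals(False, True, 0.92)))  # None
```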

Step 4: Respect Data and Connectivity Constraints

Escalation sends data to the cloud, so gate it on policy and network availability. If the data isn't allowed to leave the device, don't escalate — fall back to TinyLM's best effort and surface the limitation. The privacy promise must survive the router.
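Gating can be a separate step that runs after the escalation rule and before any network call. In this sketch, `DataPolicy`, `GateDecision`, and `gate_escalation` are hypothetical. How your app determines the policy and connectivity inputs is up to you. Policy is checked before connectivity, so data marked device-only stays local even when the network is available.

```python
from dataclasses import dataclass
from enum import Enum


class DataPolicy(Enum):
    DEVICE_ONLY = "device_only"      # data must never leave the device
    CLOUD_ALLOWED = "cloud_allowed"  # escalation to the cloud is permitted


@dataclass
class GateDecision:
    escalate: bool
    note: str  # surfaced to the user and logs when escalation is blocked


def gate_escalation(wants_escalation: bool, policy: DataPolicy, online: bool) -> GateDecision:
    """Apply data-policy and connectivity checks before any cloud call."""
    if not wants_escalation:
        return GateDecision(False, "")
    # Policy first: device-only data stays local regardless of network state.
    if policy is DataPolicy.DEVICE_ONLY:
        return GateDecision(False, "Data must stay on-device; showing TinyLM's best effort.")
    if not online:
        return GateDecision(False, "Offline; showing TinyLM's best effort.")
    return GateDecision(True, "")


print(gate_escalation(True, DataPolicy.DEVICE_ONLY, online=True).escalate)    # False
print(gate_escalation(True, DataPolicy.CLOUD_ALLOWED, online=True).escalate)  # True
```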

Step 5: Call SPRAPP Panel for the Hard Cases

When escalation is warranted and permitted, send the query to Panel for multi-model reasoning. Because only the genuinely hard cases reach it, you're spending the expensive call where it adds the most value.
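Putting the pieces together, a router might try TinyLM, consult the escalation rule, and call Panel only when escalation is both warranted and permitted. The sketch assumes both models can be wrapped as plain callables. `PanelClient`, `answer_query`, and the stubs are hypothetical. Catching `OSError` is an assumed failure type, and your Panel client may raise something else. The real SPRAPP Panel client API is documented at https://doc.sprapp.com and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical callable shapes; wrap the real TinyLM and Panel clients to fit.
LocalModel = Callable[[str], Tuple[str, Optional[float]]]
PanelClient = Callable[[str], str]


@dataclass
class Answer:
    text: str
    path: str       # "tinylm" or "panel"
    note: str = ""  # why the result is a fallback, if it is one


def answer_query(
    text: str,
    local: LocalModel,
    panel: PanelClient,
    should_escalate: Callable[[Optional[float]], bool],
    escalation_allowed: bool,
) -> Answer:
    """Try TinyLM first; spend a Panel call only on permitted hard cases."""
    local_answer, confidence = local(text)
    if not should_escalate(confidence):
        return Answer(local_answer, "tinylm")
    if not escalation_allowed:  # policy or connectivity gate said no
        return Answer(local_answer, "tinylm", "escalation blocked; on-device best effort")
    try:
        return Answer(panel(text), "panel")
    except OSError as exc:  # assumed failure type; ConnectionError subclasses OSError
        return Answer(local_answer, "tinylm", f"panel unavailable: {exc}")


# Stubs so the sketch runs without either real client.
def unsure_local(text: str) -> Tuple[str, Optional[float]]:
    return "maybe", 0.4


def stub_panel(text: str) -> str:
    return "considered answer"


def low_confidence(c: Optional[float]) -> bool:
    return c is None or c < 0.7


print(answer_query("hard question", unsure_local, stub_panel, low_confidence, True))
# Answer(text='considered answer', path='panel', note='')
```

Injecting the models as callables keeps the routing logic testable with stubs and independent of whichever client library you use.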

Step 6: Log Both Paths

Record which path each query took and how it turned out. Review escalations that TinyLM could have handled, and on-device answers that should have escalated. Use that to refine your "easy" definition over time.
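Here is a sketch of the logging and review loop, assuming one JSON line per routed query. `RouteRecord`, `log_route`, `review`, and the `user_flagged_wrong` outcome field are hypothetical. Exact agreement between the two answers is only a crude proxy for "TinyLM could have handled it." The record stores a query ID rather than the query text, so logging does not quietly undo the privacy guarantee. Keep only the fields your data policy allows.

```python
import io
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List, Optional, TextIO, Tuple


@dataclass
class RouteRecord:
    query_id: str                     # an ID, not the raw query text
    path: str                         # "tinylm" or "panel"
    reason: Optional[str]             # escalation reason, None if kept local
    local_answer: str
    panel_answer: Optional[str] = None
    user_flagged_wrong: bool = False  # hypothetical outcome signal


def log_route(rec: RouteRecord, out: TextIO) -> None:
    """Append one JSON line per routed query."""
    out.write(json.dumps(asdict(rec)) + "\n")


def review(records: Iterable[RouteRecord]) -> Tuple[List[str], List[str]]:
    """Return (escalations TinyLM could have handled, local answers that should have escalated)."""
    unnecessary, missed = [], []
    for r in records:
        if r.path == "panel" and r.panel_answer is not None and r.local_answer == r.panel_answer:
            unnecessary.append(r.query_id)  # crude proxy: both models agreed
        elif r.path == "tinylm" and r.user_flagged_wrong:
            missed.append(r.query_id)
    return unnecessary, missed


records = [
    RouteRecord("q1", "panel", "low_confidence", "spam", panel_answer="spam"),
    RouteRecord("q2", "tinylm", None, "not spam", user_flagged_wrong=True),
    RouteRecord("q3", "panel", "high_stakes", "approve", panel_answer="reject"),
]
buf = io.StringIO()  # stands in for a log file
for rec in records:
    log_route(rec, buf)
print(review(records))  # (['q1'], ['q2'])
```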

Honest Tradeoff

No router is perfect; you'll mis-route some queries in both directions. The goal isn't perfection but spending capability and cloud exposure only where they pay off. Tune the escalation threshold against your real traffic, as documented at https://doc.sprapp.com.
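One way to tune the confidence floor is to replay logged traffic where you know whether TinyLM's answer was correct. Sweep candidate thresholds over it, then pick the lowest threshold whose on-device error rate you can accept. This sketch covers only the confidence trigger, not high-stakes or envelope escalations. `Labeled`, `SweepPoint`, `sweep`, and `pick_threshold` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Labeled:
    confidence: float    # TinyLM confidence recorded at routing time
    local_correct: bool  # was TinyLM's answer right? (from review/labels)


@dataclass
class SweepPoint:
    threshold: float
    escalation_rate: float   # share of traffic that would go to Panel
    local_error_rate: float  # error rate among queries kept on-device


def sweep(data: Sequence[Labeled], thresholds: Sequence[float]) -> List[SweepPoint]:
    """Replay logged traffic against each candidate confidence floor."""
    n = len(data)
    points = []
    for t in thresholds:
        kept = [d for d in data if d.confidence >= t]  # below t would escalate
        errors = sum(1 for d in kept if not d.local_correct)
        points.append(SweepPoint(
            threshold=t,
            escalation_rate=(n - len(kept)) / n if n else 0.0,
            local_error_rate=errors / len(kept) if kept else 0.0,
        ))
    return points


def pick_threshold(points: Sequence[SweepPoint], max_local_error: float) -> Optional[float]:
    """Lowest threshold (fewest escalations) whose on-device error rate is acceptable."""
    ok = [p.threshold for p in points if p.local_error_rate <= max_local_error]
    return min(ok) if ok else None


logged = [Labeled(0.9, True), Labeled(0.8, True), Labeled(0.6, False), Labeled(0.4, False)]
points = sweep(logged, [0.5, 0.7])
print(pick_threshold(points, max_local_error=0.1))  # 0.7
```

Raising the threshold can only keep the escalated share the same or make it larger. The lowest acceptable threshold is therefore also the one that sends the least traffic to Panel.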