How-To: Route Queries Between TinyLM and SPRAPP Panel
Build a router that handles easy queries on-device with TinyLM and escalates hard ones to SPRAPP Panel.
The Pattern
This how-to builds a router that sends routine queries to TinyLM on-device and escalates only the hard, high-stakes ones to a cloud SPRAPP Panel. Done well, it keeps most of TinyLM's privacy and cost benefits while reserving Panel's capability for the queries that actually need it.
Why Route at All
Sending everything to a cloud panel is expensive and can expose data that never needed to leave the device; running everything on a small local model caps your capability on hard problems. Routing lets each tool do what it does best. The challenge is deciding, per query, which path to take.
Step 1: Define "Easy" for Your Domain
Start by characterizing the queries TinyLM handles well in your application — short, constrained, classification or extraction style, low stakes. Those stay on-device. Anything outside that envelope is a candidate for escalation. Be concrete; a vague definition produces a bad router.
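One way to make the definition concrete is to write it as a predicate the router can evaluate. This is a minimal sketch, assuming your application already tags each query with a task type and a high-stakes flag. The names `Query`, `is_easy`, and `EASY_TASKS`, and the 500-character cap, are hypothetical placeholders, not part of TinyLM or SPRAPP.

```python
from dataclasses import dataclass

# Hypothetical task labels; replace with the categories your application actually sees.
EASY_TASKS = frozenset({"classify", "extract"})
# Assumed length cap (in characters) for "short" queries; tune for your domain.
MAX_EASY_CHARS = 500


@dataclass(frozen=True)
class Query:
    text: str
    task: str          # hypothetical task tag, e.g. "classify", "extract", "plan"
    high_stakes: bool  # set by your application, e.g. for medical or financial content


def is_easy(query: Query) -> bool:
    """Return True if the query falls inside the on-device envelope."""
    return (
        query.task in EASY_TASKS
        and len(query.text) <= MAX_EASY_CHARS
        and not query.high_stakes
    )


print(is_easy(Query("Is this email spam?", "classify", False)))  # True
print(is_easy(Query("Draft a contract", "plan", True)))          # False
```

Keeping the envelope in named constants means the refinements from Step 6 become one-line changes.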
Step 2: Try TinyLM First
For most queries, run TinyLM on-device first: it's private, has no per-call cost, and works offline. Capture not just its answer but any available signal of its confidence, or of whether the task fit its strengths. The docs at https://doc.sprapp.com describe how to obtain these signals.
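The point of this step is to keep TinyLM's signals next to its answer so the escalation rule can use them. How TinyLM actually exposes confidence is covered at https://doc.sprapp.com; in this sketch the `model` callable, its `(answer, confidence)` return shape, and `fake_tinylm` are hypothetical stand-ins, not TinyLM's real API.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical signature for the on-device call: returns the answer and, if the
# runtime exposes one, a confidence score in [0, 1]. Not TinyLM's real API.
LocalModel = Callable[[str], Tuple[str, Optional[float]]]


@dataclass(frozen=True)
class LocalResult:
    answer: str
    confidence: Optional[float]  # None when no confidence signal is available
    task_fit: bool               # did the task match TinyLM's strengths (e.g. from Step 1)?


def try_local(text: str, task_fit: bool, model: LocalModel) -> LocalResult:
    """Run the on-device model first and keep its signals alongside the answer."""
    answer, confidence = model(text)
    return LocalResult(answer=answer, confidence=confidence, task_fit=task_fit)


def fake_tinylm(text: str) -> Tuple[str, Optional[float]]:
    # Illustrative stand-in for the on-device model; always reports 0.92 confidence.
    return ("spam" if "free money" in text.lower() else "not spam", 0.92)


result = try_local("Claim your FREE MONEY now", task_fit=True, model=fake_tinylm)
print(result)  # LocalResult(answer='spam', confidence=0.92, task_fit=True)
```

Passing the model in as a callable keeps the router testable without a device and lets you swap in the real TinyLM call later.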
Step 3: Decide Whether to Escalate
Escalate to SPRAPP Panel when the query is flagged high-stakes, when TinyLM's output looks low-confidence, or when the task plainly exceeds small-model capability. Keep this rule simple and inspectable — a router you can't reason about is a router you can't tune.
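The rule can stay a few ordered checks. This sketch, assuming a confidence score in [0, 1] and an assumed floor of 0.7, returns a named reason instead of a bare boolean so every escalation can be explained and counted in the logs. `escalation_reason` and its reason strings are hypothetical.

```python
from typing import Optional

# Assumed confidence floor; the right value depends on your traffic.
CONFIDENCE_THRESHOLD = 0.7


def escalation_reason(
    high_stakes: bool,
    confidence: Optional[float],
    task_fit: bool,
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Optional[str]:
    """Return why a query should go to SPRAPP Panel, or None to keep it local.

    Checks run in a fixed order and each one is named, so any decision can be
    traced back to a single rule.
    """
    if high_stakes:
        return "high_stakes"
    if not task_fit:
        return "exceeds_small_model"
    # Design choice: a missing confidence signal does not escalate on its own.
    if confidence is not None and confidence < threshold:
        return "low_confidence"
    return None


print(escalation_reason(False, 0.55, True))  # low_confidence
print(escalation_reason(False, 0.90, True))  # None
```

If you would rather treat a missing confidence signal as a reason to escalate, change that one check; the rest of the rule is unaffected.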
Step 4: Respect Data and Connectivity Constraints
Escalation sends data to the cloud, so gate it on policy and network availability. If the data isn't allowed to leave the device, don't escalate — fall back to TinyLM's best effort and surface the limitation. The privacy promise must survive the router.
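A gate in front of the cloud call keeps the privacy promise explicit. In this minimal sketch, `gate_escalation` and `RouteDecision` are hypothetical names, and the policy and connectivity flags are assumed to come from your own application. Policy is checked before connectivity, so a policy block is reported the same way whether or not the device is online.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RouteDecision:
    path: str                  # "local" or "panel"
    limitation: Optional[str]  # user-facing note when escalation was blocked


def gate_escalation(
    wants_escalation: bool, data_may_leave_device: bool, online: bool
) -> RouteDecision:
    """Apply data policy, then connectivity, before any data leaves the device."""
    if not wants_escalation:
        return RouteDecision("local", None)
    if not data_may_leave_device:
        # Policy wins over capability: keep TinyLM's best effort and say so.
        return RouteDecision(
            "local", "Answered on-device only; this data may not be sent to the cloud."
        )
    if not online:
        return RouteDecision(
            "local", "Answered on-device only; no network connection for escalation."
        )
    return RouteDecision("panel", None)


print(gate_escalation(True, False, True).limitation)
# Answered on-device only; this data may not be sent to the cloud.
```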
Step 5: Call SPRAPP Panel for the Hard Cases
When escalation is warranted and permitted, send the query to Panel for multi-model reasoning. Because only the genuinely hard cases reach it, you're spending the expensive call where it adds the most value.
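Because TinyLM has already run in Step 2, the final step only needs to decide whether to replace its answer with Panel's. This sketch assumes the Panel client is a plain callable from query text to answer; that signature, `resolve`, and the `CountingPanel` stand-in are hypothetical, not SPRAPP's real API. The counter makes it easy to check that Panel is called only when escalation is both warranted and permitted.

```python
from typing import Callable, List, Tuple

# Hypothetical Panel client signature; not SPRAPP's real API.
PanelFn = Callable[[str], str]


def resolve(
    text: str,
    local_answer: str,
    needs_escalation: bool,
    escalation_allowed: bool,
    panel: PanelFn,
) -> Tuple[str, str]:
    """Return (path, answer); the Panel call happens only when both flags are True."""
    if needs_escalation and escalation_allowed:
        return "panel", panel(text)
    return "local", local_answer


class CountingPanel:
    """Illustrative stand-in for a Panel client that records each expensive call."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def __call__(self, text: str) -> str:
        self.calls.append(text)
        return f"panel answer to: {text}"


panel = CountingPanel()
print(resolve("Tag this ticket", "billing", False, True, panel))      # ('local', 'billing')
print(resolve("Assess this contract", "unsure", True, True, panel))  # ('panel', 'panel answer to: Assess this contract')
print(resolve("Summarize my records", "draft", True, False, panel))  # ('local', 'draft')
print(len(panel.calls))  # 1
```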
Step 6: Log Both Paths
Record which path each query took and how it turned out. Review escalations that TinyLM could have handled, and on-device answers that should have escalated. Use that to refine your "easy" definition over time.
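The two review queues fall out of a simple log record. This sketch assumes each record is later annotated with whether TinyLM's own answer was correct (for example, by a human reviewer); `RouteLog`, its fields, and the sample records are hypothetical and only illustrate the bookkeeping.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class RouteLog:
    query_id: str
    path: str                      # "local" or "panel"
    local_confidence: Optional[float]
    local_correct: Optional[bool]  # filled in by later review; None = not yet reviewed


def over_escalated(logs: Iterable[RouteLog]) -> List[str]:
    """Escalations where TinyLM's own answer was later judged correct."""
    return [r.query_id for r in logs if r.path == "panel" and r.local_correct is True]


def under_escalated(logs: Iterable[RouteLog]) -> List[str]:
    """On-device answers that were later judged wrong and should have escalated."""
    return [r.query_id for r in logs if r.path == "local" and r.local_correct is False]


# Made-up records for illustration only.
logs = [
    RouteLog("q1", "local", 0.95, True),
    RouteLog("q2", "panel", 0.60, True),
    RouteLog("q3", "local", 0.72, False),
    RouteLog("q4", "panel", 0.40, False),
    RouteLog("q5", "local", 0.88, None),
]
print(over_escalated(logs))   # ['q2']
print(under_escalated(logs))  # ['q3']
```

Unreviewed records (`None`) are left out of both queues rather than guessed at.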
Honest Tradeoff
No router is perfect — you'll mis-route some queries either way. The goal isn't perfection; it's spending capability and cloud exposure only where they pay off. Tune the threshold against your real traffic, as documented at https://doc.sprapp.com.
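Tuning the confidence floor can be as simple as replaying reviewed traffic against candidate thresholds and counting mis-routes in each direction. This sketch considers only the confidence rule, and it assumes a wrong on-device answer costs three times as much as an unnecessary cloud call; that weight, the function names, and the sample data are hypothetical, so set them from your own costs and logs.

```python
from typing import Sequence, Tuple

# (TinyLM confidence, was TinyLM's answer correct?) taken from reviewed logs.
Sample = Tuple[float, bool]


def misroutes(samples: Sequence[Sample], threshold: float) -> Tuple[int, int]:
    """Count (needless escalations, missed escalations) for one threshold."""
    needless = sum(1 for conf, ok in samples if conf < threshold and ok)
    missed = sum(1 for conf, ok in samples if conf >= threshold and not ok)
    return needless, missed


def pick_threshold(
    samples: Sequence[Sample], candidates: Sequence[float], miss_weight: float = 3.0
) -> float:
    """Return the candidate with the lowest weighted mis-route cost (first wins ties)."""

    def cost(threshold: float) -> float:
        needless, missed = misroutes(samples, threshold)
        return needless + miss_weight * missed

    return min(candidates, key=cost)


# Made-up samples for illustration only.
samples = [(0.95, True), (0.90, True), (0.82, False), (0.75, True),
           (0.65, True), (0.55, False), (0.40, False)]
print(misroutes(samples, 0.9))                                  # (2, 0)
print(pick_threshold(samples, [0.5, 0.6, 0.7, 0.8, 0.9]))       # 0.9
```

Raising `miss_weight` pushes the chosen threshold up (more escalation); lowering it favors keeping queries on-device.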
Result
You have a stack that's private and cheap by default and powerful when it counts, with a routing rule you can inspect and improve.