{"version":"https://jsonfeed.org/version/1","title":"Index — Multi-agent AI runtime for your terminal","description":"Index is a terminal-native multi-agent runtime that pairs a cheap executor with a frontier advisor. Ship more, spend less — bring your own Claude API key or run fully local with Ollama.","home_page_url":"https://index.rkrt.net","feed_url":"https://index.rkrt.net/feed.json","author":{"name":"Tyler Reckart","url":"https://index.rkrt.net"},"items":[{"id":"https://index.rkrt.net/posts/escalation-is-a-tool","url":"https://index.rkrt.net/posts/escalation-is-a-tool","title":"Escalation is a tool, not a trick","content_html":"<p>Most of what a coding agent does is boring.</p>\n<p>Read a file. List a directory. Grep for a symbol. Parse the output of a build command. Eight or ten turns of that, and maybe one where the agent actually has to think — should this be a class or a free function, should this function be idempotent, why does the test pass on CI and fail locally. The easy turns vastly outnumber the hard ones.</p>\n<p>The bill doesn&#39;t know the difference. Every turn is priced the same. A Claude Opus token spent parsing <code>ls</code> costs the same as a Claude Opus token spent designing a migration. That&#39;s the unspoken pricing of running agents in production, and it&#39;s the source of a specific kind of pain — watching a meter tick up while the model does work a seven-billion-parameter model on a laptop could have done just as well.</p>\n<p>The obvious fix is to swap the smart model for a cheap one. The obvious failure mode is then noticing, a week later, that the cheap model occasionally ships bad code. Not often. Just often enough to erode trust. 
Switch back to the smart model and the meter resumes.</p>\n<p><a href=\"https://github.com/tylerreckart/index\">Index</a> is a terminal-native multi-agent runtime built around the opposite bet: most of the work should run on a cheap executor, and only the hard turns should reach for a frontier model. That part isn&#39;t novel — plenty of systems route between model tiers. The part worth talking about is <em>how</em> the escalation happens. Index makes it a tool.</p>\n<h2 id=\"how-it-works\">How it works</h2>\n<p>An agent in Index is defined by a JSON file. A research agent looks like this:</p>\n<pre><code class=\"language-json\">{\n  <span class=\"hljs-attr\">&quot;name&quot;</span>: <span class=\"hljs-string\">&quot;research&quot;</span>,\n  <span class=\"hljs-attr\">&quot;role&quot;</span>: <span class=\"hljs-string\">&quot;research-analyst&quot;</span>,\n  <span class=\"hljs-attr\">&quot;model&quot;</span>: <span class=\"hljs-string\">&quot;ollama/qwen2.5-coder:7b&quot;</span>,\n  <span class=\"hljs-attr\">&quot;advisor_model&quot;</span>: <span class=\"hljs-string\">&quot;claude-opus-4-6&quot;</span>,\n  <span class=\"hljs-attr\">&quot;goal&quot;</span>: <span class=\"hljs-string\">&quot;Research topics with depth. Synthesize findings.&quot;</span>\n}\n</code></pre>\n<p><code>model</code> is who does the work. <code>advisor_model</code> is who gets consulted when the work gets hard. The executor&#39;s system prompt describes the advisor, when to use it — ambiguous tradeoffs, multi-step planning, decisions the executor doesn&#39;t feel confident making — and a budget, typically two consults per turn.</p>\n<p>When the executor decides it needs help, it writes this into its own response:</p>\n<pre><code>/advise Should these two API responses be treated as duplicates or merged?\nThe fields overlap 80% but the &quot;source&quot; field differs. 
Dedupe on request_id or on payload hash?\n</code></pre>\n<p>The runtime parses that line out of the response, builds a one-shot API call to the advisor model, and returns the answer as a tool result in the next turn. The executor reads the answer and proceeds. The advisor&#39;s tokens are billed at the advisor&#39;s rate but attributed to the executor&#39;s ledger, so <code>/tokens</code> shows exactly how much of each agent&#39;s spend went to its own model versus the seniors it consulted.</p>\n<p>That&#39;s the mechanic. The design choice hiding inside it is the part worth writing about.</p>\n<h2 id=\"why-this-isnt-automatic\">Why this isn&#39;t automatic</h2>\n<p>Most systems make escalation a decision the <em>framework</em> makes for the agent. A confidence score triggers a fallback. An uncertainty classifier routes the request. A learned dispatcher between the caller and the agent decides when to bring in the smart one. The agent never knows it was second-opinioned, and the user never knows why the decision was made.</p>\n<p>That&#39;s convenient, and it feels like progress — who wouldn&#39;t want less work? — but it buys a specific kind of debt. Escalation isn&#39;t debuggable, which means it can&#39;t be tuned, which means the framework either over-escalates (paying for a router call on every turn) or under-escalates (missing cases the router couldn&#39;t classify). And because the framework does the deciding, the executor never has to notice when it&#39;s in over its head. The router does the noticing.</p>\n<p>Making escalation a tool puts the noticing back on the agent. The executor has to read its own reasoning and decide: <em>am I confident enough to proceed, or is this the kind of thing to ask about?</em> That&#39;s a harder prompt to write. It&#39;s also a more honest one. 
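</p>\n<p>The runtime&#39;s half of the bargain stays simple. The parse-and-dispatch step can be sketched in a few lines; this is a hypothetical reconstruction, not Index&#39;s actual code, and the names and the <code>call_advisor</code> callback are illustrative:</p>\n<pre><code>import re\n\n# Match a /advise line and capture everything after it, across lines,\n# since a consult question can span several lines of the response.\nADVISE = re.compile(r&quot;^/advise\\s+(.+)&quot;, re.MULTILINE | re.DOTALL)\n\ndef extract_question(response_text):\n    # Return the question the executor wrote, or None if it never asked.\n    match = ADVISE.search(response_text)\n    return match.group(1).strip() if match else None\n\ndef consult(question, call_advisor):\n    # One-shot: the advisor sees a single user message, no history.\n    return call_advisor([{&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: question}])\n</code></pre>\n<p>The answer goes back to the executor as a tool result on its next turn; everything else about the loop is unchanged.</p>\n<p>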
Small models tend to over-consult, large models tend to under-consult, and most real agents need a system prompt that corrects for both tendencies.</p>\n<p>The reward for putting the work in is that escalations are explicit and debuggable. A consultation in the transcript means the executor chose to ask. No consultation means it didn&#39;t feel the need. The behavior is legible in a way a hidden router can&#39;t be.</p>\n<h2 id=\"why-the-advisor-has-no-memory\">Why the advisor has no memory</h2>\n<p>The other decision worth naming: the advisor sees only the question text. No conversation history, no file context, no prior tool results. The API call is literally a single user message with a terse system prompt asking for a direct answer, no preamble.</p>\n<p>That sounds like a limitation. It&#39;s the single best thing about the pattern.</p>\n<p>Forcing the executor to frame its question as a self-contained one-shot produces two effects. First, the executor has to articulate the decision in full — the tradeoff, the constraints, why it&#39;s stuck. Articulation is often where the answer becomes obvious; a nontrivial fraction of consultations are never sent because the executor realizes mid-framing that it already knows what to do. Second, the advisor call stays small. One user turn, one response, maybe 1500 tokens end to end. It caches well, costs less than a single turn of a long-context conversation, and doesn&#39;t pollute the advisor with the executor&#39;s working memory.</p>\n<p>History-lessness is a constraint that produces better questions. Nothing about the design planned for it to work that way. It just does.</p>\n<h2 id=\"the-part-the-system-doesnt-hide\">The part the system doesn&#39;t hide</h2>\n<p>The routing layer — the thing that decides which <em>agent</em> (not which model) handles a user&#39;s request — is embarrassingly low-tech. 
A master agent called <code>index</code> receives every incoming request, reads a roster of the specialist agents in the current session along with their roles and goals, and follows rules like these in its system prompt:</p>\n<pre><code>Route based on what is being requested:\n  - Research, facts, URLs → /agent research\n  - Code review → /agent reviewer\n  - Essays, docs, reports → /agent writer\n  - Shell, git, Docker, infra → /agent devops\n</code></pre>\n<p>That&#39;s the entire router. No embeddings, no classifier, no learned dispatcher. The master reads the rules and the roster and picks an agent by writing <code>/agent research &lt;task&gt;</code> into its response — exactly the same way the executor writes <code>/advise</code>.</p>\n<p>The obvious objection is that this should be smarter. The counter is that routing wrong in ways a developer can read beats routing right in ways no one can debug. When a task goes to the wrong agent, the reason sits on the screen. Changing how routing works is editing a text file.</p>\n<p>Delegation is also capped at two levels. A user&#39;s request reaches the master, the master can delegate once, and the specialist can delegate once more to a sub-specialist. Any attempt deeper returns an error string to the caller, which the agent has to handle. This prevents the long tail of multi-agent horror stories where something loops itself into bankruptcy. It also means agents occasionally have to punt work back to the user, which is the point. 
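</p>\n<p>The depth check itself is small enough to sketch. A hypothetical version, assuming a runtime that threads a hop counter through every delegation; none of these names are Index&#39;s:</p>\n<pre><code>MAX_DEPTH = 2  # the master may delegate once; a specialist once more\n\ndef delegate(handlers, name, task, depth=0):\n    # Past the cap, the caller gets an error string back in-band,\n    # exactly like any other tool result, and has to handle it.\n    if depth &gt;= MAX_DEPTH:\n        return &quot;error: delegation depth exceeded; handle this task yourself&quot;\n    return handlers[name](task, depth + 1)\n\ndef devops(task, depth):\n    # A specialist punting to a sub-specialist: allowed, second hop.\n    return delegate(handlers, &quot;docker&quot;, task, depth)\n\ndef docker(task, depth):\n    # A sub-specialist trying a third hop: this is the call that fails.\n    return delegate(handlers, &quot;compose&quot;, task, depth)\n\nhandlers = {&quot;devops&quot;: devops, &quot;docker&quot;: docker, &quot;compose&quot;: lambda t, d: &quot;done&quot;}\n</code></pre>\n<p>A request routed to <code>devops</code> comes back as the error string on the third hop, and the agents in the chain have to deal with that rather than recurse forever.</p>\n<p>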
A shallow, legible tree beats a deep opaque one.</p>\n<h2 id=\"the-bill\">The bill</h2>\n<p>After a typical session, <code>/tokens</code> prints something like this:</p>\n<pre><code>index        claude-sonnet-4-6     $0.08   (42k input, 3k output)\nresearch     ollama/qwen-7b        $0.00   (local)\n  advisor → claude-opus-4-6        $0.03   (2 consultations)\nreviewer     claude-sonnet-4-6     $0.04   (18k input, 2k output)\n</code></pre>\n<p>The research agent did most of the heavy lifting. It ran locally and cost nothing. When it needed judgment, it consulted Opus twice, for three cents total. The reviewer ran at Sonnet rates on work that actually needed Sonnet. The master just did routing.</p>\n<p>This isn&#39;t the only way to build a multi-agent runtime, and it isn&#39;t always the right way. It&#39;s a version a developer can reason about. When something goes wrong, the transcript shows the decision the agent made. When a bill comes in higher than expected, the ledger shows which model consumed the budget and why. When an escalation feels wrong, tuning the advisor&#39;s system prompt changes the rate.</p>\n<p>Automatic routing solves a real problem. It also solves that problem by hiding the thing most worth understanding. The trade Index makes is to leave the escalation visible, put it in the agent&#39;s own hands, and pay the occasional tax of writing a prompt that teaches a model to know when to ask for help.</p>\n<p>Which is what a senior engineer does anyway.</p>\n"}]}