Index speaks to two providers today: the Claude API from Anthropic and any Ollama server running on your machine or network. Mix and match — use a small local model as the executor and call out to a frontier Claude model as the advisor when the reasoning gets hard.

Claude API

Set ANTHROPIC_API_KEY in your environment and reference any of these IDs in your agent config.
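A minimal shell setup, assuming a POSIX shell; the key value below is a placeholder, not a real credential:

```shell
# Make the key available to Index. Get a real key from the Anthropic Console;
# Anthropic keys start with "sk-ant-".
export ANTHROPIC_API_KEY="sk-ant-your-key-here"   # placeholder value
```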

Current

| Model | API ID | Context | Best for |
| --- | --- | --- | --- |
| Claude Opus 4.7 | claude-opus-4-7 | 1M | Most capable. Complex reasoning, agentic coding. |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | 1M | The default advisor. Fast and smart. |
| Claude Haiku 4.5 | claude-haiku-4-5 | 200k | The default executor. Near-frontier quality at Haiku speed. |

Legacy (still available)

| Model | API ID | Context |
| --- | --- | --- |
| Claude Opus 4.6 | claude-opus-4-6 | 1M |
| Claude Opus 4.5 | claude-opus-4-5 | 200k |
| Claude Opus 4.1 | claude-opus-4-1 | 200k |
| Claude Sonnet 4.5 | claude-sonnet-4-5 | 200k |
| Claude Sonnet 4 | claude-sonnet-4-0 | 200k |
| Claude Opus 4 | claude-opus-4-0 | 200k |
| Claude Haiku 3 | claude-3-haiku-20240307 | 200k |

Claude Sonnet 4 and Opus 4 retire on June 15, 2026. Claude Haiku 3 retires on April 19, 2026. Migrate before those dates. See the Anthropic model overview for pricing and deprecation details.

Ollama

Point Index at an Ollama server (OLLAMA_HOST, defaults to http://localhost:11434) and use ollama pull <name> to fetch anything below. Sizes listed are parameter counts — pick what fits in your VRAM.
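A setup sketch, assuming the ollama CLI is installed and the server is reachable (the pull requires network access); the model tag is just an example from the tables below:

```shell
# Both Index and the ollama CLI read OLLAMA_HOST.
export OLLAMA_HOST="http://localhost:11434"   # the default; change for a remote server

# Fetch a model; the :tag suffix selects a size.
ollama pull qwen2.5-coder:7b

# Confirm what is available locally.
ollama list
```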

General purpose

| Model | Sizes | Notes |
| --- | --- | --- |
| llama3.3 | 70b | Meta's high-performance flagship. |
| llama3.1 | 8b, 70b, 405b | State-of-the-art Llama generation. |
| llama3.2 | 1b, 3b | Compact Llama for laptops. |
| llama3 | 8b, 70b | Previous Llama generation, still strong. |
| llama2 | 7b, 13b, 70b | Foundation models from Meta. |
| qwen3 | 0.6b–235b (dense + MoE) | Latest Qwen. Wide size range. |
| qwen3.5 | 0.8b–122b | Multimodal Qwen. |
| qwen2.5 | 0.5b–72b | Multilingual Alibaba model. |
| qwen2 | 0.5b, 1.5b, 7b, 72b | Previous Qwen generation. |
| qwen | 0.5b–110b | Original Qwen family. |
| gemma4 | 26b, 31b | Google's frontier reasoning models. |
| gemma3 | 270m, 1b, 4b, 12b, 27b | Single-GPU Gemma. |
| gemma2 | 2b, 9b, 27b | Efficient Gemma. |
| gemma | 2b, 7b | Original DeepMind Gemma. |
| mistral | 7b | Mistral 7B v0.3. |
| mistral-nemo | 12b | 128k context. |
| mistral-small | 22b, 24b | Small-model benchmark leader. |
| phi4 | 14b | Microsoft state-of-the-art compact model. |
| phi3 | 3.8b, 14b | Lightweight Microsoft models. |
| olmo2 | 7b, 13b | Open language models. |
| dolphin3 | 8b | Dolphin instruct-tuned. |

Reasoning

| Model | Sizes | Notes |
| --- | --- | --- |
| deepseek-r1 | 1.5b–671b | Open reasoning with strong benchmarks. |
| deepseek-v3 | 671b | Large MoE language model. |
| gpt-oss | 20b, 120b | OpenAI open-weight reasoning models. |

Coding

| Model | Sizes | Notes |
| --- | --- | --- |
| qwen3-coder | 30b, 480b | Long-context coding. |
| qwen2.5-coder | 0.5b–32b | Code-focused Qwen 2.5. |
| codellama | 7b, 13b, 34b, 70b | Meta code generation. |
| deepseek-coder | 1.3b, 6.7b, 33b | Two-trillion-token coding corpus. |
| codegemma | 2b, 7b | Code-tuned Gemma. |

Vision and multimodal

| Model | Sizes |
| --- | --- |
| qwen3-vl | 2b, 4b, 8b, 30b, 32b, 235b |
| llama3.2-vision | 11b, 90b |
| llava | 7b, 13b, 34b |
| minicpm-v | 8b |

Embeddings

| Model | Sizes |
| --- | --- |
| nomic-embed-text | default |
| mxbai-embed-large | 335m |
| bge-m3 | 567m |
| snowflake-arctic-embed | 22m, 33m, 110m, 137m, 335m |
| all-minilm | 22m, 33m |

Compact (runs on modest hardware)

| Model | Sizes |
| --- | --- |
| smollm2 | 135m, 360m, 1.7b |
| tinyllama | 1.1b |
| gemma3 | 270m, 1b |

The full catalog lives at ollama.com/library and changes frequently — check there for new drops.

Suggested pairings

A good starting configuration pairs a cheap executor with a capable advisor: most tokens stay local or inexpensive, and only the hard turns escalate.

  • Cheapest cloud: claude-haiku-4-5 executor, claude-sonnet-4-6 advisor.
  • Frontier coding: claude-sonnet-4-6 executor, claude-opus-4-7 advisor.
  • Local-first: qwen2.5-coder:7b executor, claude-sonnet-4-6 advisor.
  • Fully local: llama3.1:8b executor, deepseek-r1:32b advisor.
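One pairing could be written down as a config file. This is a hypothetical sketch only: the file name and every key below (executor, advisor, provider, model) are assumptions, not Index's documented schema, so check Index's own docs for the real format.

```shell
# Hypothetical Index config for the "cheapest cloud" pairing.
# File name and keys are illustrative assumptions, not Index's real schema.
cat > index.yaml <<'EOF'
executor:
  provider: anthropic
  model: claude-haiku-4-5
advisor:
  provider: anthropic
  model: claude-sonnet-4-6
EOF
```

Swapping the executor for an Ollama model (e.g. qwen2.5-coder:7b) would give the local-first variant with the same advisor.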