Index speaks to two providers today: the Claude API from Anthropic and any Ollama server running on your machine or network. Mix and match — use a small local model as the executor and call out to a frontier Claude model as the advisor when the reasoning gets hard.
Claude API
Set ANTHROPIC_API_KEY in your environment and reference any of these IDs in your agent config.
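For example, in a POSIX shell (the key value is a placeholder for your real key):

```shell
# Export the key for the current shell session (value is a placeholder).
export ANTHROPIC_API_KEY="your-key-here"
```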
Current
| Model | API ID | Context | Best for |
| --- | --- | --- | --- |
| Claude Opus 4.7 | claude-opus-4-7 | 1M | Most capable. Complex reasoning, agentic coding. |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | 1M | The default advisor. Fast and smart. |
| Claude Haiku 4.5 | claude-haiku-4-5 | 200k | The default executor. Near-frontier quality at Haiku speed. |
Legacy (still available)
| Model | API ID | Context |
| --- | --- | --- |
| Claude Opus 4.6 | claude-opus-4-6 | 1M |
| Claude Opus 4.5 | claude-opus-4-5 | 200k |
| Claude Opus 4.1 | claude-opus-4-1 | 200k |
| Claude Sonnet 4.5 | claude-sonnet-4-5 | 200k |
| Claude Sonnet 4 | claude-sonnet-4-0 | 200k |
| Claude Opus 4 | claude-opus-4-0 | 200k |
| Claude Haiku 3 | claude-3-haiku-20240307 | 200k |
Claude Sonnet 4 and Opus 4 retire on June 15, 2026. Claude Haiku 3 retires on April 19, 2026. Migrate before those dates. See the Anthropic model overview for pricing and deprecation details.
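If your configs reference retiring IDs, a small helper can swap in a current replacement. This is a convenience sketch: the replacements follow the tables above (same tier, newest generation) and are suggestions, not an official Anthropic migration map.

```python
# Retiring model IDs from the note above, mapped to (retirement date,
# suggested replacement). The replacements are this doc's suggestion
# (same tier, current generation), not an official migration table.
RETIRING = {
    "claude-sonnet-4-0": ("2026-06-15", "claude-sonnet-4-6"),
    "claude-opus-4-0": ("2026-06-15", "claude-opus-4-7"),
    "claude-3-haiku-20240307": ("2026-04-19", "claude-haiku-4-5"),
}

def migrate(model_id: str) -> str:
    """Return a suggested replacement for a retiring ID, else the ID unchanged."""
    entry = RETIRING.get(model_id)
    return entry[1] if entry else model_id

print(migrate("claude-sonnet-4-0"))  # claude-sonnet-4-6
print(migrate("claude-haiku-4-5"))   # claude-haiku-4-5 (already current)
```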
Ollama
Point Index at an Ollama server (OLLAMA_HOST, defaults to http://localhost:11434) and use ollama pull <name> to fetch anything below. Sizes listed are parameter counts — pick what fits in your VRAM.
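A rough way to judge what fits: at 4-bit quantization (Ollama's common default), each parameter takes about half a byte, plus overhead for the KV cache and runtime buffers. A back-of-the-envelope sketch; the 1.2x overhead factor is an assumption, not an Ollama figure:

```python
def approx_vram_gb(params_billion: float, bits_per_weight: int = 4,
                   overhead: float = 1.2) -> float:
    """Very rough VRAM estimate in GB: quantized weights times an
    assumed overhead factor for KV cache and runtime buffers."""
    weight_gb = params_billion * bits_per_weight / 8  # 1e9 params * bits / 8 ≈ GB
    return weight_gb * overhead

# e.g. llama3.1:8b at 4-bit: roughly 5 GB, so it fits on an 8 GB card
print(round(approx_vram_gb(8), 1))
```

Actual usage varies with quantization level and context length, so treat the result as a floor, not a guarantee.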
General purpose
| Model | Sizes | Notes |
| --- | --- | --- |
| llama3.3 | 70b | Meta's high-performance flagship. |
| llama3.1 | 8b, 70b, 405b | State-of-the-art Llama generation. |
| llama3.2 | 1b, 3b | Compact Llama for laptops. |
| llama3 | 8b, 70b | Previous Llama generation, still strong. |
| llama2 | 7b, 13b, 70b | Foundation models from Meta. |
| qwen3 | 0.6b–235b (dense + MoE) | Latest Qwen. Wide size range. |
| qwen3.5 | 0.8b–122b | Multimodal Qwen. |
| qwen2.5 | 0.5b–72b | Multilingual Alibaba model. |
| qwen2 | 0.5b, 1.5b, 7b, 72b | Previous Qwen generation. |
| qwen | 0.5b–110b | Original Qwen family. |
| gemma4 | 26b, 31b | Google's frontier reasoning models. |
| gemma3 | 270m, 1b, 4b, 12b, 27b | Single-GPU Gemma. |
| gemma2 | 2b, 9b, 27b | Efficient Gemma. |
| gemma | 2b, 7b | Original DeepMind Gemma. |
| mistral | 7b | Mistral 7B v0.3. |
| mistral-nemo | 12b | 128k context. |
| mistral-small | 22b, 24b | Small-model benchmark leader. |
| phi4 | 14b | Microsoft state-of-the-art compact model. |
| phi3 | 3.8b, 14b | Lightweight Microsoft models. |
| olmo2 | 7b, 13b | Open language models. |
| dolphin3 | 8b | Dolphin instruct-tuned. |
Reasoning
| Model | Sizes | Notes |
| --- | --- | --- |
| deepseek-r1 | 1.5b–671b | Open reasoning with strong benchmarks. |
| deepseek-v3 | 671b | Large MoE language model. |
| gpt-oss | 20b, 120b | OpenAI open-weight reasoning models. |
Coding
| Model | Sizes | Notes |
| --- | --- | --- |
| qwen3-coder | 30b, 480b | Long-context coding. |
| qwen2.5-coder | 0.5b–32b | Code-focused Qwen 2.5. |
| codellama | 7b, 13b, 34b, 70b | Meta code generation. |
| deepseek-coder | 1.3b, 6.7b, 33b | Two-trillion-token coding corpus. |
| codegemma | 2b, 7b | Code-tuned Gemma. |
Vision and multimodal
| Model | Sizes |
| --- | --- |
| qwen3-vl | 2b, 4b, 8b, 30b, 32b, 235b |
| llama3.2-vision | 11b, 90b |
| llava | 7b, 13b, 34b |
| minicpm-v | 8b |
Embeddings
| Model | Sizes |
| --- | --- |
| nomic-embed-text | default |
| mxbai-embed-large | 335m |
| bge-m3 | 567m |
| snowflake-arctic-embed | 22m, 33m, 110m, 137m, 335m |
| all-minilm | 22m, 33m |
Compact (runs on modest hardware)
| Model | Sizes |
| --- | --- |
| smollm2 | 135m, 360m, 1.7b |
| tinyllama | 1.1b |
| gemma3 | 270m, 1b |
The full catalog lives at ollama.com/library and changes frequently — check there for new drops.
Suggested pairings
A good starting config is a cheap executor plus a capable advisor so most tokens stay local or cheap and only the hard turns escalate.
- Cheapest cloud: claude-haiku-4-5 executor, claude-sonnet-4-6 advisor.
- Frontier coding: claude-sonnet-4-6 executor, claude-opus-4-7 advisor.
- Local-first: qwen2.5-coder:7b executor, claude-sonnet-4-6 advisor.
- Fully local: llama3.1:8b executor, deepseek-r1:32b advisor.
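As a concrete illustration, a local-first pairing might look like this in a YAML-style agent config. The field names (executor, advisor, provider, model) are hypothetical, chosen only to make the shape visible; consult Index's own configuration reference for the real schema.

```yaml
# Hypothetical schema -- field names are illustrative only.
executor:
  provider: ollama          # served from OLLAMA_HOST
  model: qwen2.5-coder:7b
advisor:
  provider: anthropic       # needs ANTHROPIC_API_KEY
  model: claude-sonnet-4-6
```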