Index speaks to two providers today: the Claude API from Anthropic and any Ollama server running on your machine or network. Mix and match — use a small local model as the executor and call out to a frontier Claude model as the advisor when the reasoning gets hard.
Claude API
Set ANTHROPIC_API_KEY in your environment and reference any of these IDs in your agent config.
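For example, in a POSIX shell (the key value is a placeholder for your real key):

```shell
# Export the key for the current shell session (value is a placeholder).
export ANTHROPIC_API_KEY="your-key-here"
```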
Current
| Model | API ID | Context | Best for |
| --- | --- | --- | --- |
| Claude Opus 4.7 | claude-opus-4-7 | 1M | Most capable. Complex reasoning, agentic coding. |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | 1M | The default advisor. Fast and smart. |
| Claude Haiku 4.5 | claude-haiku-4-5 | 200k | The default executor. Near-frontier quality at Haiku speed. |
Legacy (still available)
| Model | API ID | Context |
| --- | --- | --- |
| Claude Opus 4.6 | claude-opus-4-6 | 1M |
| Claude Opus 4.5 | claude-opus-4-5 | 200k |
| Claude Opus 4.1 | claude-opus-4-1 | 200k |
| Claude Sonnet 4.5 | claude-sonnet-4-5 | 200k |
| Claude Sonnet 4 | claude-sonnet-4-0 | 200k |
| Claude Opus 4 | claude-opus-4-0 | 200k |
| Claude Haiku 3 | claude-3-haiku-20240307 | 200k |
Claude Sonnet 4 and Opus 4 retire on June 15, 2026. Claude Haiku 3 retires on April 19, 2026. Migrate before those dates. See the Anthropic model overview for pricing and deprecation details.
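If your configs reference retiring IDs, a small helper can swap in a current replacement. This is a convenience sketch: the replacements follow the tables above (same tier, newest generation) and are suggestions, not an official Anthropic migration map.

```python
# Retiring model IDs from the note above, mapped to (retirement date,
# suggested replacement). The replacements are this doc's suggestion
# (same tier, current generation), not an official migration table.
RETIRING = {
    "claude-sonnet-4-0": ("2026-06-15", "claude-sonnet-4-6"),
    "claude-opus-4-0": ("2026-06-15", "claude-opus-4-7"),
    "claude-3-haiku-20240307": ("2026-04-19", "claude-haiku-4-5"),
}

def migrate(model_id: str) -> str:
    """Return a suggested replacement for a retiring ID, else the ID unchanged."""
    entry = RETIRING.get(model_id)
    return entry[1] if entry else model_id

print(migrate("claude-sonnet-4-0"))  # claude-sonnet-4-6
print(migrate("claude-haiku-4-5"))   # claude-haiku-4-5 (already current)
```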
Ollama
Point Index at an Ollama server (OLLAMA_HOST, defaults to http://localhost:11434) and use ollama pull <name> to fetch anything below. Sizes listed are parameter counts — pick what fits in your VRAM.
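A rough way to judge what fits: at 4-bit quantization (Ollama's common default), each parameter takes about half a byte, plus overhead for the KV cache and runtime buffers. A back-of-the-envelope sketch; the 1.2x overhead factor is an assumption, not an Ollama figure:

```python
def approx_vram_gb(params_billion: float, bits_per_weight: int = 4,
                   overhead: float = 1.2) -> float:
    """Very rough VRAM estimate in GB: quantized weights times an
    assumed overhead factor for KV cache and runtime buffers."""
    weight_gb = params_billion * bits_per_weight / 8  # 1e9 params * bits / 8 ≈ GB
    return weight_gb * overhead

# e.g. llama3.1:8b at 4-bit: roughly 5 GB, so it fits on an 8 GB card
print(round(approx_vram_gb(8), 1))
```

Actual usage varies with quantization level and context length, so treat the result as a floor, not a guarantee.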
General purpose
| Model | Sizes | Notes |
| --- | --- | --- |
| llama3.3 | 70b | Meta's high-performance flagship. |
| llama3.1 | 8b, 70b, 405b | State-of-the-art Llama generation. |
| llama3.2 | 1b, 3b | Compact Llama for laptops. |
| llama3 | 8b, 70b | Previous Llama generation, still strong. |
| llama2 | 7b, 13b, 70b | Foundation models from Meta. |
| qwen3 | 0.6b–235b (dense + MoE) | Latest Qwen. Wide size range. |
| qwen3.5 | 0.8b–122b | Multimodal Qwen. |
| qwen2.5 | 0.5b–72b | Multilingual Alibaba model. |
| qwen2 | 0.5b, 1.5b, 7b, 72b | Previous Qwen generation. |
| qwen | 0.5b–110b | Original Qwen family. |
| gemma4 | 26b, 31b | Google's frontier reasoning models. |
| gemma3 | 270m, 1b, 4b, 12b, 27b | Single-GPU Gemma. |
| gemma2 | 2b, 9b, 27b | Efficient Gemma. |
| gemma | 2b, 7b | Original DeepMind Gemma. |
| mistral | 7b | Mistral 7B v0.3. |
| mistral-nemo | 12b | 128k context. |
| mistral-small | 22b, 24b | Small-model benchmark leader. |
| phi4 | 14b | Microsoft state-of-the-art compact model. |
| phi3 | 3.8b, 14b | Lightweight Microsoft models. |
| olmo2 | 7b, 13b | Open language models. |
| dolphin3 | 8b | Dolphin instruct-tuned. |
Reasoning
| Model | Sizes | Notes |
| --- | --- | --- |
| deepseek-r1 | 1.5b–671b | Open reasoning with strong benchmarks. |
| deepseek-v3 | 671b | Large MoE language model. |
| gpt-oss | 20b, 120b | OpenAI open-weight reasoning models. |
Coding
| Model | Sizes | Notes |
| --- | --- | --- |
| qwen3-coder | 30b, 480b | Long-context coding. |
| qwen2.5-coder | 0.5b–32b | Code-focused Qwen 2.5. |
| codellama | 7b, 13b, 34b, 70b | Meta code generation. |
| deepseek-coder | 1.3b, 6.7b, 33b | Two-trillion-token coding corpus. |
| codegemma | 2b, 7b | Code-tuned Gemma. |
Vision and multimodal
| Model | Sizes |
| --- | --- |
| qwen3-vl | 2b, 4b, 8b, 30b, 32b, 235b |
| llama3.2-vision | 11b, 90b |
| llava | 7b, 13b, 34b |
| minicpm-v | 8b |
Embeddings
| Model | Sizes |
| --- | --- |
| nomic-embed-text | default |
| mxbai-embed-large | 335m |
| bge-m3 | 567m |
| snowflake-arctic-embed | 22m, 33m, 110m, 137m, 335m |
| all-minilm | 22m, 33m |
Compact (runs on modest hardware)
| Model | Sizes |
| --- | --- |
| smollm2 | 135m, 360m, 1.7b |
| tinyllama | 1.1b |
| gemma3 | 270m, 1b |
The full catalog lives at ollama.com/library and changes frequently — check there for new drops.
Suggested pairings
A good starting config is a cheap executor plus a capable advisor so most tokens stay local or cheap and only the hard turns escalate.
- Cheapest cloud: claude-haiku-4-5 executor, claude-sonnet-4-6 advisor.
- Frontier coding: claude-sonnet-4-6 executor, claude-opus-4-7 advisor.
- Local-first: qwen2.5-coder:7b executor, claude-sonnet-4-6 advisor.
- Fully local: llama3.1:8b executor, deepseek-r1:32b advisor.
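As a concrete illustration, a local-first pairing might look like this in a YAML-style agent config. The field names (executor, advisor, provider, model) are hypothetical, chosen only to make the shape visible; consult Index's own configuration reference for the real schema.

```yaml
# Hypothetical schema -- field names are illustrative only.
executor:
  provider: ollama          # served from OLLAMA_HOST
  model: qwen2.5-coder:7b
advisor:
  provider: anthropic       # needs ANTHROPIC_API_KEY
  model: claude-sonnet-4-6
```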