LLM Providers
Core files: assistant/multillm.go (initialization), assistant/assutils.go (model list, constants)
The assistant framework supports 11 LLM providers plus a direct Perplexity search client. All providers implement the LLMClient interface and are registered in InitLLMClients() (zai only when ZAI_API_KEY is set).
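The following is a minimal sketch of the registration pattern. It assumes a single-method LLMClient interface. The Generate method, its signature, the stubClient type and the injected getenv parameter are hypothetical illustrations, not the framework's actual API.

```go
package main

import (
	"context"
	"fmt"
)

// LLMClient is a hypothetical, simplified stand-in for the framework's
// interface; the real method set in assistant/ may differ.
type LLMClient interface {
	Generate(ctx context.Context, model, prompt string) (string, error)
}

// stubClient is a hypothetical placeholder client used only for illustration.
type stubClient struct{ name string }

func (s stubClient) Generate(ctx context.Context, model, prompt string) (string, error) {
	if err := ctx.Err(); err != nil {
		return "", err
	}
	return fmt.Sprintf("[%s] %s: %s", s.name, model, prompt), nil
}

// initLLMClients illustrates the InitLLMClients() idea: a map keyed by
// provider key. Ten providers are registered unconditionally; zai is added
// only when ZAI_API_KEY is set. Perplexity is deliberately absent.
func initLLMClients(getenv func(string) string) map[string]LLMClient {
	clients := map[string]LLMClient{}
	for _, key := range []string{"ollama", "llama", "google", "anthropic", "grok",
		"kimi", "deepseek", "openrouter", "inception", "nvidia"} {
		clients[key] = stubClient{name: key}
	}
	if getenv("ZAI_API_KEY") != "" {
		clients["zai"] = stubClient{name: "zai"}
	}
	return clients
}
```

Calling initLLMClients(os.Getenv) would populate all 11 keys when ZAI_API_KEY is exported, and 10 otherwise.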
Provider Registry
| # | Provider Key | Client File | Timeout | Context (tokens) | Notes |
|---|---|---|---|---|---|
| 1 | ollama | ollama_client.go | 45 min | Varies | Legacy local models. Shares semaphore with llama. |
| 2 | llama | llama_client.go | 45 min | Model-specific | llama.cpp server at AI.dLAN:11434. Shares semaphore with ollama. |
| 3 | google | gemini_client.go | 3 min | Model-specific | gemini-2.5-pro, gemini-2.5-flash, etc. |
| 4 | anthropic | anthropic_client.go | 45 min | Model-specific | Claude 3.5 Sonnet, Claude 3 Opus, etc. |
| 5 | grok | grok_client.go | 5 min | ~128k | grok-3, grok-4.3, grok-4-0709, grok-4.1-fast |
| 6 | kimi | kimi_k2.go | 2 min | ~200k | kimi-k2, kimi-k2-thinking |
| 7 | deepseek | deepseek_client.go | 2 min | ~64k | deepseek-chat, deepseek-coder |
| 8 | openrouter | openrouter_client.go | 30 min | Dynamic | Gateway to 90+ models. See OpenRouter Special Features below. |
| 9 | inception | inception_client.go | 10 min | 128k | Inception Labs API |
| 10 | zai | zai_client.go | 30 min | 128k | ZAI.ai API. Only initialized if the ZAI_API_KEY env var is set. |
| 11 | nvidia | nvidia_client.go | 45 min | Varies | NVIDIA NIM (build.nvidia.com), OpenAI-compatible API. |
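The Timeout column and the shared ollama/llama semaphore can be sketched as follows. Only the timeout values come from the table above. The semaphore capacity of 1, the 2-minute default for unknown providers and all helper names are assumptions.

```go
package main

import (
	"context"
	"time"
)

// providerTimeouts copies the Timeout column of the registry table.
var providerTimeouts = map[string]time.Duration{
	"ollama":     45 * time.Minute,
	"llama":      45 * time.Minute,
	"google":     3 * time.Minute,
	"anthropic":  45 * time.Minute,
	"grok":       5 * time.Minute,
	"kimi":       2 * time.Minute,
	"deepseek":   2 * time.Minute,
	"openrouter": 30 * time.Minute,
	"inception":  10 * time.Minute,
	"zai":        30 * time.Minute,
	"nvidia":     45 * time.Minute,
}

// localSem is a hypothetical counting semaphore (capacity 1 is assumed)
// shared by the ollama and llama clients so local requests never overlap.
var localSem = make(chan struct{}, 1)

// semaphoreFor returns the shared semaphore for local providers and nil
// (no concurrency limit) for everything else.
func semaphoreFor(provider string) chan struct{} {
	if provider == "ollama" || provider == "llama" {
		return localSem
	}
	return nil
}

// withProviderLimits runs call under the provider's timeout and, for local
// providers, while holding the shared semaphore slot.
func withProviderLimits(ctx context.Context, provider string, call func(context.Context) error) error {
	timeout, ok := providerTimeouts[provider]
	if !ok {
		timeout = 2 * time.Minute // assumed default for unknown providers
	}
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	if sem := semaphoreFor(provider); sem != nil {
		select {
		case sem <- struct{}{}: // acquire the slot
			defer func() { <-sem }() // release it when call returns
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return call(ctx)
}
```

Because the channel is shared, a long llama request blocks an ollama request (and vice versa) until it finishes or the waiting caller's timeout expires.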
Perplexity (perplexity_client.go) is NOT in the llmClients map. It's instantiated directly in search_tools.go for web search queries.
Model Identifier Format
All model identifiers use the provider/model-name format, and the model name may itself contain slashes (e.g., llama/qwen3:30b, openrouter/z-ai/glm-5, nvidia/z-ai/glm-5).
The llama/ prefix has replaced ollama/ for local models: all locally hosted models (previously ollama/*) are now served by the llama.cpp server.
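Model names such as openrouter/z-ai/glm-5 contain their own slashes, so a parser should split only on the first slash. The sketch below does this. splitModelID is a hypothetical helper, not a function from assutils.go.

```go
package main

import (
	"fmt"
	"strings"
)

// splitModelID splits a "provider/model-name" identifier on the FIRST slash
// only, so nested names such as "openrouter/z-ai/glm-5" keep their inner
// slashes in the model part. Hypothetical helper for illustration.
func splitModelID(id string) (string, string, error) {
	provider, model, ok := strings.Cut(id, "/")
	if !ok || provider == "" || model == "" {
		return "", "", fmt.Errorf("model id %q is not in provider/model-name form", id)
	}
	return provider, model, nil
}
```

For example, splitModelID("nvidia/z-ai/glm-5") yields provider nvidia and model z-ai/glm-5. The model part is presumably what selects the upstream model within that provider.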
Model Constants
| Constant | Value | Location | Purpose |
|---|---|---|---|
| DefaultModel | llama/llama3.2:latest | assutils.go:30 | Fallback when no model is specified |
| DefaultHighEndModel | nvidia/z-ai/glm-5 | assutils.go:31 | System-wide high-intelligence model |
| FallbackFreeModel | nvidia/z-ai/glm-5 | assutils.go:32 | Free-tier fallback |
| FallbackFlashModel | llama/qwen3:8b | assutils.go:33 | Flash model fallback |
| SummaryModel | openrouter/z-ai/glm-4.5-air:free | assutils.go:34 | Conversation summarization |
| TranslationModel | openrouter/z-ai/glm-4.5-air:free | assutils.go:35 | Translation tasks |
| RefinementModel | openrouter/z-ai/glm-4.5-air:free | assutils.go:36 | Response refinement |
| STTModel | google/gemini-2.5-flash | assutils.go:37 | Speech-to-text |
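DefaultModel serves as the fallback when no model is specified, which amounts to a simple check. resolveModel is a hypothetical helper, and treating whitespace-only input as "no model" is an assumption. Only the constant's value comes from the table.

```go
package main

import "strings"

// DefaultModel as listed in the Model Constants table (assutils.go:30).
const DefaultModel = "llama/llama3.2:latest"

// resolveModel is a hypothetical helper: it returns the requested model, or
// DefaultModel when the caller passes an empty or blank string.
func resolveModel(requested string) string {
	if m := strings.TrimSpace(requested); m != "" {
		return m
	}
	return DefaultModel
}
```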
Context Overflow Fallback (May 18, 2026)
File: assistant/llama_client.go — ErrContextOverflow sentinel
When llama-server returns a 400 error indicating that the context length was exceeded, the ReAct loop and Single-Shot paths now detect it and trigger a fallback chain:
- The current model fails with a context overflow.
- The request falls back to gemini-2.5-flash-lite.
- Circuit breaker logic at the 400/402/429 level handles all error types uniformly.
This was added in commit 41536d47 to prevent model deadlocks when long conversations exhaust context windows.
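The sketch below shows the sentinel-error pattern behind the overflow fallback. Only the name ErrContextOverflow and the fallback target gemini-2.5-flash-lite come from this page. The following are assumptions:
- the error text
- the "context" substring check on 400 responses
- the google/ prefix on the fallback ID
- the single-retry helper callWithOverflowFallback

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrContextOverflow mirrors the sentinel named in assistant/llama_client.go;
// the message text here is illustrative.
var ErrContextOverflow = errors.New("llama: context length exceeded")

// classifyLlamaError is a hypothetical detector: a 400 whose body mentions
// "context" is wrapped with %w so callers can test it via errors.Is.
// The substring llama-server actually returns is an assumption.
func classifyLlamaError(status int, body string) error {
	if status == 400 && strings.Contains(strings.ToLower(body), "context") {
		return fmt.Errorf("%w: %s", ErrContextOverflow, body)
	}
	return fmt.Errorf("llama-server status %d: %s", status, body)
}

// overflowFallbackModel: the page names gemini-2.5-flash-lite; the google/
// provider prefix is assumed.
const overflowFallbackModel = "google/gemini-2.5-flash-lite"

type callFunc func(model, prompt string) (string, error)

// callWithOverflowFallback retries once on the fallback model when the first
// model reports a context overflow. Any other error is returned unchanged.
// It also returns the model that produced the result.
func callWithOverflowFallback(call callFunc, model, prompt string) (string, string, error) {
	out, err := call(model, prompt)
	if errors.Is(err, ErrContextOverflow) {
		out, err = call(overflowFallbackModel, prompt)
		return out, overflowFallbackModel, err
	}
	return out, model, err
}
```

Wrapping with %w rather than comparing error strings lets the ReAct and Single-Shot paths recognize the overflow even after other layers add context to the error.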
OpenRouter Special Features
The OpenRouter client has several features not present in other providers:
- Request Jitter: 100-400ms random delay to prevent thundering-herd 429 errors
- Free Model Detection: isFreeModel() identifies free-tier models and logs warnings when they are used with tools
- Tool-Aware Fallback: GetFreeModelsFallbackChainToolAware() filters out "flash" models when tools are present (flash models fail tool calls)
- 400 Error Retry: if a 400 error mentions tool_choice, the request is retried without tool_choice
- 429 Handling: reads the Retry-After header; progressive backoff (10s, 30s, 60s + jitter)
- Cache Metrics: tracks prompt_tokens_details.cache_hit and cache_miss from OpenRouter responses
- Cost Tracking: extracts usage.cost from the response JSON
- Prompt Caching: OpenRouter prompt caching enabled (May 9, 2026), reducing token costs for long conversations
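The jitter and 429 backoff rules can be sketched with the standard library. The 100-400ms range and the 10s/30s/60s steps come from the list above. The following are assumptions about details this page does not specify:
- honoring Retry-After without extra jitter
- capping at the last step
- the helper names

```go
package main

import (
	"math/rand"
	"net/http"
	"strconv"
	"time"
)

// requestJitter returns a random delay in [100ms, 400ms).
func requestJitter() time.Duration {
	return 100*time.Millisecond + time.Duration(rand.Int63n(int64(300*time.Millisecond)))
}

// backoffSteps holds the documented progressive 429 delays.
var backoffSteps = []time.Duration{10 * time.Second, 30 * time.Second, 60 * time.Second}

// retryDelay picks the wait before retry number attempt (0-based).
// A Retry-After header wins; otherwise the progressive schedule applies,
// capped at its last step, plus jitter.
func retryDelay(h http.Header, attempt int) time.Duration {
	if v := h.Get("Retry-After"); v != "" {
		// Retry-After may be delay-seconds or an HTTP-date.
		if secs, err := strconv.Atoi(v); err == nil && secs >= 0 {
			return time.Duration(secs) * time.Second
		}
		if t, err := http.ParseTime(v); err == nil {
			if d := time.Until(t); d > 0 {
				return d
			}
			return 0
		}
	}
	if attempt < 0 {
		attempt = 0
	}
	if attempt >= len(backoffSteps) {
		attempt = len(backoffSteps) - 1
	}
	return backoffSteps[attempt] + requestJitter()
}
```

Jitter spreads retries from concurrent requests so they do not hit OpenRouter in the same instant, which is the thundering-herd problem the list describes.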
Free Models System
File: assistant/free-models.go (391 lines)
The FreeModelsProvider manages a JSON config (free-models.json) that controls model selection for different scenarios:
| Function | Purpose |
|---|---|
| GetInterimFreeModel() | Best model for interim/streaming responses |
| GetGreetingFreeModel() | Best model for greeting messages |
| GetFreeModelsFallbackChain() | Priority-sorted fallback chain (local models first) |
| GetFreeModelsFallbackChainToolAware(hasTools) | Fallback chain excluding "flash" models when tools are needed |
CRUD Operations: AddModel(), UpdateModel(), RemoveModel(), ReorderModels() — all thread-safe with sync.RWMutex.
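The sketch below shows the provider's locking and chain ordering, using the documented function names; only AddModel is shown from the CRUD set. The following are assumptions, and the real free-models.json schema may differ:
- the FreeModel fields
- lower-priority-value-first ordering
- treating llama/ and ollama/ IDs as local
- the case-insensitive "flash" match

```go
package main

import (
	"sort"
	"strings"
	"sync"
)

// FreeModel is a hypothetical entry shape for free-models.json.
type FreeModel struct {
	ID       string `json:"id"`
	Priority int    `json:"priority"` // lower value is tried earlier (assumed)
}

// FreeModelsProvider guards its list with a sync.RWMutex: readers share
// the lock, writers take it exclusively.
type FreeModelsProvider struct {
	mu     sync.RWMutex
	models []FreeModel
}

func (p *FreeModelsProvider) AddModel(m FreeModel) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.models = append(p.models, m)
}

func isLocal(id string) bool {
	return strings.HasPrefix(id, "llama/") || strings.HasPrefix(id, "ollama/")
}

// GetFreeModelsFallbackChain returns IDs with local models first, then by
// ascending priority within each group.
func (p *FreeModelsProvider) GetFreeModelsFallbackChain() []string {
	p.mu.RLock()
	sorted := append([]FreeModel(nil), p.models...) // copy: sorting never mutates shared state
	p.mu.RUnlock()

	sort.SliceStable(sorted, func(i, j int) bool {
		li, lj := isLocal(sorted[i].ID), isLocal(sorted[j].ID)
		if li != lj {
			return li
		}
		return sorted[i].Priority < sorted[j].Priority
	})
	chain := make([]string, 0, len(sorted))
	for _, m := range sorted {
		chain = append(chain, m.ID)
	}
	return chain
}

// GetFreeModelsFallbackChainToolAware drops "flash" models when tools are
// needed, since flash models are documented to fail tool calls.
func (p *FreeModelsProvider) GetFreeModelsFallbackChainToolAware(hasTools bool) []string {
	chain := p.GetFreeModelsFallbackChain()
	if !hasTools {
		return chain
	}
	filtered := chain[:0]
	for _, id := range chain {
		if !strings.Contains(strings.ToLower(id), "flash") {
			filtered = append(filtered, id)
		}
	}
	return filtered
}
```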
Model Caching
GetAvailableModels() caches the model list for 3 hours (modelCacheTTL). On first call, it seeds synchronously from the static availableModels slice, then refreshes in the background by querying Ollama, llama-server, OpenRouter, Google, and Anthropic endpoints.
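The cache behavior (synchronous seed, background refresh, 3-hour TTL) can be sketched with a mutex and a single in-flight refresh flag. The staticModels contents, the injected fetch function and the newModelCache constructor are hypothetical. Here, fetch stands in for querying the Ollama, llama-server, OpenRouter, Google and Anthropic endpoints.

```go
package main

import (
	"sync"
	"time"
)

const modelCacheTTL = 3 * time.Hour

// staticModels stands in for the static availableModels slice (contents illustrative).
var staticModels = []string{"llama/qwen3:30b", "google/gemini-2.5-flash"}

type modelCache struct {
	mu         sync.Mutex
	models     []string
	fetchedAt  time.Time // zero until the first successful refresh
	refreshing bool
	fetch      func() []string
	now        func() time.Time
}

func newModelCache(fetch func() []string) *modelCache {
	return &modelCache{fetch: fetch, now: time.Now}
}

// Get seeds synchronously from staticModels on the first call. When the
// entry is older than modelCacheTTL (the zero fetchedAt always is), it
// starts one background refresh and returns the current list immediately.
func (c *modelCache) Get() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.models == nil {
		c.models = append([]string(nil), staticModels...)
	}
	if !c.refreshing && c.now().Sub(c.fetchedAt) > modelCacheTTL {
		c.refreshing = true
		go c.refresh()
	}
	return append([]string(nil), c.models...)
}

// refresh performs network I/O without holding the lock; an empty result
// keeps the previous list.
func (c *modelCache) refresh() {
	fresh := c.fetch()
	c.mu.Lock()
	defer c.mu.Unlock()
	if len(fresh) > 0 {
		c.models = fresh
	}
	c.fetchedAt = c.now()
	c.refreshing = false
}
```

Returning the current list while a refresh runs keeps the lookup non-blocking after the first call. The cost is serving data up to one refresh cycle old.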
See also: Search Tools, Assistant Framework