Multimodal & Free Model Routing

Multimodal Support

Files: assistant/multimodal.go (74 lines), assistant/multimodal_query.go (251 lines)

The multimodal system enables agents to process and generate content containing text, images, and audio through a unified interface.

Content Types

Type	Constant	Usage
Text	`ContentTypeText`	Plain text messages
Image	`ContentTypeImage`	Image via URL (`ImageURL` struct)
Audio	`ContentTypeAudio`	Audio via raw bytes (`AudioData` struct with format + data)

Key Structs

type ContentPart struct {
    Type     ContentType `json:"type"`
    Text     string      `json:"text,omitempty"`
    ImageURL *ImageURL   `json:"image_url,omitempty"`
    Audio    *AudioData  `json:"audio,omitempty"`
}

Helper constructors: NewTextPart(), NewImagePart(url), NewAudioPart(format, data)

MultimodalQuery

MultimodalQuery(opts) sends multimodal content to OpenRouter's chat completions API:

Default model: moonshotai/kimi-k2.5 (for images)
Timeout: 120 seconds
Encoding: Images are sent as URLs; audio is base64-encoded

QueryMultimodal() is the high-level wrapper that:

Loads agent config to get model assignment
Uses moonshotai/kimi-k2.5 as the default model for images
Encodes media content as base64 where needed
Calls MultimodalQuery
Persists to chat history if ChatID is provided

Free Model Routing

File: assistant/free-models.go (391 lines)

The FreeModelsProvider manages a JSON config (free-models.json) that controls model selection for free-tier, greeting, and fallback scenarios.

Free Model Struct

{
  "id": "glm45-air",
  "name": "GLM 4.5 Air",
  "model": "openrouter/z-ai/glm-4.5-air:free",
  "provider": "openrouter",
  "priority": 10,
  "use_for_interim": true,
  "use_for_greeting": true,
  "use_for_fallback": true,
  "is_local": false,
  "description": "Free-tier model for greetings and interim responses"
}
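
A Go type that decodes an entry of this shape might look like the sketch below. The `FreeModel` type name, its field names, and `parseFreeModel` are assumptions; only the JSON keys come from the example above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FreeModel mirrors the JSON keys shown above; Go field names are hypothetical.
type FreeModel struct {
	ID             string `json:"id"`
	Name           string `json:"name"`
	Model          string `json:"model"`
	Provider       string `json:"provider"`
	Priority       int    `json:"priority"`
	UseForInterim  bool   `json:"use_for_interim"`
	UseForGreeting bool   `json:"use_for_greeting"`
	UseForFallback bool   `json:"use_for_fallback"`
	IsLocal        bool   `json:"is_local"`
	Description    string `json:"description"`
}

const sample = `{
  "id": "glm45-air",
  "name": "GLM 4.5 Air",
  "model": "openrouter/z-ai/glm-4.5-air:free",
  "provider": "openrouter",
  "priority": 10,
  "use_for_interim": true,
  "use_for_greeting": true,
  "use_for_fallback": true,
  "is_local": false,
  "description": "Free-tier model for greetings and interim responses"
}`

// parseFreeModel decodes a single entry and wraps any decoding error.
func parseFreeModel(raw []byte) (FreeModel, error) {
	var m FreeModel
	if err := json.Unmarshal(raw, &m); err != nil {
		return FreeModel{}, fmt.Errorf("parse free model: %w", err)
	}
	return m, nil
}

func main() {
	m, err := parseFreeModel([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(m.ID, m.Model, m.Priority)
}
```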

Routing Logic

Scenario	Function	Fallback / Behavior
Interim responses	`GetInterimFreeModel()`	`GreetingModel` constant
Greeting messages	`GetGreetingFreeModel()`	`GreetingModel` constant
Fallback chain	`GetFreeModelsFallbackChain()`	`[FallbackFreeModel, FallbackFlashModel]`
Tool-aware fallback	`GetFreeModelsFallbackChainToolAware(hasTools)`	Filters "flash" models when tools present
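
To make the select-or-fall-back pattern in the table concrete, here is a minimal sketch for the greeting case. It assumes that higher `priority` values are preferred (the document does not state the ordering) and uses a placeholder value for the `GreetingModel` constant, whose real value is not given here.

```go
package main

import "fmt"

// GreetingModel is a placeholder; the real constant's value is not shown in this document.
const GreetingModel = "example/greeting-model"

// FreeModel carries only the fields this sketch needs.
type FreeModel struct {
	Model          string
	Priority       int
	UseForGreeting bool
}

// getGreetingFreeModel picks the eligible model with the highest priority
// (an assumed ordering) and falls back to GreetingModel when none qualifies.
func getGreetingFreeModel(models []FreeModel) string {
	best := -1
	for i, m := range models {
		if !m.UseForGreeting {
			continue
		}
		if best == -1 || m.Priority > models[best].Priority {
			best = i
		}
	}
	if best == -1 {
		return GreetingModel
	}
	return models[best].Model
}

func main() {
	models := []FreeModel{
		{Model: "openrouter/z-ai/glm-4.5-air:free", Priority: 10, UseForGreeting: true},
		{Model: "example/other-model", Priority: 5, UseForGreeting: false},
	}
	fmt.Println(getGreetingFreeModel(models))
	fmt.Println(getGreetingFreeModel(nil))
}
```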

Tool-Aware Filtering

When hasTools=true, GetFreeModelsFallbackChainToolAware excludes models with "flash" in their name from the fallback chain, because flash models (e.g., gemini-2.5-flash) often fail to return tool calls reliably. If filtering would remove every model, the original chain is returned unchanged.
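
The filtering rule can be sketched as follows. `filterToolAware` is a hypothetical helper name, the chain is simplified to a slice of model names, and the case-insensitive match is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// filterToolAware drops models whose name contains "flash" when tools are
// present. If that would leave nothing, it returns the original chain.
func filterToolAware(chain []string, hasTools bool) []string {
	if !hasTools {
		return chain
	}
	kept := make([]string, 0, len(chain))
	for _, model := range chain {
		// Case-insensitive matching is an assumption of this sketch.
		if strings.Contains(strings.ToLower(model), "flash") {
			continue
		}
		kept = append(kept, model)
	}
	if len(kept) == 0 {
		return chain // never return an empty chain
	}
	return kept
}

func main() {
	chain := []string{"z-ai/glm-4.5-air:free", "google/gemini-2.5-flash"}
	fmt.Println(filterToolAware(chain, true))
	fmt.Println(filterToolAware(chain, false))
	fmt.Println(filterToolAware([]string{"google/gemini-2.5-flash"}, true))
}
```

Returning the unfiltered chain when everything is filtered out trades tool-call reliability for availability: a flash model that may mishandle tools is still preferred over having no fallback at all.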

Thread Safety

All operations are guarded by a sync.RWMutex, so the provider is safe for concurrent use. The config is loaded lazily from free-models.json on first access and cached for 3 hours.

See also: LLM Providers