AntFarm Swarm Architecture
The AntFarm Swarm (antfarm.go, ant_tools.go, projects.go) is a decentralized, high-concurrency worker engine designed to execute large sets of independent tasks in parallel.
🐜 Concept: The "Swarm"
An AntFarm Swarm consists of many "Ant" workers (ephemeral agents) that collaborate to solve a complex problem. This is primarily used for tasks like:
- Massive Infrastructure Audits: Checking health on dozens of servers simultaneously.
- Large-Scale Data Scraping: Gathering information from multiple sources at once.
- Batch Code Analysis: Reviewing or refactoring multiple files in parallel.
📁 Key Components
antfarm.go
The primary swarm coordinator.
- SwarmSession: Manages the lifecycle of a swarm, including task distribution and result aggregation.
- ProjectWorkerSession: An individual, ephemeral agent instance that picks up tasks from the swarm queue.
- Concurrency Control: Uses Go channels and sync.WaitGroup to manage parallel execution without overloading the system.
- Automatic Checkpointing: Snapshots the project state at the beginning and end of a swarm operation for safety.
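The channel-plus-WaitGroup pattern described here can be sketched as a bounded worker pool. This is a minimal illustration, not the code in antfarm.go: `Task`, `Result`, and `runPool` are hypothetical names, and the real `SwarmSession` may structure its queue differently.

```go
package main

import (
	"fmt"
	"sync"
)

// Task and Result are hypothetical stand-ins for the swarm's real types.
type Task struct {
	ID int
}

type Result struct {
	TaskID int
	Output string
}

// runPool processes tasks with at most maxWorkers goroutines and returns
// one Result per task, in the same order as the input slice.
func runPool(tasks []Task, maxWorkers int, work func(Task) string) []Result {
	if maxWorkers < 1 {
		maxWorkers = 1 // avoid a deadlock when no worker would drain the queue
	}
	type job struct {
		idx  int
		task Task
	}
	queue := make(chan job)
	results := make([]Result, len(tasks))
	var wg sync.WaitGroup

	for w := 0; w < maxWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range queue {
				// Each index is written by exactly one goroutine, so no lock is needed.
				results[j.idx] = Result{TaskID: j.task.ID, Output: work(j.task)}
			}
		}()
	}

	for i, t := range tasks {
		queue <- job{idx: i, task: t}
	}
	close(queue) // lets the workers' range loops finish
	wg.Wait()    // block until every worker has exited
	return results
}

func main() {
	tasks := []Task{{ID: 1}, {ID: 2}, {ID: 3}}
	results := runPool(tasks, 2, func(t Task) string {
		return fmt.Sprintf("checked host %d", t.ID)
	})
	for _, r := range results {
		fmt.Println(r.TaskID, r.Output)
	}
}
```

The unbuffered queue gives natural backpressure: the producer blocks until a worker is free, so no more than `maxWorkers` tasks are in flight at once.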
ant_tools.go
A collection of specialized tools available only to Ant workers:
- ant_execute_task: The core tool for performing a discrete task.
- ant_report_discovery: Allows an Ant to report important findings back to the swarm coordinator.
- ant_request_help: Enables one Ant to delegate a sub-task to another Ant within the same swarm.
projects.go
Project management for swarms: tracks open projects, budgets, and milestones, and spawns new ants with skill context.
🐜 Worker Types
Each worker type has specific iteration limits and purposes:
| Type | Iterations | Purpose |
|------|------------|---------|
| queen | Unlimited | Swarm coordinator, spawns ants |
| worker | 3 | General purpose coding |
| scout | 2 | Discovery/research |
| soldier | 3 | QA/verification |
| debugger | 5 | Root cause analysis |
| janitor | 2 | Cleanup/removal |
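One way to encode the table above is a lookup from worker type to iteration limit. The type names and limits come from the table; the Go identifiers, the `Unlimited` sentinel, and the `canIterate` helper are assumptions made for illustration and may not match the actual code.

```go
package main

import "fmt"

// AntType values mirror the table; the identifiers themselves are hypothetical.
type AntType string

const (
	Queen    AntType = "queen"
	Worker   AntType = "worker"
	Scout    AntType = "scout"
	Soldier  AntType = "soldier"
	Debugger AntType = "debugger"
	Janitor  AntType = "janitor"
)

// Unlimited is an assumed sentinel for types with no iteration cap.
const Unlimited = -1

var iterationLimits = map[AntType]int{
	Queen:    Unlimited,
	Worker:   3,
	Scout:    2,
	Soldier:  3,
	Debugger: 5,
	Janitor:  2,
}

// canIterate reports whether an ant of type t may start another iteration
// after it has already completed `done` iterations.
func canIterate(t AntType, done int) bool {
	limit, ok := iterationLimits[t]
	if !ok {
		return false // unknown types get no iterations
	}
	return limit == Unlimited || done < limit
}

func main() {
	fmt.Println(canIterate(Scout, 1))    // true
	fmt.Println(canIterate(Scout, 2))    // false
	fmt.Println(canIterate(Queen, 1000)) // true
}
```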
Ralph Loop: Self-Correction Mechanism
The Ralph Loop is a self-correction mechanism built into swarm workers:
- Worker completes a task → reports success or failure
- Soldier verifies → checks the work meets requirements
- On failure → Worker retries with new arguments (up to maxRetries)
- On success → Result synthesized into final report
Worker → Execute → Failed? → Retry (maxRetries) → Soldier Verify → Success → Synthesize
↓
Fail again → Circuit Breaker
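The loop above can be sketched as a retry function with a verification step. This is a hedged illustration: `ralphLoop`, `executeFn`, and `verifyFn` are hypothetical names, and the sketch assumes that both an execution error and a failed soldier check consume a retry, which the source does not state explicitly.

```go
package main

import (
	"errors"
	"fmt"
)

// executeFn runs one attempt; the attempt number lets the caller vary the
// arguments on each retry. verifyFn plays the soldier's role.
type executeFn func(attempt int) (string, error)
type verifyFn func(output string) bool

var errRetriesExhausted = errors.New("retries exhausted")

// ralphLoop makes one initial attempt plus up to maxRetries retries.
// It returns the verified output and the number of attempts made.
func ralphLoop(exec executeFn, verify verifyFn, maxRetries int) (string, int, error) {
	for attempt := 0; attempt <= maxRetries; attempt++ {
		out, err := exec(attempt)
		if err != nil {
			continue // execution failed: retry
		}
		if verify(out) {
			return out, attempt + 1, nil // verified: ready for synthesis
		}
		// verification failed: fall through and retry
	}
	// The caller would report this failure to the circuit breaker.
	return "", maxRetries + 1, errRetriesExhausted
}

func main() {
	exec := func(attempt int) (string, error) {
		switch attempt {
		case 0:
			return "", errors.New("tool call failed")
		case 1:
			return "draft", nil
		default:
			return "final", nil
		}
	}
	verify := func(out string) bool { return out == "final" }

	out, attempts, err := ralphLoop(exec, verify, 3)
	fmt.Println(out, attempts, err) // final 3 <nil>
}
```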
Circuit Breaker: Failure Protection
The Circuit Breaker prevents zombie swarms from consuming resources indefinitely:
// From antfarm.go:1127
if failures >= 5 {
	// Trip breaker, pause for 2 minutes
	cb.State = HalfOpen
	time.Sleep(2 * time.Minute)
}
- Threshold: 5 consecutive failures
- Action: Auto-pause the swarm
- Cooldown: 2 minutes before retry
- Recovery: Manual intervention or automatic reset after cooldown
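For readers unfamiliar with the pattern, here is a self-contained sketch of a breaker using the threshold and cooldown listed above. It is not the code in antfarm.go: it records timestamps instead of sleeping, and the `Breaker` type, its methods, and the three-state model are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"time"
)

type State int

const (
	Closed   State = iota // normal operation
	Open                  // tripped: work is paused
	HalfOpen              // cooldown elapsed: a trial task is allowed
)

// Breaker is a hypothetical type; field names are illustrative.
type Breaker struct {
	Threshold int
	Cooldown  time.Duration
	state     State
	failures  int
	openedAt  time.Time
}

// Allow reports whether work may proceed at time now.
func (b *Breaker) Allow(now time.Time) bool {
	if b.state == Open {
		if now.Sub(b.openedAt) < b.Cooldown {
			return false
		}
		b.state = HalfOpen
	}
	return true
}

// Record updates the breaker with the outcome of one task.
func (b *Breaker) Record(success bool, now time.Time) {
	if success {
		b.failures = 0
		b.state = Closed
		return
	}
	b.failures++
	// A failure during the trial period, or reaching the threshold, trips the breaker.
	if b.state == HalfOpen || b.failures >= b.Threshold {
		b.state = Open
		b.openedAt = now
	}
}

// simulate drives the breaker through five consecutive failures and a cooldown.
func simulate() (beforeTrip, afterTrip, afterCooldown bool) {
	b := &Breaker{Threshold: 5, Cooldown: 2 * time.Minute}
	now := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	for i := 0; i < 4; i++ {
		b.Record(false, now)
	}
	beforeTrip = b.Allow(now) // four failures: still allowed
	b.Record(false, now)      // fifth consecutive failure trips the breaker
	afterTrip = b.Allow(now.Add(time.Minute))
	afterCooldown = b.Allow(now.Add(2 * time.Minute))
	return
}

func main() {
	fmt.Println(simulate()) // true false true
}
```

Passing the current time in as a parameter keeps the breaker testable without real two-minute waits.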
🛡️ Safety & Resource Management
- Rate Limiting: Swarms are rate-limited to prevent excessive LLM API usage or system resource exhaustion.
- Failure Resilience: If an individual Ant worker fails or times out, the SwarmSession can re-queue the task for another worker.
- Audit Trail: Every Ant worker's execution is logged with a unique execution ID for deep forensic analysis.
- Budget Enforcement: Each project has a max budget; work halts when exhausted.
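Budget enforcement of the kind listed above can be sketched as a guarded counter that each ant charges before doing work. The `Budget` type, its integer cost units, and the error name are hypothetical; projects.go may track budgets differently (for example in tokens or currency).

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var ErrBudgetExhausted = errors.New("project budget exhausted")

// Budget is a hypothetical, concurrency-safe spend tracker.
// Costs are in arbitrary integer units.
type Budget struct {
	mu    sync.Mutex
	max   int
	spent int
}

func NewBudget(max int) *Budget {
	return &Budget{max: max}
}

// Charge reserves cost units, or returns ErrBudgetExhausted (and charges
// nothing) if the reservation would exceed the project's maximum.
func (b *Budget) Charge(cost int) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.spent+cost > b.max {
		return ErrBudgetExhausted
	}
	b.spent += cost
	return nil
}

// Remaining returns the unspent portion of the budget.
func (b *Budget) Remaining() int {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.max - b.spent
}

func main() {
	b := NewBudget(10)
	completed := 0
	for _, cost := range []int{4, 4, 4} {
		if err := b.Charge(cost); err != nil {
			fmt.Println("halting:", err)
			break
		}
		completed++
	}
	fmt.Println("tasks completed:", completed, "remaining:", b.Remaining())
}
```

Checking and updating under one mutex matters here because many ants charge the same budget concurrently.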
🔄 Swarm Lifecycle
- Initialization: A high-level agent (e.g., SysAdmin) decides a task requires a swarm and calls spawn_antfarm_swarm.
- Task Distribution: The SwarmSession breaks the request into a queue of independent tasks.
- Parallel Execution: A fleet of ProjectWorkerSession agents (up to a configurable limit) concurrently process tasks from the queue.
- Result Synthesis: As ants complete tasks, their observations are bubbled up to the swarm coordinator.
- Final Report: Once the queue is empty, the coordinator synthesizes all observations into a final, high-level SITREP for the user.
🔧 Project Commands
| Command | Description |
|---|---|
| pm_create_project | Create a new project with budget and milestones |
| pm_spawn_queen | Spawn a queen to coordinate the swarm |
| pm_spawn_ant | Spawn individual ants with skill context |
| pm_monitor_swarm | Monitor active swarm status |
| pm_kill_ant | Terminate a specific ant |
🧬 Skills Integration
Workers can receive skill context on spawn:
- Skill ID passed to NewProjectWorkerSession()
- Skill manifest loaded from skills.go catalog (92+ skills)
- System prompt suffix injected into worker's context
- Required tools enabled based on skill configuration
- Model tier adjusted based on skill requirements
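The steps above amount to merging a skill manifest into a worker's base configuration. The sketch below shows one possible shape for that merge. `SkillManifest`, `WorkerConfig`, `applySkill`, and the example skill ID and tier value are all hypothetical, and the real `NewProjectWorkerSession()` signature is not shown in this document.

```go
package main

import (
	"fmt"
	"strings"
)

// SkillManifest is a hypothetical view of one catalog entry.
type SkillManifest struct {
	ID            string
	PromptSuffix  string
	RequiredTools []string
	ModelTier     string // empty means "keep the worker's default tier"
}

// WorkerConfig is a hypothetical subset of a worker session's settings.
type WorkerConfig struct {
	SystemPrompt string
	Tools        map[string]bool
	ModelTier    string
}

// applySkill returns a copy of base with the skill's prompt suffix appended,
// its required tools enabled, and its model tier applied if one is set.
// The base configuration is left unmodified.
func applySkill(catalog map[string]SkillManifest, base WorkerConfig, skillID string) (WorkerConfig, error) {
	skill, ok := catalog[skillID]
	if !ok {
		return base, fmt.Errorf("unknown skill %q", skillID)
	}
	cfg := WorkerConfig{
		SystemPrompt: strings.TrimSpace(base.SystemPrompt + "\n\n" + skill.PromptSuffix),
		Tools:        make(map[string]bool, len(base.Tools)+len(skill.RequiredTools)),
		ModelTier:    base.ModelTier,
	}
	for name, enabled := range base.Tools {
		cfg.Tools[name] = enabled
	}
	for _, name := range skill.RequiredTools {
		cfg.Tools[name] = true
	}
	if skill.ModelTier != "" {
		cfg.ModelTier = skill.ModelTier
	}
	return cfg, nil
}

func main() {
	catalog := map[string]SkillManifest{
		"log-triage": {
			ID:            "log-triage",
			PromptSuffix:  "Focus on error patterns in logs.",
			RequiredTools: []string{"ant_report_discovery"},
			ModelTier:     "large",
		},
	}
	base := WorkerConfig{
		SystemPrompt: "You are an Ant worker.",
		Tools:        map[string]bool{"ant_execute_task": true},
		ModelTier:    "small",
	}
	cfg, err := applySkill(catalog, base, "log-triage")
	fmt.Println(cfg.ModelTier, len(cfg.Tools), err)
}
```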
See Assistant Core for the underlying session management. See Features/assistant/skills-dynamic-injection.md for skill system details.