Antfarm Swarm

Last updated: May 9, 2026

AntFarm Swarm Architecture

The AntFarm Swarm (antfarm.go, ant_tools.go, projects.go) is a decentralized, high-concurrency worker engine designed to execute large sets of independent tasks in parallel.

🐜 Concept: The "Swarm"

An AntFarm Swarm consists of many "Ant" workers (ephemeral agents) that collaborate to solve a complex problem. Swarms are primarily used for tasks such as:

  • Massive Infrastructure Audits: Checking health on dozens of servers simultaneously.
  • Large-Scale Data Scraping: Gathering information from multiple sources at once.
  • Batch Code Analysis: Reviewing or refactoring multiple files in parallel.

📁 Key Components

antfarm.go

The primary swarm coordinator.

  • SwarmSession: Manages the lifecycle of a swarm, including task distribution and result aggregation.
  • ProjectWorkerSession: An individual, ephemeral agent instance that picks up tasks from the swarm queue.
  • Concurrency Control: Uses Go channels and sync.WaitGroup to manage parallel execution without overloading the system.
  • Automatic Checkpointing: Snapshots the project state at the beginning and end of a swarm operation for safety.
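The concurrency control described above can be sketched as a bounded worker pool. This is a minimal illustration and not the code in antfarm.go: the `Task` and `Result` types, the `runSwarm` function, and the `exec` callback are hypothetical names introduced here. It only shows how a task channel and a `sync.WaitGroup` cap the number of tasks running at once.

```go
package main

import (
    "fmt"
    "sort"
    "sync"
)

// Task and Result are hypothetical types for this sketch; the real
// SwarmSession types are not shown in this document.
type Task struct {
    ID int
}

type Result struct {
    TaskID int
    Output string
}

// runSwarm fans tasks out to at most maxWorkers goroutines and collects
// one Result per task. The queue channel stands in for the swarm queue.
func runSwarm(tasks []Task, maxWorkers int, exec func(Task) string) []Result {
    if maxWorkers < 1 {
        maxWorkers = 1 // at least one worker, otherwise the sends below would block forever
    }
    queue := make(chan Task)
    // Buffered so workers never block when reporting a result.
    results := make(chan Result, len(tasks))

    var wg sync.WaitGroup
    for i := 0; i < maxWorkers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            // Each worker pulls tasks until the queue is closed.
            for t := range queue {
                results <- Result{TaskID: t.ID, Output: exec(t)}
            }
        }()
    }

    for _, t := range tasks {
        queue <- t
    }
    close(queue) // no more work: workers leave their range loops
    wg.Wait()
    close(results)

    out := make([]Result, 0, len(tasks))
    for r := range results {
        out = append(out, r)
    }
    // Completion order is nondeterministic, so sort for stable output.
    sort.Slice(out, func(i, j int) bool { return out[i].TaskID < out[j].TaskID })
    return out
}

func main() {
    tasks := []Task{{ID: 1}, {ID: 2}, {ID: 3}, {ID: 4}}
    res := runSwarm(tasks, 2, func(t Task) string {
        return fmt.Sprintf("checked server %d", t.ID)
    })
    for _, r := range res {
        fmt.Println(r.TaskID, r.Output)
    }
}
```

Because the queue channel is unbuffered, the producer only hands out a task when a worker is free, so `maxWorkers` is the effective parallelism limit.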

ant_tools.go

A collection of specialized tools available only to Ant workers:

  • ant_execute_task: The core tool for performing a discrete task.
  • ant_report_discovery: Allows an Ant to report important findings back to the swarm coordinator.
  • ant_request_help: Enables one Ant to delegate a sub-task to another Ant within the same swarm.

projects.go

Project management for swarms: tracks open projects, budgets, and milestones, and spawns new ants with skill context.

🐜 Worker Types

Each worker type has specific iteration limits and purposes:

| Type | Iterations | Purpose |
|------|------------|---------|
| queen | Unlimited | Swarm coordinator, spawns ants |
| worker | 3 | General purpose coding |
| scout | 2 | Discovery/research |
| soldier | 3 | QA/verification |
| debugger | 5 | Root cause analysis |
| janitor | 2 | Cleanup/removal |
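One way to encode these limits is a lookup table keyed by worker type. The sketch below is illustrative only: the `WorkerType` type, the `iterationLimits` map, the `Unlimited` sentinel, and `canIterate` are assumed names, and only the numbers come from the table above.

```go
package main

import "fmt"

// WorkerType and the constants below are hypothetical names for this sketch.
type WorkerType string

const (
    Queen    WorkerType = "queen"
    Worker   WorkerType = "worker"
    Scout    WorkerType = "scout"
    Soldier  WorkerType = "soldier"
    Debugger WorkerType = "debugger"
    Janitor  WorkerType = "janitor"
)

// Unlimited is a sentinel meaning "no iteration cap" (assumption).
const Unlimited = -1

// iterationLimits mirrors the Iterations column of the table.
var iterationLimits = map[WorkerType]int{
    Queen:    Unlimited,
    Worker:   3,
    Scout:    2,
    Soldier:  3,
    Debugger: 5,
    Janitor:  2,
}

// canIterate reports whether a worker of type wt that has already
// completed `done` iterations may start another one.
func canIterate(wt WorkerType, done int) bool {
    limit, ok := iterationLimits[wt]
    if !ok {
        return false // unknown worker types get no iterations
    }
    return limit == Unlimited || done < limit
}

func main() {
    fmt.Println(canIterate(Scout, 1))    // true
    fmt.Println(canIterate(Scout, 2))    // false
    fmt.Println(canIterate(Queen, 1000)) // true
}
```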

Ralph Loop: Self-Correction Mechanism

The Ralph Loop is a self-correction mechanism built into swarm workers:

  1. Worker completes a task → reports success or failure
  2. Soldier verifies → checks the work meets requirements
  3. On failure → Worker retries with new arguments (up to maxRetries)
  4. On success → Result synthesized into final report

Worker → Execute → Failed? → Retry (maxRetries) → Soldier Verify → Success → Synthesize
                                              ↓
                                           Fail again → Circuit Breaker
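The loop above can be sketched as a retry function with a verification step. This is a hedged illustration rather than the actual worker code: `ralphLoop`, `execute`, `verify`, and `ErrRetriesExhausted` are hypothetical names, and the sketch assumes `maxRetries` counts retries after the first attempt. The attempt number is passed to `execute` so the caller can vary its arguments on each retry.

```go
package main

import (
    "errors"
    "fmt"
)

// ErrRetriesExhausted signals that every attempt failed or was rejected.
// A caller could count this as one failure toward the circuit breaker.
var ErrRetriesExhausted = errors.New("retries exhausted")

// ralphLoop runs execute, has verify (the "soldier") check the output,
// and retries up to maxRetries times. It returns the accepted output
// and the number of attempts made.
func ralphLoop(
    maxRetries int,
    execute func(attempt int) (string, error),
    verify func(output string) bool,
) (string, int, error) {
    attempts := 0
    for attempt := 0; attempt <= maxRetries; attempt++ {
        attempts++
        out, err := execute(attempt)
        if err != nil {
            continue // worker reported failure: retry
        }
        if verify(out) {
            return out, attempts, nil // soldier accepted the work
        }
        // Soldier rejected the work: fall through and retry.
    }
    return "", attempts, ErrRetriesExhausted
}

func main() {
    // The first attempt produces work the verifier rejects; the retry passes.
    execute := func(attempt int) (string, error) {
        if attempt == 0 {
            return "draft", nil
        }
        return "fixed", nil
    }
    verify := func(out string) bool { return out == "fixed" }

    out, attempts, err := ralphLoop(3, execute, verify)
    fmt.Println(out, attempts, err) // fixed 2 <nil>
}
```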

Circuit Breaker: Failure Protection

The Circuit Breaker prevents zombie swarms from consuming resources indefinitely:

// From antfarm.go:1127
if failures >= 5 {
    // Trip breaker, pause for 2 minutes
    cb.State = HalfOpen
    time.Sleep(2 * time.Minute)
}

  • Threshold: 5 consecutive failures
  • Action: Auto-pause the swarm
  • Cooldown: 2 minutes before retry
  • Recovery: Manual intervention or automatic reset after cooldown

🛡️ Safety & Resource Management

  • Rate Limiting: Swarms are rate-limited to prevent excessive LLM API usage or system resource exhaustion.
  • Failure Resilience: If an individual Ant worker fails or times out, the SwarmSession can re-queue the task for another worker.
  • Audit Trail: Every Ant worker's execution is logged with a unique execution ID for deep forensic analysis.
  • Budget Enforcement: Each project has a max budget; work halts when exhausted.
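Budget enforcement can be illustrated with a small guard that workers consult before starting a task. This is a sketch under stated assumptions: the document does not say what unit budgets use (tokens, dollars, or something else), so the example uses abstract integer units, and `Budget`, `Charge`, `Remaining`, and `ErrBudgetExhausted` are hypothetical names.

```go
package main

import (
    "errors"
    "fmt"
    "sync"
)

// ErrBudgetExhausted is returned when a charge would exceed the project budget.
var ErrBudgetExhausted = errors.New("project budget exhausted")

// Budget is a hypothetical per-project budget in abstract units.
// The mutex makes it safe to share between concurrent workers.
type Budget struct {
    Max int

    mu    sync.Mutex
    spent int
}

// Charge reserves cost units. If the charge would exceed Max, nothing is
// spent and ErrBudgetExhausted is returned so the caller can halt work.
func (b *Budget) Charge(cost int) error {
    if cost < 0 {
        return errors.New("negative cost")
    }
    b.mu.Lock()
    defer b.mu.Unlock()
    if b.spent+cost > b.Max {
        return ErrBudgetExhausted
    }
    b.spent += cost
    return nil
}

// Remaining returns the unspent part of the budget.
func (b *Budget) Remaining() int {
    b.mu.Lock()
    defer b.mu.Unlock()
    return b.Max - b.spent
}

func main() {
    b := &Budget{Max: 10}
    for i, cost := range []int{4, 4, 4} {
        if err := b.Charge(cost); err != nil {
            fmt.Printf("task %d halted: %v\n", i, err)
            break
        }
        fmt.Printf("task %d ran, %d remaining\n", i, b.Remaining())
    }
}
```

With a budget of 10 and three tasks costing 4 each, the first two tasks run and the third is halted.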

🔄 Swarm Lifecycle

  1. Initialization: A high-level agent (e.g., SysAdmin) decides a task requires a swarm and calls spawn_antfarm_swarm.
  2. Task Distribution: The SwarmSession breaks the request into a queue of independent tasks.
  3. Parallel Execution: A fleet of ProjectWorkerSession agents (up to a configurable limit) concurrently process tasks from the queue.
  4. Result Synthesis: As ants complete tasks, their observations are bubbled up to the swarm coordinator.
  5. Final Report: Once the queue is empty, the coordinator synthesizes all observations into a final, high-level SITREP for the user.

🔧 Project Commands

| Command | Description |
|---------|-------------|
| pm_create_project | Create a new project with budget and milestones |
| pm_spawn_queen | Spawn a queen to coordinate the swarm |
| pm_spawn_ant | Spawn individual ants with skill context |
| pm_monitor_swarm | Monitor active swarm status |
| pm_kill_ant | Terminate a specific ant |

🧬 Skills Integration

Workers can receive skill context on spawn:

  1. Skill ID passed to NewProjectWorkerSession()
  2. Skill manifest loaded from skills.go catalog (92+ skills)
  3. System prompt suffix injected into worker's context
  4. Required tools enabled based on skill configuration
  5. Model tier adjusted based on skill requirements
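The injection steps above might look roughly like the following. The signature of NewProjectWorkerSession() and the shape of the skill manifest are not shown in this document, so the sketch does not reproduce them: `Skill`, `WorkerConfig`, `applySkill`, and the example skill ID, tool names, and tier names are all hypothetical.

```go
package main

import (
    "fmt"
    "strings"
)

// Skill is an assumed manifest shape; the real catalog in skills.go may differ.
type Skill struct {
    ID           string
    PromptSuffix string
    Tools        []string
    ModelTier    string
}

// WorkerConfig is a hypothetical stand-in for a worker's spawn settings.
type WorkerConfig struct {
    SystemPrompt string
    Tools        []string
    ModelTier    string
}

// applySkill looks up skillID in the catalog and returns a copy of base
// with the prompt suffix appended, required tools enabled, and the model
// tier overridden when the skill specifies one.
func applySkill(base WorkerConfig, catalog map[string]Skill, skillID string) (WorkerConfig, error) {
    skill, ok := catalog[skillID]
    if !ok {
        return base, fmt.Errorf("unknown skill %q", skillID)
    }
    cfg := base
    cfg.SystemPrompt = strings.TrimSpace(base.SystemPrompt + "\n\n" + skill.PromptSuffix)

    // Copy the tool list so the base config is not modified, then add
    // the skill's tools without duplicates.
    cfg.Tools = append([]string(nil), base.Tools...)
    seen := make(map[string]bool, len(cfg.Tools))
    for _, t := range cfg.Tools {
        seen[t] = true
    }
    for _, t := range skill.Tools {
        if !seen[t] {
            cfg.Tools = append(cfg.Tools, t)
            seen[t] = true
        }
    }
    if skill.ModelTier != "" {
        cfg.ModelTier = skill.ModelTier
    }
    return cfg, nil
}

func main() {
    catalog := map[string]Skill{
        "security-audit": {
            ID:           "security-audit",
            PromptSuffix: "Focus on security findings.",
            Tools:        []string{"ant_report_discovery"},
            ModelTier:    "high",
        },
    }
    base := WorkerConfig{
        SystemPrompt: "You are an ant worker.",
        Tools:        []string{"ant_execute_task"},
        ModelTier:    "standard",
    }
    cfg, err := applySkill(base, catalog, "security-audit")
    if err != nil {
        fmt.Println("error:", err)
        return
    }
    fmt.Println(cfg.Tools, cfg.ModelTier)
}
```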

See Assistant Core for the underlying session management. See Features/assistant/skills-dynamic-injection.md for skill system details.