AI Health Analysis
The dsysadmin package takes a different approach to server monitoring: it uses the LLM as a parser and analyzer for raw system telemetry.
The Problem with Manual Checks
Normally, an AI agent would need multiple ReAct loop turns to diagnose a server:
- Thought: Let's check uptime. Action: uptime.
- Thought: Let's check memory. Action: free -m.
- Thought: Let's check disk. Action: df -h.
This is slow, every turn adds tokens to the prompt history, and the multi-turn loop is prone to timeouts.
The AI Analysis Pipeline
dsysadmin solves this by automating the data gathering and using a "Single-Shot" analysis prompt.
1. Command Sets
Pre-defined arrays of commands are mapped to specific domains:
- Health: uptime, free -m, df -h, top -b -n 1 | head -n 15
- Security: lastb -n 10, tail -n 20 /var/log/auth.log, ufw status
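As a rough illustration, the two command sets above could be held in a map keyed by domain. This is a minimal sketch, not the package's actual layout: the names Domain, commandSets, and CommandsFor are hypothetical.

```go
package main

import "fmt"

// Domain names a group of diagnostic commands (hypothetical type).
type Domain string

const (
	DomainHealth   Domain = "health"
	DomainSecurity Domain = "security"
)

// commandSets mirrors the two sets listed in the text.
// The map layout itself is an assumption made for this sketch.
var commandSets = map[Domain][]string{
	DomainHealth: {
		"uptime",
		"free -m",
		"df -h",
		"top -b -n 1 | head -n 15",
	},
	DomainSecurity: {
		"lastb -n 10",
		"tail -n 20 /var/log/auth.log",
		"ufw status",
	},
}

// CommandsFor returns a copy of the commands for a domain,
// and false when the domain is unknown.
func CommandsFor(d Domain) ([]string, bool) {
	cmds, ok := commandSets[d]
	if !ok {
		return nil, false
	}
	// Copy so callers cannot mutate the shared set.
	return append([]string(nil), cmds...), true
}

func main() {
	cmds, _ := CommandsFor(DomainHealth)
	for _, c := range cmds {
		fmt.Println(c)
	}
}
```

Returning a copy keeps the pre-defined sets immutable, so one caller cannot change what the next health check runs.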
2. Aggregation
When a health check is triggered (e.g., by a NOC alert), the Go code executes all commands in the relevant set over SSH and concatenates the raw text output.
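The aggregation step might look something like the sketch below. It assumes a hypothetical Runner interface standing in for the SSH session, so that the example runs without a network. Labelling each chunk with a "$ command" line and recording failures inline are choices of this sketch, not confirmed behaviour of dsysadmin.

```go
package main

import (
	"fmt"
	"strings"
)

// Runner abstracts "run one command on the remote host".
// Hypothetical: in practice this would wrap an SSH session.
type Runner interface {
	Run(cmd string) (string, error)
}

// Aggregate runs each command in order and concatenates the raw output.
// A failing command is noted inline instead of aborting the whole batch.
func Aggregate(r Runner, cmds []string) string {
	var b strings.Builder
	for _, cmd := range cmds {
		fmt.Fprintf(&b, "$ %s\n", cmd)
		out, err := r.Run(cmd)
		b.WriteString(out)
		// Make sure each chunk ends with a newline.
		if out != "" && !strings.HasSuffix(out, "\n") {
			b.WriteString("\n")
		}
		if err != nil {
			fmt.Fprintf(&b, "[error: %v]\n", err)
		}
	}
	return b.String()
}

// fakeRunner returns canned output, for demonstration only.
type fakeRunner map[string]string

func (f fakeRunner) Run(cmd string) (string, error) {
	out, ok := f[cmd]
	if !ok {
		return "", fmt.Errorf("no canned output for %q", cmd)
	}
	return out, nil
}

func main() {
	r := fakeRunner{
		"uptime":  " 10:00:00 up 3 days",
		"free -m": "Mem: 7972 1024\n",
	}
	fmt.Print(Aggregate(r, []string{"uptime", "free -m", "df -h"}))
}
```

Keeping a failed command in the output lets the LLM see that part of the telemetry is missing rather than silently analyzing an incomplete picture.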
3. Single-Shot LLM Analysis
The raw output is injected into a specialized prompt instructing the LLM to act strictly as a data analyst. This single call bypasses the standard ReAct tool-use loop.
// Example internal prompt structure:
Analyze the following raw server output.
Provide a SITREP detailing:
1. Status (Healthy/Degraded/Critical)
2. Issues Found
3. Recommended Actions
[RAW OUTPUT START]
...
[RAW OUTPUT END]
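Assembling that prompt is plain string building. The sketch below reuses the wording and delimiters from the example structure above. The function name BuildPrompt and the trimming of trailing newlines are assumptions of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// sitrepInstructions copies the instruction text from the example prompt.
const sitrepInstructions = `Analyze the following raw server output.
Provide a SITREP detailing:
1. Status (Healthy/Degraded/Critical)
2. Issues Found
3. Recommended Actions`

// BuildPrompt wraps the aggregated raw output in the start/end markers.
// Hypothetical helper; trailing newlines in raw are trimmed so the end
// marker always sits directly after the last line of output.
func BuildPrompt(raw string) string {
	var b strings.Builder
	b.WriteString(sitrepInstructions)
	b.WriteString("\n[RAW OUTPUT START]\n")
	b.WriteString(strings.TrimRight(raw, "\n"))
	b.WriteString("\n[RAW OUTPUT END]\n")
	return b.String()
}

func main() {
	fmt.Print(BuildPrompt("$ uptime\n 10:00:00 up 3 days\n"))
}
```

The explicit markers give the model an unambiguous boundary between instructions and telemetry, which matters because command output may itself contain text that looks like instructions.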
4. SITREP Delivery
The LLM generates a clean, markdown-formatted Situation Report (SITREP), which is then returned to the user or routing system.
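A routing system that receives the SITREP may want the status value without reading the whole report. The sketch below is one possible approach and assumes the report contains a line mentioning "Status" together with one of the three values from the prompt. The exact SITREP layout is not specified here, and ParseStatus is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// ParseStatus scans the SITREP for a line that mentions "status" and one
// of the three expected values. It returns false if none is found.
// Assumption: the LLM emits something like "Status: Degraded".
func ParseStatus(sitrep string) (string, bool) {
	for _, line := range strings.Split(sitrep, "\n") {
		lower := strings.ToLower(line)
		if !strings.Contains(lower, "status") {
			continue
		}
		// Most severe first, so an ambiguous line errs on the side of caution.
		for _, s := range []string{"Critical", "Degraded", "Healthy"} {
			if strings.Contains(lower, strings.ToLower(s)) {
				return s, true
			}
		}
	}
	return "", false
}

func main() {
	report := "## SITREP\n**Status:** Degraded\n**Issues Found:** High I/O wait\n"
	status, ok := ParseStatus(report)
	fmt.Println(status, ok)
}
```

Because the match is a loose substring check, a caller should treat a false result as "unknown" and fall back to forwarding the full report.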
Guidance for AI Agents
- Trust the SITREP: When provided a SITREP generated by dsysadmin, treat it as highly accurate. It has already synthesized the core telemetry.
- Deep Dives: If the SITREP identifies a specific issue (e.g., "High I/O wait"), you can then use individual SSH tools to investigate that specific subsystem further.