AI Agents

The homelab uses two complementary AI agent systems: Cursor for interactive development and OpenClaw for autonomous multi-agent operations.

Architecture

flowchart TD
    subgraph source ["Single Source of Truth (git)"]
        CursorRules[".cursor/rules/*.mdc"]
        Skills["skills/*/SKILL.md"]
        AgentsMD["agents/workspaces/*/AGENTS.md"]
    end

    subgraph cursor ["Cursor IDE (local)"]
        CursorAgent["Cursor Agent\n(interactive, single-agent)"]
        CursorAgent -->|"alwaysApply + autoAttach"| CursorRules
        CursorAgent -->|"read on demand"| Skills
    end

    subgraph openclaw ["OpenClaw (K8s pod)"]
        HA["homelab-admin\n(orchestrator)"]
        CA["cursor-agent\n(senior lead)"]
        DS["devops-sre\n(junior)"]
        SE["software-engineer\n(junior)"]
        SA["security-analyst\n(junior)"]
        QA["qa-tester\n(junior)"]
        HA -->|"sessions_spawn"| CA
        HA -->|"sessions_spawn"| DS
        HA -->|"sessions_spawn"| SE
        HA -->|"sessions_spawn"| SA
        HA -->|"sessions_spawn"| QA
        CA -->|"sessions_spawn\n+ review"| DS
        CA -->|"sessions_spawn\n+ review"| SE
        CA -->|"sessions_spawn\n+ review"| SA
        CA -->|"sessions_spawn\n+ review"| QA
    end

    subgraph models ["Model Providers"]
        OpenRouterM["OpenRouter\nstepfun/step-3.5-flash:free\n(primary)"]
        GeminiM["Google Gemini\ngemini-2.5-pro\n(fallback)"]
    end

    openclaw --> OpenRouterM
    OpenRouterM -.->|"fallback"| GeminiM

    Skills -->|"hostPath /skills"| openclaw
    AgentsMD -->|"hostPath → init container"| openclaw

When to Use Which

| Task | System | Why |
| --- | --- | --- |
| Interactive coding, file editing, git ops | Cursor | Direct filesystem access, IDE integration, fast iteration |
| Planning, architecture review, Q&A | Cursor (Ask/Plan mode) | Context-aware with .cursor/rules/ |
| Autonomous infra tasks (deploy, rotate secrets, incident response) | OpenClaw | Spawns sub-agents, runs kubectl, accessible from any device |
| Security audit, code review | OpenClaw (security-analyst, software-engineer) | Dedicated workspace, parallel execution |
| Quick kubectl check from phone/tablet | OpenClaw | Multi-device access via Tailscale |

Cursor Rules

Cursor rules live in .cursor/rules/*.mdc with YAML frontmatter. They inject context into every Cursor conversation.

| File | Scope | Contents |
| --- | --- | --- |
| homelab.mdc | alwaysApply: true | Global homelab context, architecture layers, core rules (GitOps, no secrets in git, ArgoCD labeling) |
| kubernetes.mdc | autoAttach: k8s/** | Kustomize conventions, ArgoCD Application CR template, sync waves, namespace rules |
| terraform.mdc | autoAttach: terraform/** | Layer 0 bootstrap rules, variable naming, secret handling |
| openclaw.mdc | autoAttach: k8s/apps/openclaw/**, skills/**, agents/** | Agent/skill conventions, how to add agents and skills |

Adding a Cursor Rule

  1. Create .cursor/rules/<name>.mdc with frontmatter:
---
description: Short description
alwaysApply: true          # injected into every conversation
# OR
autoAttach:
  - "path/glob/**"         # injected when matching files are open
---
  2. Write the rule content in markdown below the frontmatter.
  3. Commit and push. The rule takes effect immediately in new Cursor sessions.

OpenClaw Agents

OpenClaw runs nine agents in total: six core agents in a two-tier orchestrator pattern, and three specialist agents for domain-specific tasks. The homelab-admin agent is the only directly accessible agent — it receives all user requests and delegates to sub-agents via sessions_spawn. The cursor-agent serves as a senior lead that can both execute complex tasks and direct junior agents.

| Agent | Tier | Role | Workspace |
| --- | --- | --- | --- |
| homelab-admin | Orchestrator | Default entry point, delegates to all agents | /data/workspaces/homelab-admin |
| cursor-agent | Senior lead | AI-assisted code gen via Cursor CLI, PR review authority, technical direction | /data/workspaces/cursor-agent |
| devops-sre | Junior | Infrastructure, K8s, Terraform | /data/workspaces/devops-sre |
| software-engineer | Junior | Code development, testing | /data/workspaces/software-engineer |
| security-analyst | Junior | Security audits, hardening | /data/workspaces/security-analyst |
| qa-tester | Junior | Deployment validation, service health testing, regression checks | /data/workspaces/qa-tester |
| deutsch-tutor | Specialist | AI German language tutor: spaced repetition, grammar, conversation | /data/workspaces/deutsch-tutor |
| english-tutor | Specialist | AI IELTS 8.0 coach: grammar precision, academic vocabulary, essay feedback, speaking simulation | /data/workspaces/english-tutor |
| daily-routine | Specialist | Proactive health & schedule coach: morning briefing, meal reminders, training sessions, hydration, wind-down | /data/workspaces/daily-routine |

Every agent has an explicit object-form model with { primary, fallbacks } in the configmap:

| Setting | Value |
| --- | --- |
| Primary | openrouter/stepfun/step-3.5-flash:free |
| Fallback | google/gemini-2.5-pro |

When the primary fails or hits rate limits, OpenClaw falls through to Gemini. Auth is via OPENROUTER_API_KEY and GEMINI_API_KEY env vars (synced from Infisical).

Convention: Per-agent model must always use the object form { "primary": "...", "fallbacks": ["..."] }. A plain string discards fallbacks. See the OpenClaw README for details.
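
The convention above can be checked mechanically. The following is a minimal sketch, assuming openclaw.json has already been loaded into a Python dict with agents under agents.list; the function name check_agent_models and the sample data are hypothetical and not part of the repo.

```python
import json


def check_agent_models(config: dict) -> list[str]:
    """Return one message per agent whose model breaks the object-form convention."""
    problems = []
    for agent in config.get("agents", {}).get("list", []):
        agent_id = agent.get("id", "<unknown>")
        model = agent.get("model")
        if isinstance(model, str):
            # A plain string carries no fallbacks at all.
            problems.append(f"{agent_id}: plain-string model discards fallbacks")
        elif not isinstance(model, dict) or "primary" not in model:
            problems.append(f"{agent_id}: model must be an object with 'primary'")
        elif not model.get("fallbacks"):
            problems.append(f"{agent_id}: no fallbacks configured")
    return problems


# Illustrative input only; the real configmap contains more fields per agent.
sample = json.loads("""
{"agents": {"list": [
  {"id": "homelab-admin",
   "model": {"primary": "openrouter/stepfun/step-3.5-flash:free",
             "fallbacks": ["google/gemini-2.5-pro"]}},
  {"id": "devops-sre", "model": "openrouter/stepfun/step-3.5-flash:free"}
]}}
""")

for problem in check_agent_models(sample):
    print(problem)
```

A check like this could run before a configmap change is submitted, so that a plain-string model is caught before it silently drops the Gemini fallback.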

How the Orchestrator Works

Users interact only with homelab-admin in the OpenClaw Control UI. It is the sole agent with "default": true in the config. The other five core agents (cursor-agent, devops-sre, software-engineer, security-analyst, qa-tester) are sub-agents: they don't appear in the UI dropdown and can only be spawned by the orchestrator.

Delegation rules:

| Request type | Delegated to | Example |
| --- | --- | --- |
| Complex code generation, multi-file refactors | cursor-agent (senior) | "Generate a Python script that validates YAML files" |
| Multi-agent coordinated tasks | cursor-agent (senior, decomposes + directs) | "Refactor the monitoring stack and update all docs" |
| PR review for junior agent output | cursor-agent (senior) | "Review the devops-sre PR for the new NetworkPolicy" |
| Infrastructure changes, Terraform, K8s ops, monitoring | devops-sre (junior) | "Add a NodePort to the monitoring stack" |
| Code changes, feature development, testing | software-engineer (junior) | "Update the Dockerfile to add a new tool" |
| Security audits, vulnerability assessment, hardening | security-analyst (junior) | "Audit the RBAC configuration" |
| Deployment validation, regression testing, health checks | qa-tester (junior) | "Verify all services are healthy after the merge" |
| Read-only checks, status queries, simple answers | homelab-admin (handles directly) | "What pods are running?" |

When delegating, homelab-admin uses sessions_spawn and provides:

  1. A clear task description and expected outcome
  2. Relevant file paths or service names
  3. Any context from the prior conversation
  4. Label instructions (agent, type, area, priority)

The spawned sub-agent session appears in the UI sidebar. Sub-agents report results back via sessions_announce.

Cursor Agent — Senior Lead & Handoff Protocol

The cursor-agent is the senior lead in the agent hierarchy. It has three responsibilities: (1) AI-assisted code generation via the Cursor CLI, (2) PR review authority for junior agent output, and (3) technical direction on multi-agent tasks. It can spawn and direct junior agents (devops-sre, software-engineer, security-analyst, qa-tester) independently.

sequenceDiagram
    participant HA as homelab-admin<br/>(Orchestrator)
    participant CA as cursor-agent<br/>(OpenClaw sub-agent)
    participant CLI as Cursor CLI<br/>(agent command)
    participant GH as GitHub

    HA->>CA: sessions_spawn with task context<br/>(description, repo, files, constraints)
    CA->>CA: Clone/update repo, create branch,<br/>set git identity
    CA->>CLI: agent -p 'prompt' --force<br/>(or tmux session for complex tasks)
    CLI-->>CA: Generated code changes
    CA->>CA: Review diff: no secrets,<br/>style check, test coverage
    alt Quality OK
        CA->>GH: git commit + push + gh pr create
        GH-->>CA: PR URL
        CA-->>HA: sessions_announce: PR URL + summary
    else Quality concerns
        CA-->>HA: Report issues, request guidance
    end

Execution modes (selected by task complexity):

| Complexity | Mode | Command |
| --- | --- | --- |
| Simple, single-file | Non-interactive | agent -p '<prompt>' --force |
| Multi-file, needs context | Non-interactive with @file refs | agent -p '<prompt>' --force |
| Iterative, needs refinement | tmux automation | tmux session with interactive agent |

Prerequisites (included in the Docker image):

  • Cursor CLI (agent command) — installed via install script in Dockerfile.openclaw
  • tmux — installed for complex task execution
  • CURSOR_API_KEY — synced from Infisical via ESO into openclaw-secret

See the cursor-agent skill (skills/cursor-agent/SKILL.md) for the full CLI reference, tmux automation patterns, and troubleshooting guide.

Mandatory Git Workflow

ALL agents enforce a mandatory git workflow for any change to the homelab repository. No agent — including the orchestrator — pushes directly to main. Branch protection is enforced on main: PRs require at least one approving review, force pushes are blocked, and linear history is required.

sequenceDiagram
    participant User
    participant HA as homelab-admin<br/>(Orchestrator)
    participant SA as Sub-agent<br/>(devops-sre / software-engineer /<br/>security-analyst / qa-tester / cursor-agent)
    participant GH as GitHub
    participant Argo as ArgoCD

    User->>HA: "Add resource X to service Y"
    HA->>HA: Analyze request, pick agent
    HA->>SA: sessions_spawn with task context

    SA->>SA: Clone repo + set git identity
    SA->>GH: gh issue create (with agent footer)
    GH-->>SA: Issue #42
    SA->>SA: git checkout -b agent-id/feat/42-add-resource-x
    SA->>SA: Edit manifests + docs
    SA->>SA: git commit (with agent tag in message)
    SA->>GH: git push + gh pr create (with agent footer)
    GH-->>SA: PR URL

    SA-->>HA: Report PR URL
    HA-->>User: PR created, ArgoCD syncs in ~3min

    User->>GH: Review & merge PR
    GH->>Argo: Push to main triggers sync
    Argo->>Argo: Auto-sync within ~3 minutes

Step-by-step process (every agent follows this):

  1. Clone the repo into the agent's workspace and set git identity (git config user.name "<agent-id>[bot]", git config user.email "<agent-id>@openclaw.homelab")
  2. Create a labeled GitHub issue via gh issue create with --assignee holdennguyen --label "agent:<id>,type:<type>,area:<area>,priority:<priority>" --milestone "<current-milestone>" — body ends with Agent: <agent-id> | OpenClaw Homelab footer. If no open milestone exists, ask the orchestrator (or user) to create one.
  3. Create a branch from main: <agent-id>/<type>/<issue-number>-<short-description> (prefixes: feat/, fix/, chore/, docs/, refactor/)
  4. Make changes to manifests, config, docs
  5. Commit with a message referencing the issue and agent: <type>: <description> (#<issue-number>) [<agent-id>]
  6. Keep branch up to date — before every push, run git fetch origin main && git merge origin/main --no-edit to incorporate any changes merged to main since the branch was created. Resolve conflicts if needed; never force-push.
  7. Push and create a labeled PR via gh pr create with the same labels and --milestone "<current-milestone>" — body ends with Agent: <agent-id> | OpenClaw Homelab footer
  8. Report the PR URL back to the orchestrator or user
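
The naming conventions in steps 1, 3, and 5 are easy to get subtly wrong, so here is a small sketch of how they could be generated consistently. The helper names (slugify, branch_name, commit_message, git_identity) are hypothetical; only the formats come from the steps above.

```python
import re

# Branch prefixes listed in step 3.
ALLOWED_TYPES = {"feat", "fix", "chore", "docs", "refactor"}


def slugify(text: str) -> str:
    """Lowercase the text and collapse anything non-alphanumeric into single dashes."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def git_identity(agent_id: str) -> tuple[str, str]:
    """Values for git config user.name and user.email (step 1)."""
    return f"{agent_id}[bot]", f"{agent_id}@openclaw.homelab"


def branch_name(agent_id: str, change_type: str, issue: int, description: str) -> str:
    """<agent-id>/<type>/<issue-number>-<short-description> (step 3)."""
    if change_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown branch type: {change_type}")
    return f"{agent_id}/{change_type}/{issue}-{slugify(description)}"


def commit_message(agent_id: str, change_type: str, issue: int, description: str) -> str:
    """<type>: <description> (#<issue-number>) [<agent-id>] (step 5)."""
    return f"{change_type}: {description} (#{issue}) [{agent_id}]"


print(git_identity("devops-sre"))
print(branch_name("devops-sre", "feat", 42, "Add resource X"))
print(commit_message("devops-sre", "feat", 42, "add redis"))
```

Generating these strings from one place keeps the branch, commit, and identity footprints in agreement with each other.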

GitHub Labels

Every issue and PR created by agents MUST be labeled. Labels serve as the tracking and filtering mechanism since agents are not GitHub users.

| Category | Labels | Rule |
| --- | --- | --- |
| Agent | agent:homelab-admin, agent:devops-sre, agent:software-engineer, agent:security-analyst, agent:qa-tester, agent:cursor-agent | Exactly one: who is working on this |
| Type | type:feat, type:fix, type:chore, type:docs, type:refactor, type:security | Exactly one: what kind of change |
| Area | area:k8s, area:terraform, area:argocd, area:secrets, area:monitoring, area:networking, area:openclaw, area:auth | One or more: what part of the homelab |
| Priority | priority:critical, priority:high, priority:medium, priority:low | Exactly one: urgency |
| Semver | semver:breaking | Only when a change has breaking impact regardless of type (most PRs don't need this) |

All issues and PRs are assigned to holdennguyen (repo owner) since agents are not GitHub collaborators. The agent:* label identifies which agent is responsible. Every issue and PR MUST also be assigned to a milestone (see Semantic Versioning & Releases).
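
The cardinality rules in the label table can be expressed as a small validator. This is a sketch under the assumption that labels arrive as plain strings in category:value form; validate_labels and RULES are hypothetical names, and the optional semver:breaking label is deliberately not checked.

```python
# (minimum, maximum) number of labels per category; None means no upper bound.
RULES = {
    "agent": (1, 1),
    "type": (1, 1),
    "area": (1, None),
    "priority": (1, 1),
}


def validate_labels(labels: list[str]) -> list[str]:
    """Return a list of rule violations; an empty list means the label set is valid."""
    counts = {category: 0 for category in RULES}
    for label in labels:
        category, _, value = label.partition(":")
        if category in counts and value:
            counts[category] += 1

    errors = []
    for category, (minimum, maximum) in RULES.items():
        found = counts[category]
        if found < minimum:
            errors.append(f"missing {category}:* label")
        elif maximum is not None and found > maximum:
            errors.append(f"expected exactly one {category}:* label, found {found}")
    return errors


print(validate_labels(
    ["agent:devops-sre", "type:feat", "area:k8s", "area:monitoring", "priority:medium"]
))
print(validate_labels(["agent:devops-sre", "type:feat", "type:fix", "priority:low"]))
```

The first call prints an empty list; the second reports the duplicate type label and the missing area label.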

Agent Footprint

Every agent action is traceable via a mandatory footprint convention. This ensures that any commit, issue, PR, or branch can be attributed to the specific agent that performed it.

| Artifact | Footprint | Example |
| --- | --- | --- |
| Git commit author | <agent-id>[bot] <<agent-id>@openclaw.homelab> | devops-sre[bot] <devops-sre@openclaw.homelab> |
| Commit message | ... [<agent-id>] suffix | feat: add redis (#42) [devops-sre] |
| Branch name | <agent-id>/... prefix | devops-sre/feat/42-redis-caching |
| Issue labels | agent:<agent-id> | agent:devops-sre |
| Issue body | Footer: Agent: <agent-id> \| OpenClaw Homelab | |
| PR labels | agent:<agent-id> | agent:devops-sre |
| PR body | Footer: Agent: <agent-id> \| OpenClaw Homelab | |

Each agent sets its git identity during workspace setup using git config user.name and git config user.email in the cloned repo. There is no shared global git identity — every commit is attributable to the exact agent that authored it.

After the PR is merged:

  • Layer 1 changes (k8s manifests): ArgoCD auto-syncs within ~3 minutes
  • Layer 0 changes (Terraform): Requires manual terraform apply on the host
  • Docker image changes: Requires ./scripts/build-openclaw.sh + pod restart on the host

What enables this:

  • git and gh CLI are baked into the container image (Dockerfile.openclaw)
  • GITHUB_TOKEN from Infisical provides authentication for gh CLI
  • Per-agent git identity is set via git config in each cloned repo (no shared global identity)

Semantic Versioning & Releases

The homelab repository follows Semantic Versioning 2.0.0 (vMAJOR.MINOR.PATCH). Releases group work that has been merged to main into versioned, tagged milestones.

Version bump rules — the highest-impact change in a milestone determines the version:

| Condition | Bump | Example |
| --- | --- | --- |
| Any PR has semver:breaking | MAJOR | Terraform state migration, removed service |
| At least one type:feat (no breaking) | MINOR | New service, new agent, new capability |
| Only fixes, chores, docs, refactors, security | PATCH | Bug fix, dependency update, doc improvement |
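
As a worked example of the bump rules, the sketch below derives the next version from the labels on a milestone's merged PRs. The function name next_version and the input shape (one set of label strings per PR) are assumptions made for illustration.

```python
def next_version(current: str, pr_labels: list[set[str]]) -> str:
    """Apply the highest-impact bump found across all PRs in a milestone."""
    if not pr_labels:
        raise ValueError("milestone has no merged PRs")
    major, minor, patch = (int(part) for part in current.lstrip("v").split("."))
    all_labels = set().union(*pr_labels)

    if "semver:breaking" in all_labels:
        return f"v{major + 1}.0.0"          # MAJOR: any breaking PR
    if "type:feat" in all_labels:
        return f"v{major}.{minor + 1}.0"    # MINOR: at least one feature
    return f"v{major}.{minor}.{patch + 1}"  # PATCH: fixes, chores, docs, etc.


print(next_version("v0.2.3", [{"type:fix"}, {"type:feat", "area:k8s"}]))
print(next_version("v0.2.3", [{"type:feat"}, {"type:fix", "semver:breaking"}]))
print(next_version("v0.2.3", [{"type:docs"}, {"type:security"}]))
```

With v0.2.3 as the starting point, the three calls yield v0.3.0, v1.0.0, and v0.2.4 respectively.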

Milestones group issues and PRs into planned releases:

  • Named with the target version (e.g., v0.3.0)
  • Every issue and PR MUST be assigned to a milestone
  • homelab-admin creates milestones and adjusts the version if breaking changes are introduced
  • Sub-agents check for the current open milestone before creating issues

Release process (owned by homelab-admin or the user — sub-agents never create tags or releases):

  1. Verify all issues in the milestone are closed
  2. Determine the version from the highest-impact PR
  3. Create a git tag and GitHub Release with auto-generated notes (gh release create)
  4. Close the milestone and create the next one

GitHub auto-generates release notes from merged PRs, using the existing type and area labels for grouping.

Incident Response & Rollback

Agents follow a structured incident response procedure when deployments cause service degradation. The procedure is codified in the incident-response skill and enforced across agent personalities.

Roles during an incident:

| Role | Agent | Responsibilities |
| --- | --- | --- |
| Incident commander | homelab-admin | Declare severity, decide rollback vs forward-fix, coordinate response, ensure post-incident report |
| Rollback executor | devops-sre | Triage cluster state, execute git reverts, recover stuck ArgoCD syncs, verify recovery |
| Validation | qa-tester | Pre-merge validation (before PRs), post-rollback verification (after reverts) |

Pre-merge validation (prevents incidents):

All agents that modify cluster resources must verify their changes before submitting PRs:

  • Helm value verification — always confirm key paths exist via helm show values before modifying valuesObject. Charts silently ignore unknown keys.
  • SecurityContext compatibility — verify container images support non-root execution. Some images use init systems (e.g., s6-overlay) that require root at startup.
  • Cross-service impact — ensure changes don't break sync wave dependencies or shared resources.

Rollback procedure:

The standard rollback is a git revert pushed to main. ArgoCD auto-syncs the revert within ~3 minutes.

# -m 1 applies only when <bad-commit-sha> is a merge commit; omit it for a regular commit
git revert <bad-commit-sha> -m 1 --no-edit
git push origin main

For multi-commit rollbacks, restore files to a known-good commit. For stuck ArgoCD syncs, cancel operations and force hard refresh. Full procedures are documented in the incident-response skill.

Post-incident documentation is mandatory for SEV-1 through SEV-3 incidents. Reports include timeline, root cause, blast radius, resolution, lessons learned, and action items.

Agent Configuration

Agent config lives in two places:

  • Identity: k8s/apps/openclaw/configmap.yaml → openclaw.json → agents.list (id, name, model, workspace, skills allowlist, subagents.allowAgents)
  • Personality: agents/workspaces/<id>/AGENTS.md (single source of truth, copied into pod on every restart)

Key gateway-level settings in openclaw.json:

| Setting | Value | Purpose |
| --- | --- | --- |
| gateway.mode | "local" | Enables full gateway functionality |
| gateway.trustedProxies | RFC 1918 ranges | Treats K8s internal traffic as local |
| tools.sessions.visibility | "all" | Orchestrator can view sub-agent session history |
| tools.agentToAgent.enabled | true | Enables cross-agent communication |

The container image (Dockerfile.openclaw) includes ops tools (kubectl, helm, terraform, argocd, jq, git, gh). The pod's ServiceAccount has a namespace-scoped Role in openclaw with read-only access to pods, logs, secrets, configmaps, services, PVCs, and exec into pods — it does NOT have cluster-wide access (the previous cluster-admin binding was removed in v1.1.0).

Per-Agent Skill Assignment

Each agent has a skills allowlist in the configmap that restricts which skills it can see. Omitting the field means all skills; an empty array means none.

| Agent | Tier | Assigned Skills |
| --- | --- | --- |
| homelab-admin | Orchestrator | homelab-admin, gitops, secret-management, incident-response, weather, deutsch-tutor, english-tutor |
| cursor-agent | Senior lead | cursor-agent, gitops, software-engineer, security-analyst, qa-tester |
| devops-sre | Junior | devops-sre, gitops, secret-management, incident-response |
| software-engineer | Junior | software-engineer, gitops |
| security-analyst | Junior | security-analyst, gitops, secret-management |
| qa-tester | Junior | qa-tester, gitops, secret-management, incident-response |

All agents get the gitops skill for the mandatory git workflow. Cross-cutting skills (e.g. secret-management) are shared across agents that need them.
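
The difference between an omitted skills field and an empty array is easy to miss, so the sketch below models the allowlist semantics described in this section. It illustrates the stated rule only and is not OpenClaw's own resolution code; visible_skills is a hypothetical name.

```python
def visible_skills(agent: dict, installed: list[str]) -> list[str]:
    """Resolve which installed skills an agent entry can see."""
    if "skills" not in agent:
        # Field omitted: the agent sees every installed skill.
        return list(installed)
    # Field present: only listed skills are visible, so [] means none.
    allowed = set(agent["skills"])
    return [skill for skill in installed if skill in allowed]


installed = ["gitops", "secret-management", "software-engineer", "weather"]

print(visible_skills({"id": "software-engineer",
                      "skills": ["software-engineer", "gitops"]}, installed))
print(visible_skills({"id": "no-field"}, installed))
print(visible_skills({"id": "empty-list", "skills": []}, installed))
```

Because omission widens access rather than narrowing it, always writing an explicit skills array for a new agent is the safer default.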

Adding a New Agent

  1. Add the agent entry to k8s/apps/openclaw/configmap.yaml under agents.list — include a skills array and a subagents.allowAgents list
  2. Add the new agent ID to the orchestrator's subagents.allowAgents so it can be spawned
  3. Create agents/workspaces/<id>/AGENTS.md with a lean agent personality (identity, tone, role-specific guidance, rules referencing skills for procedural content)
  4. Add the agent ID to the init container's for loop in k8s/apps/openclaw/deployment.yaml
  5. Add the agent ID to tools.agentToAgent.allow in the configmap
  6. Push to main via PR (branch protection requires review) and restart: kubectl rollout restart deployment/openclaw -n openclaw

Sub-agent Spawning

The orchestrator uses maxSpawnDepth: 2:

  • Depth 0: homelab-admin receives user requests
  • Depth 1: Orchestrator spawns specialized sub-agents
  • Depth 2: Sub-agents can spawn leaf workers for parallel tasks

Limits (configured in configmap.yaml):

  • maxConcurrent: 4 — max parallel sub-agents
  • maxChildrenPerAgent: 3 — max children per agent session
  • archiveAfterMinutes: 120 — auto-cleanup of finished sessions
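
To make the interaction of the depth and concurrency limits concrete, here is a sketch of a spawn check. How OpenClaw enforces these limits internally is not described here, so the order of checks and the names SpawnLimits and spawn_refusal are assumptions; only the three limit values come from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpawnLimits:
    max_spawn_depth: int = 2         # maxSpawnDepth
    max_concurrent: int = 4          # maxConcurrent
    max_children_per_agent: int = 3  # maxChildrenPerAgent


def spawn_refusal(parent_depth: int, parent_children: int, running: int,
                  limits: SpawnLimits = SpawnLimits()) -> Optional[str]:
    """Return the reason a spawn would be refused, or None if it is allowed."""
    if parent_depth + 1 > limits.max_spawn_depth:
        return "child would exceed maxSpawnDepth"
    if parent_children >= limits.max_children_per_agent:
        return "parent session already has maxChildrenPerAgent children"
    if running >= limits.max_concurrent:
        return "maxConcurrent sub-agents already running"
    return None


print(spawn_refusal(parent_depth=0, parent_children=0, running=0))  # orchestrator spawning
print(spawn_refusal(parent_depth=2, parent_children=0, running=1))  # leaf worker spawning
print(spawn_refusal(parent_depth=1, parent_children=3, running=3))  # too many children
```

Under this reading, a leaf worker at depth 2 can never spawn further, regardless of how much concurrency headroom remains.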

OpenClaw Skills

Skills provide domain-specific knowledge and commands to agents. They live in skills/ at the repo root and are mounted into the pod via hostPath at /skills.

| Skill | Description |
| --- | --- |
| homelab-admin | Cluster operations, service management, GitOps workflow, delegation framework |
| devops-sre | Infrastructure debugging, Terraform, incident response, SLOs, deployment strategies |
| software-engineer | Code development, review, testing, K8s manifest conventions, dependency management |
| security-analyst | Security audits, RBAC review, vulnerability assessment, STRIDE threat modeling, CIS hardening |
| qa-tester | Deployment validation, service health testing, regression checks, per-service acceptance criteria (ArgoCD, ESO, Infisical, OpenClaw, Monitoring, Authentik) |
| gitops | ArgoCD App of Apps pattern, sync management, mandatory git workflow reference |
| incident-response | Incident triage, rollback procedures, pre-merge validation, post-incident documentation |
| secret-management | Infisical → ESO → K8s pipeline operations |
| cursor-agent | Cursor CLI bridge: installation, automation, handoff protocol, code generation workflows |
| deutsch-tutor | AI German language tutor: spaced repetition (FSRS), flashcard decks (A1/A2/B1), grammar lessons, conversation practice, Vietnamese explanations |
| english-tutor | AI IELTS 8.0 coach: spaced repetition (FSRS), advanced grammar, academic vocabulary, collocations, writing workshop, speaking simulation, reading/listening strategies |
| weather | Real-time weather via Open-Meteo and wttr.in (no API key required) |

Skill Format

skills/<name>/SKILL.md

With YAML frontmatter:

---
name: <skill-name>
description: <one-line description for the agent>
metadata:
  openclaw:
    emoji: "<emoji>"
    requires: { anyBins: ["kubectl"] }  # optional binary requirements
---

Adding a New Skill

  1. Create skills/<name>/SKILL.md with the frontmatter above
  2. Write operational knowledge in markdown: commands, troubleshooting tables, workflows
  3. Add the skill name to the skills array of each agent that should use it in the configmap
  4. Merge to main via PR (branch protection blocks direct pushes) and restart: kubectl rollout restart deployment/openclaw -n openclaw

Skills auto-load via skills.load.extraDirs: ["/skills"] in the OpenClaw config.

Documentation Maintenance

Both Cursor and OpenClaw agents are expected to keep documentation up-to-date as part of every implementation. The repo includes a documentation freshness tracking system to make this verifiable.

For agents

Before creating a PR, run the freshness checker to see if your changes require doc updates:

python scripts/doc-freshness.py --check-pr     # Which docs this branch should update
python scripts/doc-freshness.py --stale        # Full staleness report across all docs

The tool reads .doc-manifest.yml (which maps every doc to its implementation sources) and compares git commit timestamps. If sources are newer than their mapped doc, the doc is flagged as stale with the number of commits behind.
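
The staleness comparison can be illustrated with a few lines of Python. This is not the code of scripts/doc-freshness.py, whose internals and manifest schema are not shown here; stale_report and its inputs are hypothetical, and the timestamps are assumed to be Unix commit times (for example as printed by git log --format=%ct).

```python
def stale_report(doc_times: dict[str, int],
                 source_commit_times: dict[str, list[int]]) -> dict[str, int]:
    """Map each stale doc to the number of source commits newer than the doc.

    doc_times: last commit time of each doc.
    source_commit_times: commit times of the sources mapped to each doc.
    """
    report = {}
    for doc, doc_time in doc_times.items():
        behind = sum(1 for t in source_commit_times.get(doc, []) if t > doc_time)
        if behind:
            report[doc] = behind
    return report


# Made-up timestamps for illustration.
docs = {"docs/a.md": 100, "docs/b.md": 200}
sources = {"docs/a.md": [90, 110, 120], "docs/b.md": [150]}

print(stale_report(docs, sources))
```

In this example only docs/a.md is flagged, at two commits behind, because both of its newer source commits landed after the doc was last touched.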

The doc-freshness CI workflow runs automatically on every PR to main and will post a warning comment if mapped docs are missing updates. The check is advisory — it does not block merge.

What to update

The .cursor/rules/homelab.mdc file contains the full "What to update" table mapping change areas to docs. The key rule: every service directory under k8s/apps/ must have a README.md that serves as the single source of truth. The corresponding docs/<service>.md is a thin MkDocs wrapper that includes the README.

When adding a new service

Add an entry to .doc-manifest.yml mapping the new service's README to its source directory. This ensures the freshness system tracks it going forward. See the full checklist in .cursor/rules/homelab.mdc under "Adding a new service."

Single Source of Truth

| Content | Source | Consumed By |
| --- | --- | --- |
| Cursor context rules | .cursor/rules/*.mdc | Cursor IDE |
| Agent personalities | agents/workspaces/*/AGENTS.md | OpenClaw (copied into pod workspace by init container) |
| Operational skills | skills/*/SKILL.md | OpenClaw (mounted at /skills), Cursor (read on demand) |
| Agent roster & config | k8s/apps/openclaw/configmap.yaml | OpenClaw (mounted at /config) |
| Doc-to-source mappings | .doc-manifest.yml | scripts/doc-freshness.py, doc-freshness CI workflow |

Agent personalities are lean (identity, tone, role-specific guidance) and reference skills for procedural content (git workflow, documentation matrix, labels). This avoids duplication and keeps each piece of content in exactly one source file.