AI Agents
The homelab uses two complementary AI agent systems: Cursor for interactive development and OpenClaw for autonomous multi-agent operations.
Architecture
flowchart TD
subgraph source ["Single Source of Truth (git)"]
CursorRules[".cursor/rules/*.mdc"]
Skills["skills/*/SKILL.md"]
AgentsMD["agents/workspaces/*/AGENTS.md"]
end
subgraph cursor ["Cursor IDE (local)"]
CursorAgent["Cursor Agent\n(interactive, single-agent)"]
CursorAgent -->|"alwaysApply + autoAttach"| CursorRules
CursorAgent -->|"read on demand"| Skills
end
subgraph openclaw ["OpenClaw (K8s pod)"]
HA["homelab-admin\n(orchestrator)"]
CA["cursor-agent\n(senior lead)"]
DS["devops-sre\n(junior)"]
SE["software-engineer\n(junior)"]
SA["security-analyst\n(junior)"]
QA["qa-tester\n(junior)"]
HA -->|"sessions_spawn"| CA
HA -->|"sessions_spawn"| DS
HA -->|"sessions_spawn"| SE
HA -->|"sessions_spawn"| SA
HA -->|"sessions_spawn"| QA
CA -->|"sessions_spawn\n+ review"| DS
CA -->|"sessions_spawn\n+ review"| SE
CA -->|"sessions_spawn\n+ review"| SA
CA -->|"sessions_spawn\n+ review"| QA
end
subgraph models ["Model Providers"]
OpenRouterM["OpenRouter\nstepfun/step-3.5-flash:free\n(primary)"]
GeminiM["Google Gemini\ngemini-2.5-pro\n(fallback)"]
end
openclaw --> OpenRouterM
OpenRouterM -.->|"fallback"| GeminiM
Skills -->|"hostPath /skills"| openclaw
AgentsMD -->|"hostPath → init container"| openclaw
When to Use Which
| Task | System | Why |
|---|---|---|
| Interactive coding, file editing, git ops | Cursor | Direct filesystem access, IDE integration, fast iteration |
| Planning, architecture review, Q&A | Cursor (Ask/Plan mode) | Context-aware with .cursor/rules/ |
| Autonomous infra tasks (deploy, rotate secrets, incident response) | OpenClaw | Spawns sub-agents, runs kubectl, accessible from any device |
| Security audit, code review | OpenClaw (security-analyst, software-engineer) | Dedicated workspace, parallel execution |
| Quick kubectl check from phone/tablet | OpenClaw | Multi-device access via Tailscale |
Cursor Rules
Cursor rules live in .cursor/rules/*.mdc with YAML frontmatter. They inject context into every Cursor conversation.
| File | Scope | Contents |
|---|---|---|
homelab.mdc |
alwaysApply: true |
Global homelab context, architecture layers, core rules (GitOps, no secrets in git, ArgoCD labeling) |
kubernetes.mdc |
autoAttach: k8s/** |
Kustomize conventions, ArgoCD Application CR template, sync waves, namespace rules |
terraform.mdc |
autoAttach: terraform/** |
Layer 0 bootstrap rules, variable naming, secret handling |
openclaw.mdc |
autoAttach: k8s/apps/openclaw/**, skills/**, agents/** |
Agent/skill conventions, how to add agents and skills |
Adding a Cursor Rule
- Create
.cursor/rules/<name>.mdcwith frontmatter:
---
description: Short description
alwaysApply: true # injected into every conversation
# OR
autoAttach:
- "path/glob/**" # injected when matching files are open
---
- Write the rule content in markdown below the frontmatter.
- Commit and push. The rule takes effect immediately in new Cursor sessions.
OpenClaw Agents
OpenClaw runs nine agents in total: six core agents in a two-tier orchestrator pattern, and three specialist agents for domain-specific tasks. The homelab-admin agent is the only directly accessible agent — it receives all user requests and delegates to sub-agents via sessions_spawn. The cursor-agent serves as a senior lead that can both execute complex tasks and direct junior agents.
| Agent | Tier | Role | Workspace |
|---|---|---|---|
homelab-admin |
Orchestrator | Default entry point, delegates to all agents | /data/workspaces/homelab-admin |
cursor-agent |
Senior lead | AI-assisted code gen via Cursor CLI, PR review authority, technical direction | /data/workspaces/cursor-agent |
devops-sre |
Junior | Infrastructure, K8s, Terraform | /data/workspaces/devops-sre |
software-engineer |
Junior | Code development, testing | /data/workspaces/software-engineer |
security-analyst |
Junior | Security audits, hardening | /data/workspaces/security-analyst |
qa-tester |
Junior | Deployment validation, service health testing, regression checks | /data/workspaces/qa-tester |
deutsch-tutor |
Specialist | AI German language tutor — spaced repetition, grammar, conversation | /data/workspaces/deutsch-tutor |
english-tutor |
Specialist | AI IELTS 8.0 coach — grammar precision, academic vocabulary, essay feedback, speaking simulation | /data/workspaces/english-tutor |
daily-routine |
Specialist | Proactive health & schedule coach — morning briefing, meal reminders, training sessions, hydration, wind-down | /data/workspaces/daily-routine |
Every agent has an explicit object-form model with { primary, fallbacks } in the configmap:
| Setting | Value |
|---|---|
| Primary | openrouter/stepfun/step-3.5-flash:free |
| Fallback | google/gemini-2.5-pro |
When the primary fails or hits rate limits, OpenClaw falls through to Gemini. Auth is via OPENROUTER_API_KEY and GEMINI_API_KEY env vars (synced from Infisical).
Convention: Per-agent model must always use the object form { "primary": "...", "fallbacks": ["..."] }. A plain string discards fallbacks. See the OpenClaw README for details.
How the Orchestrator Works
Users interact only with homelab-admin in the OpenClaw Control UI. It is the sole agent with "default": true in the config. The other five agents are sub-agents — they don't appear in the UI dropdown and can only be spawned by the orchestrator.
Delegation rules:
| Request type | Delegated to | Example |
|---|---|---|
| Complex code generation, multi-file refactors | cursor-agent (senior) |
"Generate a Python script that validates YAML files" |
| Multi-agent coordinated tasks | cursor-agent (senior, decomposes + directs) |
"Refactor the monitoring stack and update all docs" |
| PR review for junior agent output | cursor-agent (senior) |
"Review the devops-sre PR for the new NetworkPolicy" |
| Infrastructure changes, Terraform, K8s ops, monitoring | devops-sre (junior) |
"Add a NodePort to the monitoring stack" |
| Code changes, feature development, testing | software-engineer (junior) |
"Update the Dockerfile to add a new tool" |
| Security audits, vulnerability assessment, hardening | security-analyst (junior) |
"Audit the RBAC configuration" |
| Deployment validation, regression testing, health checks | qa-tester (junior) |
"Verify all services are healthy after the merge" |
| Read-only checks, status queries, simple answers | homelab-admin (handles directly) |
"What pods are running?" |
When delegating, homelab-admin uses sessions_spawn and provides:
1. Clear task description and expected outcome
2. Relevant file paths or service names
3. Any context from prior conversation
4. Label instructions (agent, type, area, priority)
The spawned sub-agent session appears in the UI sidebar. Sub-agents report results back via sessions_announce.
Cursor Agent — Senior Lead & Handoff Protocol
The cursor-agent is the senior lead in the agent hierarchy. It has three responsibilities: (1) AI-assisted code generation via the Cursor CLI, (2) PR review authority for junior agent output, and (3) technical direction on multi-agent tasks. It can spawn and direct junior agents (devops-sre, software-engineer, security-analyst, qa-tester) independently.
sequenceDiagram
participant HA as homelab-admin<br/>(Orchestrator)
participant CA as cursor-agent<br/>(OpenClaw sub-agent)
participant CLI as Cursor CLI<br/>(agent command)
participant GH as GitHub
HA->>CA: sessions_spawn with task context<br/>(description, repo, files, constraints)
CA->>CA: Clone/update repo, create branch,<br/>set git identity
CA->>CLI: agent -p 'prompt' --force<br/>(or tmux session for complex tasks)
CLI-->>CA: Generated code changes
CA->>CA: Review diff: no secrets,<br/>style check, test coverage
alt Quality OK
CA->>GH: git commit + push + gh pr create
GH-->>CA: PR URL
CA-->>HA: sessions_announce: PR URL + summary
else Quality concerns
CA-->>HA: Report issues, request guidance
end
Execution modes (selected by task complexity):
| Complexity | Mode | Command |
|---|---|---|
| Simple, single-file | Non-interactive | agent -p '<prompt>' --force |
| Multi-file, needs context | Non-interactive with @file refs |
agent -p '<prompt>' --force |
| Iterative, needs refinement | tmux automation | tmux session with interactive agent |
Prerequisites (included in the Docker image):
- Cursor CLI (
agentcommand) — installed via install script inDockerfile.openclaw tmux— installed for complex task executionCURSOR_API_KEY— synced from Infisical via ESO intoopenclaw-secret
See the cursor-agent skill (skills/cursor-agent/SKILL.md) for the full CLI reference, tmux automation patterns, and troubleshooting guide.
Mandatory Git Workflow
ALL agents enforce a mandatory git workflow for any change to the homelab repository. No agent — including the orchestrator — pushes directly to main. Branch protection is enforced on main: PRs require at least one approving review, force pushes are blocked, and linear history is required.
sequenceDiagram
participant User
participant HA as homelab-admin<br/>(Orchestrator)
participant SA as Sub-agent<br/>(devops-sre / software-engineer /<br/>security-analyst / qa-tester / cursor-agent)
participant GH as GitHub
participant Argo as ArgoCD
User->>HA: "Add resource X to service Y"
HA->>HA: Analyze request, pick agent
HA->>SA: sessions_spawn with task context
SA->>SA: Clone repo + set git identity
SA->>GH: gh issue create (with agent footer)
GH-->>SA: Issue #42
SA->>SA: git checkout -b agent-id/feat/42-add-resource-x
SA->>SA: Edit manifests + docs
SA->>SA: git commit (with agent tag in message)
SA->>GH: git push + gh pr create (with agent footer)
GH-->>SA: PR URL
SA-->>HA: Report PR URL
HA-->>User: PR created, ArgoCD syncs in ~3min
User->>GH: Review & merge PR
GH->>Argo: Push to main triggers sync
Argo->>Argo: Auto-sync within ~3 minutes
Step-by-step process (every agent follows this):
- Clone the repo into the agent's workspace and set git identity (
git config user.name "<agent-id>[bot]",git config user.email "<agent-id>@openclaw.homelab") - Create a labeled GitHub issue via
gh issue createwith--assignee holdennguyen --label "agent:<id>,type:<type>,area:<area>,priority:<priority>" --milestone "<current-milestone>"— body ends withAgent: <agent-id> | OpenClaw Homelabfooter. If no open milestone exists, ask the orchestrator (or user) to create one. - Create a branch from main:
<agent-id>/<type>/<issue-number>-<short-description>(prefixes:feat/,fix/,chore/,docs/,refactor/) - Make changes to manifests, config, docs
- Commit with a message referencing the issue and agent:
<type>: <description> (#<issue-number>) [<agent-id>] - Keep branch up to date — before every push, run
git fetch origin main && git merge origin/main --no-editto incorporate any changes merged tomainsince the branch was created. Resolve conflicts if needed; never force-push. - Push and create a labeled PR via
gh pr createwith the same labels and--milestone "<current-milestone>"— body ends withAgent: <agent-id> | OpenClaw Homelabfooter - Report the PR URL back to the orchestrator or user
GitHub Labels
Every issue and PR created by agents MUST be labeled. Labels serve as the tracking and filtering mechanism since agents are not GitHub users.
| Category | Labels | Rule |
|---|---|---|
| Agent | agent:homelab-admin, agent:devops-sre, agent:software-engineer, agent:security-analyst, agent:qa-tester, agent:cursor-agent |
Exactly one — who is working on this |
| Type | type:feat, type:fix, type:chore, type:docs, type:refactor, type:security |
Exactly one — what kind of change |
| Area | area:k8s, area:terraform, area:argocd, area:secrets, area:monitoring, area:networking, area:openclaw, area:auth |
One or more — what part of the homelab |
| Priority | priority:critical, priority:high, priority:medium, priority:low |
Exactly one — urgency |
| Semver | semver:breaking |
Only when a change has breaking impact regardless of type (most PRs don't need this) |
All issues and PRs are assigned to holdennguyen (repo owner) since agents are not GitHub collaborators. The agent:* label identifies which agent is responsible. Every issue and PR MUST also be assigned to a milestone (see Semantic Versioning & Releases).
Agent Footprint
Every agent action is traceable via a mandatory footprint convention. This ensures that any commit, issue, PR, or branch can be attributed to the specific agent that performed it.
| Artifact | Footprint | Example |
|---|---|---|
| Git commit author | <agent-id>[bot] <<agent-id>@openclaw.homelab> |
devops-sre[bot] <devops-sre@openclaw.homelab> |
| Commit message | ... [<agent-id>] suffix |
feat: add redis (#42) [devops-sre] |
| Branch name | <agent-id>/... prefix |
devops-sre/feat/42-redis-caching |
| Issue labels | agent:<agent-id> |
agent:devops-sre |
| Issue body | Footer: Agent: <agent-id> \| OpenClaw Homelab |
— |
| PR labels | agent:<agent-id> |
agent:devops-sre |
| PR body | Footer: Agent: <agent-id> \| OpenClaw Homelab |
— |
Each agent sets its git identity during workspace setup using git config user.name and git config user.email in the cloned repo. There is no shared global git identity — every commit is attributable to the exact agent that authored it.
After the PR is merged:
- Layer 1 changes (k8s manifests): ArgoCD auto-syncs within ~3 minutes
- Layer 0 changes (Terraform): Requires manual
terraform applyon the host - Docker image changes: Requires
./scripts/build-openclaw.sh+ pod restart on the host
What enables this:
gitandghCLI are baked into the container image (Dockerfile.openclaw)GITHUB_TOKENfrom Infisical provides authentication forghCLI- Per-agent git identity is set via
git configin each cloned repo (no shared global identity)
Semantic Versioning & Releases
The homelab repository follows Semantic Versioning 2.0.0 (vMAJOR.MINOR.PATCH). Releases group work that has been merged to main into versioned, tagged milestones.
Version bump rules — the highest-impact change in a milestone determines the version:
| Condition | Bump | Example |
|---|---|---|
Any PR has semver:breaking |
MAJOR | Terraform state migration, removed service |
At least one type:feat (no breaking) |
MINOR | New service, new agent, new capability |
| Only fixes, chores, docs, refactors, security | PATCH | Bug fix, dependency update, doc improvement |
Milestones group issues and PRs into planned releases:
- Named with the target version (e.g.,
v0.3.0) - Every issue and PR MUST be assigned to a milestone
homelab-admincreates milestones and adjusts the version if breaking changes are introduced- Sub-agents check for the current open milestone before creating issues
Release process (owned by homelab-admin or the user — sub-agents never create tags or releases):
- Verify all issues in the milestone are closed
- Determine the version from the highest-impact PR
- Create a git tag and GitHub Release with auto-generated notes (
gh release create) - Close the milestone and create the next one
GitHub auto-generates release notes from merged PRs, using the existing type and area labels for grouping.
Incident Response & Rollback
Agents follow a structured incident response procedure when deployments cause service degradation. The procedure is codified in the incident-response skill and enforced across agent personalities.
Roles during an incident:
| Role | Agent | Responsibilities |
|---|---|---|
| Incident commander | homelab-admin |
Declare severity, decide rollback vs forward-fix, coordinate response, ensure post-incident report |
| Rollback executor | devops-sre |
Triage cluster state, execute git reverts, recover stuck ArgoCD syncs, verify recovery |
| Validation | qa-tester |
Pre-merge validation (before PRs), post-rollback verification (after reverts) |
Pre-merge validation (prevents incidents):
All agents that modify cluster resources must verify their changes before submitting PRs:
- Helm value verification — always confirm key paths exist via
helm show valuesbefore modifyingvaluesObject. Charts silently ignore unknown keys. - SecurityContext compatibility — verify container images support non-root execution. Some images use init systems (e.g., s6-overlay) that require root at startup.
- Cross-service impact — ensure changes don't break sync wave dependencies or shared resources.
Rollback procedure:
The standard rollback is a git revert pushed to main. ArgoCD auto-syncs the revert within ~3 minutes.
git revert <bad-commit-sha> -m 1 --no-edit
git push origin main
For multi-commit rollbacks, restore files to a known-good commit. For stuck ArgoCD syncs, cancel operations and force hard refresh. Full procedures are documented in the incident-response skill.
Post-incident documentation is mandatory for SEV-1 through SEV-3 incidents. Reports include timeline, root cause, blast radius, resolution, lessons learned, and action items.
Agent Configuration
Agent config lives in two places:
- Identity:
k8s/apps/openclaw/configmap.yaml→openclaw.json→agents.list(id, name, model, workspace, skills allowlist,subagents.allowAgents) - Personality:
agents/workspaces/<id>/AGENTS.md(single source of truth, copied into pod on every restart)
Key gateway-level settings in openclaw.json:
| Setting | Value | Purpose |
|---|---|---|
gateway.mode |
"local" |
Enables full gateway functionality |
gateway.trustedProxies |
RFC 1918 ranges | Treats K8s internal traffic as local |
tools.sessions.visibility |
"all" |
Orchestrator can view sub-agent session history |
tools.agentToAgent.enabled |
true |
Enables cross-agent communication |
The container image (Dockerfile.openclaw) includes ops tools (kubectl, helm, terraform, argocd, jq, git, gh). The pod's ServiceAccount has a namespace-scoped Role in openclaw with read-only access to pods, logs, secrets, configmaps, services, PVCs, and exec into pods — it does NOT have cluster-wide access (the previous cluster-admin binding was removed in v1.1.0).
Per-Agent Skill Assignment
Each agent has a skills allowlist in the configmap that restricts which skills it can see. Omitting the field means all skills; an empty array means none.
| Agent | Tier | Assigned Skills |
|---|---|---|
homelab-admin |
Orchestrator | homelab-admin, gitops, secret-management, incident-response, weather, deutsch-tutor, english-tutor |
cursor-agent |
Senior lead | cursor-agent, gitops, software-engineer, security-analyst, qa-tester |
devops-sre |
Junior | devops-sre, gitops, secret-management, incident-response |
software-engineer |
Junior | software-engineer, gitops |
security-analyst |
Junior | security-analyst, gitops, secret-management |
qa-tester |
Junior | qa-tester, gitops, secret-management, incident-response |
All agents get the gitops skill for the mandatory git workflow. Cross-cutting skills (e.g. secret-management) are shared across agents that need them.
Adding a New Agent
- Add the agent entry to
k8s/apps/openclaw/configmap.yamlunderagents.list— include askillsarray and asubagents.allowAgentslist - Add the new agent ID to the orchestrator's
subagents.allowAgentsso it can be spawned - Create
agents/workspaces/<id>/AGENTS.mdwith a lean agent personality (identity, tone, role-specific guidance, rules referencing skills for procedural content) - Add the agent ID to the init container's
forloop ink8s/apps/openclaw/deployment.yaml - Add the agent ID to
tools.agentToAgent.allowin the configmap - Push to
mainvia PR (branch protection requires review) and restart:kubectl rollout restart deployment/openclaw -n openclaw
Sub-agent Spawning
The orchestrator uses maxSpawnDepth: 2:
- Depth 0:
homelab-adminreceives user requests - Depth 1: Orchestrator spawns specialized sub-agents
- Depth 2: Sub-agents can spawn leaf workers for parallel tasks
Limits (configured in configmap.yaml):
maxConcurrent: 4— max parallel sub-agentsmaxChildrenPerAgent: 3— max children per agent sessionarchiveAfterMinutes: 120— auto-cleanup of finished sessions
OpenClaw Skills
Skills provide domain-specific knowledge and commands to agents. They live in skills/ at the repo root and are mounted into the pod via hostPath at /skills.
| Skill | Description |
|---|---|
homelab-admin |
Cluster operations, service management, GitOps workflow, delegation framework |
devops-sre |
Infrastructure debugging, Terraform, incident response, SLOs, deployment strategies |
software-engineer |
Code development, review, testing, K8s manifest conventions, dependency management |
security-analyst |
Security audits, RBAC review, vulnerability assessment, STRIDE threat modeling, CIS hardening |
qa-tester |
Deployment validation, service health testing, regression checks, per-service acceptance criteria (ArgoCD, ESO, Infisical, OpenClaw, Monitoring, Authentik) |
gitops |
ArgoCD App of Apps pattern, sync management, mandatory git workflow reference |
incident-response |
Incident triage, rollback procedures, pre-merge validation, post-incident documentation |
secret-management |
Infisical → ESO → K8s pipeline operations |
cursor-agent |
Cursor CLI bridge: installation, automation, handoff protocol, code generation workflows |
deutsch-tutor |
AI German language tutor — spaced repetition (FSRS), flashcard decks (A1/A2/B1), grammar lessons, conversation practice, Vietnamese explanations |
english-tutor |
AI IELTS 8.0 coach — spaced repetition (FSRS), advanced grammar, academic vocabulary, collocations, writing workshop, speaking simulation, reading/listening strategies |
weather |
Real-time weather via Open-Meteo and wttr.in (no API key required) |
Skill Format
skills/<name>/SKILL.md
With YAML frontmatter:
---
name: <skill-name>
description: <one-line description for the agent>
metadata:
openclaw:
emoji: "<emoji>"
requires: { anyBins: ["kubectl"] } # optional binary requirements
---
Adding a New Skill
- Create
skills/<name>/SKILL.mdwith the frontmatter above - Write operational knowledge in markdown: commands, troubleshooting tables, workflows
- Add the skill name to the
skillsarray of each agent that should use it in the configmap - Push to
mainand restart:kubectl rollout restart deployment/openclaw -n openclaw
Skills auto-load via skills.load.extraDirs: ["/skills"] in the OpenClaw config.
Documentation Maintenance
Both Cursor and OpenClaw agents are expected to keep documentation up-to-date as part of every implementation. The repo includes a documentation freshness tracking system to make this verifiable.
For agents
Before creating a PR, run the freshness checker to see if your changes require doc updates:
python scripts/doc-freshness.py --check-pr # Which docs this branch should update
python scripts/doc-freshness.py --stale # Full staleness report across all docs
The tool reads .doc-manifest.yml (which maps every doc to its implementation sources) and compares git commit timestamps. If sources are newer than their mapped doc, the doc is flagged as stale with the number of commits behind.
The doc-freshness CI workflow runs automatically on every PR to main and will post a warning comment if mapped docs are missing updates. The check is advisory — it does not block merge.
What to update
The .cursor/rules/homelab.mdc file contains the full "What to update" table mapping change areas to docs. The key rule: every service directory under k8s/apps/ must have a README.md that serves as the single source of truth. The corresponding docs/<service>.md is a thin MkDocs wrapper that includes the README.
When adding a new service
Add an entry to .doc-manifest.yml mapping the new service's README to its source directory. This ensures the freshness system tracks it going forward. See the full checklist in .cursor/rules/homelab.mdc under "Adding a new service."
Single Source of Truth
| Content | Source | Consumed By |
|---|---|---|
| Cursor context rules | .cursor/rules/*.mdc |
Cursor IDE |
| Agent personalities | agents/workspaces/*/AGENTS.md |
OpenClaw (copied into pod workspace by init container) |
| Operational skills | skills/*/SKILL.md |
OpenClaw (mounted at /skills), Cursor (read on demand) |
| Agent roster & config | k8s/apps/openclaw/configmap.yaml |
OpenClaw (mounted at /config) |
| Doc-to-source mappings | .doc-manifest.yml |
scripts/doc-freshness.py, doc-freshness CI workflow |
Agent personalities are lean (identity, tone, role-specific guidance) and reference skills for procedural content (git workflow, documentation matrix, labels). This avoids duplication and keeps each piece of content in exactly one source file.