Trivy Operator
Trivy Operator is a Kubernetes operator that continuously scans container images for vulnerabilities and generates VulnerabilityReport resources for each running pod.
Overview
- Helm Chart:
aquasecurity/trivy-operator - Namespace:
monitoring - Reports: View vulnerabilities with
kubectl get vulnerabilityreports -n <pod-namespace> - Integration: Works natively with Prometheus and Grafana (dashboards can be added)
Architecture
The operator runs in ClientServer mode: a dedicated trivy-server StatefulSet maintains the vulnerability database on a persistent volume, and scan jobs query it over HTTP instead of each managing their own cache. This eliminates cache lock contention when scanning pods with multiple containers.
flowchart TD
subgraph cluster["Kubernetes Cluster"]
OP[Trivy Operator<br/>Deployment]
TS[Trivy Server<br/>StatefulSet + PVC]
SJ[Scan Jobs]
Pod1[Pod with Image]
Pod2[Pod with Image]
VR[VulnerabilityReport<br/>Custom Resource]
end
OP -->|watches| Pod1
OP -->|watches| Pod2
OP -->|creates| SJ
SJ -->|queries| TS
SJ -->|produces| VR
Configuration
The operator is deployed with tuned settings to balance scanning coverage with resource stability on a single-node homelab:
Resource Optimization Notes:
- Infra assessment (node-collector) is disabled (operator.infraAssessmentScannerEnabled: false) because its hostPath volume requirements violate the monitoring namespace's PodSecurity baseline and it's unnecessary for single-node OrbStack.
- Runs in ClientServer mode with a built-in trivy-server (eliminates cache lock contention)
- On-change vulnerability scanning is disabled to reduce resource usage; a scheduled daily scan is performed via a
VulnerabilityScannercustom resource. - Scans images of running pods on a schedule and generates reports.
- Does not block pod scheduling (report-only)
- Stores reports as Kubernetes custom resources
- Excludes the
openclawnamespace (locally-built image not available from any registry) - Limits concurrent scan jobs to 3
- Config audit (compliance) scans run daily at midnight UTC via the built-in compliance scheduler
- Scanner reports are retained for 72 hours; completed scan jobs are cleaned up after 30 minutes
Helm Values
Overrides are set in the Application CR's spec.source.helm.valuesObject:
| Key | Value | Purpose |
|---|---|---|
trivy.mode |
ClientServer |
Centralized DB via trivy-server, no per-job cache |
operator.builtInTrivyServer |
true |
Deploy trivy-server StatefulSet in-cluster |
trivy.serverURL |
http://trivy-service.monitoring:4975 |
Scan jobs query this endpoint |
operator.vulnerabilityScannerEnabled |
false |
Disable on-change vulnerability scans |
operator.scanJobsConcurrentLimit |
3 |
Limit parallel scan jobs |
operator.scanJobTTL |
30m |
Cleanup completed scan jobs after 30 minutes |
operator.scanJobsRetryDelay |
2m |
Delay between retry attempts |
operator.scannerReportTTL |
72h |
Keep vulnerability reports for 3 days |
compliance.cron |
0 0 * * * |
Daily config audit scans at midnight |
resources.limits.memory |
512Mi |
Operator deployment — prevents OOM |
trivy.resources.limits.memory |
1Gi |
Scan job container memory (increased to prevent OOM) |
trivy.server.resources.limits.memory |
512Mi |
Trivy server memory |
excludeNamespaces |
openclaw |
Skip namespaces with local-only images |
Scheduled Vulnerability Scans
A separate VulnerabilityScanner custom resource (installed via k8s/apps/trivy-operator/vulnerability-scanner.yaml) defines a daily scan schedule. This is managed by the trivy-operator's VulnerabilityScanner controller and runs in addition to (or instead of) on-change scans. With operator.vulnerabilityScannerEnabled: false, only the scheduled scans are performed.
Additional options:
- trivy.severity: filter by severity (e.g., HIGH, CRITICAL)
- trivy.ignoreUnfixed: whether to ignore vulnerabilities without a fix
Refer to the Trivy Operator documentation for advanced configuration.
Secrets
No secrets are required; the operator uses read-only access to the Kubernetes API and the container runtime (via hostPID and container runtime socket).
Networking
- The trivy-server needs egress to download the vulnerability database (
mirror.gcr.io/aquasec/trivy-db). - Scan jobs need egress to container registries to pull image layers for scanning.
- Ensure that egress to
ghcr.io,docker.io,mirror.gcr.io, and other registries is allowed on HTTPS (443).
Operational Commands
# List all vulnerability reports
kubectl get vulnerabilityreports --all-namespaces
# View report for a specific pod
kubectl get vulnerabilityreport <pod-name> -n <namespace> -o yaml
# Delete old reports (they are automatically garbage-collected)
kubectl delete vulnerabilityreport --all -n <namespace>
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| No VulnerabilityReport CRs appear | Operator not running or RBAC issues | Check pod logs: kubectl logs -n monitoring -l app.kubernetes.io/name=trivy-operator |
Reports show FAILED |
Image scan failed (private registry, large image, timeout) | Verify image pull secret exists; consider increasing resources/timeouts |
| Operator OOMKilled (exit 137) | Too many workloads to reconcile | Increase resources.limits.memory in Helm values |
| "cache may be in use by another process" | Multiple scan containers contending on shared cache | Use ClientServer mode (trivy.mode: ClientServer) |
| Scan job OOMKilled | Large container image exceeds scan job memory | Increase trivy.resources.limits.memory |
| "unable to find the specified image" | Local-only image not in any registry | Add namespace to excludeNamespaces |