Observability is critical for CleanStart deployments. This guide covers metrics collection, logging from shell-less containers, health checks, and optional integration with CleanSight (which detects outdated CleanStart images in production and recommends upgrades). You'll also learn to use third-party monitoring solutions (Datadog, New Relic, Dynatrace) and set up container-level security monitoring.
Container Observability Fundamentals
Observability in containerized systems requires three signals:
| Signal | Purpose | Collection |
|---|---|---|
| Metrics | Quantitative measurements (CPU, memory, requests/sec) | Prometheus scrape, agent collection |
| Logs | Structured events and errors | stdout/stderr, sidecar agents, journald |
| Traces | Distributed request flows across services | Instrumentation library + collector |
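The table lists traces, but the rest of this guide does not show how a trace is stitched together across services. The sketch below assumes services propagate the W3C Trace Context `traceparent` header, which is the default propagator in OpenTelemetry SDKs. Each hop keeps the 32-hex-digit trace-id and replaces the 16-hex-digit parent-id with its own span id. The function names are illustrative only; in practice the OpenTelemetry instrumentation library handles this for you.

```python
import re
import secrets

# W3C Trace Context header: version-traceid-parentid-flags (lowercase hex)
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace: random 16-byte trace-id and 8-byte span (parent) id."""
    trace_id = secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def child_traceparent(incoming: str) -> str:
    """Continue an incoming trace: keep the trace-id, mint a new span id."""
    match = _TRACEPARENT.match(incoming.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {incoming!r}")
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero ids are explicitly invalid in the spec
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError(f"invalid traceparent: {incoming!r}")
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

A service receiving `traceparent` on an inbound request would call `child_traceparent` and send the result on its outbound calls. Every span in the request then shares one trace-id, which is what lets a tracing backend reassemble the flow.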
CleanStart containers write logs to stdout/stderr; there is no shell and no internal log file to tail. Metrics are either exposed on a /metrics endpoint in the Prometheus text format or pushed to a collector.
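A /metrics endpoint serves plain text in the Prometheus exposition format. The stdlib-only sketch below renders one counter or gauge family in that format to make the format concrete. `render_metric` is a hypothetical helper. In real services, the client libraries in the next section generate this output, including the histogram `_bucket`/`_sum`/`_count` series that this sketch omits.

```python
from typing import Dict, Iterable, Tuple

Labels = Dict[str, str]


def _escape(value: str) -> str:
    # Label values must escape backslash, double quote, and newline
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name: str, help_text: str, metric_type: str,
                  samples: Iterable[Tuple[Labels, float]]) -> str:
    """Render one metric family (HELP, TYPE, then one line per labeled sample)."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            label_str = ",".join(f'{k}="{_escape(v)}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


print(render_metric("http_requests_total", "Total HTTP requests", "counter",
                    [({"method": "GET", "status": "200"}, 3.0)]))
```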
Prometheus Metrics for CleanStart Containers
Container Resource Metrics
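Kubernetes exposes container resource metrics for every pod through the kubelet's built-in cAdvisor (the /metrics/cadvisor endpoint). Prometheus collects them once it is configured to scrape the kubelet, which the kube-prometheus-stack Helm chart does by default.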
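Most container_* series are cumulative counters. The raw value of container_cpu_usage_seconds_total is total CPU seconds since the container started, which is rarely what you want, so queries wrap it in rate(). The sketch below models that arithmetic in plain Python under simplifying assumptions: counter resets are treated as restarts from zero, and rate()'s extrapolation to the window edges is skipped. It illustrates the idea and is not Prometheus's exact algorithm.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase of a cumulative counter across the samples' time span."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A decrease means the container restarted and the counter reset to 0
        increase += (curr - prev) if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# CPU seconds sampled once a minute: 60 CPU-seconds used over 120 s
print(simple_rate([(0, 100.0), (60, 130.0), (120, 160.0)]))
```

For container_cpu_usage_seconds_total, the result is CPU cores in use. A value of 0.5 means half a core on average over the window.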
Query examples (Prometheus):
```promql
# CPU usage (cumulative CPU seconds; a counter)
container_cpu_usage_seconds_total{pod=~"myapp.*"}

# Memory usage (current, bytes)
container_memory_usage_bytes{pod=~"myapp.*"}

# Network in/out (cumulative bytes)
container_network_receive_bytes_total{pod=~"myapp.*"}
container_network_transmit_bytes_total{pod=~"myapp.*"}

# Disk I/O (cumulative operations)
container_fs_reads_total{pod=~"myapp.*"}
container_fs_writes_total{pod=~"myapp.*"}

# CPU usage in cores per pod (5-minute average)
rate(container_cpu_usage_seconds_total{pod=~"myapp.*"}[5m])

# Memory usage as a percentage of the container limit
# (container!="" drops pod-level cgroup series; "> 0" skips containers without a limit)
(container_memory_usage_bytes{pod=~"myapp.*", container!=""}
  / (container_spec_memory_limit_bytes{pod=~"myapp.*", container!=""} > 0)) * 100
```

Application-Level Metrics
Expose custom metrics in your application:
Python (Prometheus client library):
```python
from time import perf_counter

from fastapi import FastAPI, Request, Response
from prometheus_client import (
    CONTENT_TYPE_LATEST,
    Counter,
    Gauge,
    Histogram,
    generate_latest,
)

app = FastAPI()

# Counters (only ever increase)
request_count = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint', 'status'],
)

# Histograms (measure latency distribution)
request_duration = Histogram(
    'http_request_duration_seconds',
    'HTTP request latency',
    labelnames=['method', 'endpoint'],
    buckets=[0.01, 0.05, 0.1, 0.5, 1.0, 5.0],
)

# Gauges (current value that can go up and down)
active_connections = Gauge(
    'active_connections',
    'Number of in-flight HTTP requests',
)


@app.middleware("http")
async def metrics_middleware(request: Request, call_next):
    active_connections.inc()
    start_time = perf_counter()
    try:
        response = await call_next(request)
    finally:
        active_connections.dec()
    # Note: request.url.path includes path parameters; prefer route templates
    # in production to keep label cardinality bounded
    request_count.labels(
        method=request.method,
        endpoint=request.url.path,
        status=str(response.status_code),
    ).inc()
    request_duration.labels(
        method=request.method,
        endpoint=request.url.path,
    ).observe(perf_counter() - start_time)
    return response


@app.get("/metrics")
async def metrics():
    # Serve the Prometheus text format with the correct content type
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)


@app.get("/health")
async def health():
    return {"status": "healthy"}
```

Node.js (Prometheus client):
```javascript
const prometheus = require('prom-client');
const express = require('express');

const app = express();

// Default metrics (CPU, memory, GC)
prometheus.collectDefaultMetrics();

// Custom metrics
const httpRequestDuration = new prometheus.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request latency',
  labelNames: ['method', 'endpoint', 'status'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1.0],
});

const httpRequests = new prometheus.Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'endpoint', 'status'],
});

app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    httpRequestDuration
      .labels(req.method, req.path, res.statusCode)
      .observe(duration);
    httpRequests
      .labels(req.method, req.path, res.statusCode)
      .inc();
  });
  next();
});

app.get('/metrics', async (req, res) => {
  res.set('Content-Type', prometheus.register.contentType);
  res.end(await prometheus.register.metrics());
});

app.get('/health', (req, res) => {
  res.json({ status: 'healthy' });
});

app.listen(8080);
```

Go (Prometheus client):
```go
package main

import (
	"log"
	"net/http"
	"strconv"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	httpDuration = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "http_request_duration_seconds",
			Help:    "HTTP request latency",
			Buckets: []float64{0.01, 0.05, 0.1, 0.5, 1.0},
		},
		[]string{"method", "endpoint"},
	)
	httpRequests = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "http_requests_total",
			Help: "Total HTTP requests",
		},
		[]string{"method", "endpoint", "status"},
	)
)

func init() {
	prometheus.MustRegister(httpDuration, httpRequests)
}

// statusRecorder captures the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

func recordMetrics(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, r)
		duration := time.Since(start).Seconds()
		httpDuration.WithLabelValues(r.Method, r.URL.Path).Observe(duration)
		httpRequests.WithLabelValues(r.Method, r.URL.Path, strconv.Itoa(rec.status)).Inc()
	})
}

func main() {
	http.Handle("/metrics", promhttp.Handler())
	http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusOK)
		w.Write([]byte(`{"status":"healthy"}`))
	})
	log.Fatal(http.ListenAndServe(":8080", recordMetrics(http.DefaultServeMux)))
}
```

ServiceMonitor for Prometheus Operator
Define how Prometheus discovers and scrapes metrics:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp
  namespace: production
  labels:
    app: myapp
    # Prometheus only picks up ServiceMonitors matching its serviceMonitorSelector
    # (kube-prometheus-stack defaults to the Helm release label, e.g. release: prometheus)
spec:
  selector:
    matchLabels:
      app: myapp  # matches the Service's labels, not the pod labels
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
      scheme: http
      scrapeTimeout: 10s
      relabelings:
        # Add custom labels
        - sourceLabels: [__meta_kubernetes_pod_name]
          targetLabel: pod
        - sourceLabels: [__meta_kubernetes_namespace]
          targetLabel: namespace
---
# Service with metrics port
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: production
  labels:
    app: myapp  # selected by the ServiceMonitor above
spec:
  selector:
    app: myapp
  ports:
    - name: metrics
      port: 8080
      targetPort: 8080
      protocol: TCP
```

Deploy ServiceMonitor:
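```bash
kubectl apply -f servicemonitor.yaml

# Verify Prometheus discovers the target. The Prometheus image ships without curl,
# so query its API through a port-forward instead of kubectl exec.
# prometheus-operated is the Service the Prometheus Operator creates for its pods.
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090 &
curl -s http://localhost:9090/api/v1/targets
```

Grafana Dashboard for Metrics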
Visualize metrics with Grafana:
```json
{
  "dashboard": {
    "title": "CleanStart Application Metrics",
    "panels": [
      {
        "title": "Request Rate",
        "type": "timeseries",
        "targets": [
          { "expr": "rate(http_requests_total[5m])" }
        ]
      },
      {
        "title": "Request Latency (p95)",
        "type": "timeseries",
        "targets": [
          { "expr": "histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))" }
        ]
      },
      {
        "title": "Container CPU Usage (cores)",
        "type": "timeseries",
        "targets": [
          { "expr": "rate(container_cpu_usage_seconds_total{pod=~\"myapp.*\"}[5m])" }
        ]
      },
      {
        "title": "Container Memory Usage (MiB)",
        "type": "timeseries",
        "targets": [
          { "expr": "container_memory_usage_bytes{pod=~\"myapp.*\"} / 1024 / 1024" }
        ]
      }
    ]
  }
}
```

Logging from Shell-Less Containers
CleanStart containers have no shell or internal logging system. All logs go to stdout/stderr.
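On each node, the container runtime captures those stdout/stderr lines and writes them to files under /var/log/containers/ (symlinks into /var/log/pods/). Log collectors read from these files. With containerd or CRI-O, each line is prefixed with a timestamp, the stream name, and an `F` (full) or `P` (partial) tag, so a collector has to strip that prefix before it can parse your JSON. The stdlib-only sketch below illustrates that parsing step; `parse_cri_line` is a hypothetical name, and in production Fluent Bit's built-in `cri` parser does this work.

```python
import json
from typing import Any, Dict, NamedTuple, Optional


class CriLogLine(NamedTuple):
    timestamp: str
    stream: str                       # "stdout" or "stderr"
    partial: bool                     # True for tag "P": the runtime split a long line
    message: str                      # raw text the application wrote
    fields: Optional[Dict[str, Any]]  # parsed JSON object, if the message was one


def parse_cri_line(line: str) -> CriLogLine:
    """Parse one line of a containerd/CRI-O container log file."""
    timestamp, stream, tag, message = line.rstrip("\n").split(" ", 3)
    fields = None
    try:
        decoded = json.loads(message)
        if isinstance(decoded, dict):
            fields = decoded
    except json.JSONDecodeError:
        pass  # plain-text log line; keep only the raw message
    return CriLogLine(timestamp, stream, tag == "P", message, fields)


record = parse_cri_line(
    '2024-05-01T12:00:00.123456789Z stdout F {"level": "INFO", "message": "Application started"}'
)
print(record.fields)
```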
Structured JSON Logging
Log structured JSON for easy parsing by log collectors:
Python:
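```python
import json
import sys
from datetime import datetime, timezone


def log(level, message, **extra):
    log_entry = {
        # RFC 3339 UTC timestamp, e.g. 2024-05-01T12:00:00.123456Z
        "timestamp": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
        "level": level,
        "message": message,
        **extra,
    }
    # flush=True: stdout is block-buffered when not attached to a terminal,
    # which would otherwise delay log lines inside the container
    print(json.dumps(log_entry), file=sys.stdout, flush=True)


# Usage
log("INFO", "Application started", version="1.0.0", environment="production")
log("ERROR", "Database connection failed", error="timeout", host="db.local")
log("WARN", "High memory usage", memory_mb=450, threshold_mb=500)
```

Node.js (pino logger):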
```javascript
const pino = require('pino');

// pino writes one JSON object per line to stdout by default. Avoid the
// pino-pretty transport in production: it turns JSON into human-readable text.
const logger = pino({
  timestamp: pino.stdTimeFunctions.isoTime,
});

// Usage
logger.info({ version: '1.0.0' }, 'Application started');
logger.error({ error: 'timeout', host: 'db.local' }, 'Database connection failed');
logger.warn({ memory_mb: 450, threshold: 500 }, 'High memory usage');
```

Fluentd DaemonSet for Log Collection
Deploy Fluentd to collect logs from all pods:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: logging
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/containers/*_production_*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      # Assumes the Docker json-file format ({"log": ..., "stream": ..., "time": ...}).
      # On containerd/CRI-O nodes, use a CRI parser instead (e.g. @type cri).
      <parse>
        @type json
        time_key time
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

    # Enrich records with pod, namespace, and label metadata
    # (uses the in-cluster API server and the plugin's default tag regex)
    <filter kubernetes.**>
      @type kubernetes_metadata
    </filter>

    # Parse the application's JSON log line (in the "log" field) into fields
    <filter kubernetes.**>
      @type parser
      key_name log
      reserve_data true
      remove_key_name_field true
      emit_invalid_record_to_error false
      <parse>
        @type json
      </parse>
    </filter>

    <match kubernetes.**>
      @type elasticsearch
      @id output_es
      @log_level info
      include_tag_key true
      host "#{ENV['FLUENT_ELASTICSEARCH_HOST'] || 'elasticsearch'}"
      port "#{ENV['FLUENT_ELASTICSEARCH_PORT'] || '9200'}"
      scheme "#{ENV['FLUENT_ELASTICSEARCH_SCHEME'] || 'http'}"
      # Writes to daily indices named kubernetes-YYYY.MM.DD
      logstash_format true
      logstash_prefix kubernetes
      <buffer>
        @type file
        path /var/log/fluentd-buffers/kubernetes.system.buffer
        flush_mode interval
        retry_type exponential_backoff
        flush_interval 5s
        retry_forever false
        retry_max_interval 30
        chunk_limit_size "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_CHUNK_LIMIT_SIZE'] || '8M'}"
        queue_limit_length "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_QUEUE_LIMIT_LENGTH'] || '256'}"
        flush_thread_count "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_FLUSH_THREAD_COUNT'] || '1'}"
      </buffer>
    </match>
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      serviceAccountName: fluentd
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          effect: NoSchedule
      containers:
        - name: fluentd
          image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: "elasticsearch"
            - name: FLUENT_ELASTICSEARCH_PORT
              value: "9200"
            - name: FLUENT_ELASTICSEARCH_SCHEME
              value: "http"
            - name: FLUENTD_SYSTEMD_CONF
              value: disable
          resources:
            limits:
              memory: 512Mi
            requests:
              cpu: 100m
              memory: 256Mi
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: config
              mountPath: /fluentd/etc/fluent.conf
              subPath: fluent.conf
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: config
          configMap:
            name: fluentd-config
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd
rules:
  - apiGroups:
      - ""
    resources:
      - pods
      - namespaces
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluentd
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluentd
subjects:
  - kind: ServiceAccount
    name: fluentd
    namespace: logging
```

Fluent Bit (Lighter Alternative)
For resource-constrained environments:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [SERVICE]
        Daemon            Off
        Flush             1
        Log_Level         info

    [INPUT]
        Name              tail
        Path              /var/log/containers/*_production_*.log
        # Built-in parsers for the Docker and containerd/CRI-O log formats
        multiline.parser  docker, cri
        Tag               kube.*
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On

    [FILTER]
        Name              kubernetes
        Match             kube.*
        # Parse the application's JSON log line into structured fields
        Merge_Log         On

    [OUTPUT]
        Name              es
        Match             *
        Host              elasticsearch
        Port              9200
        Logstash_Format   On
        Logstash_Prefix   kubernetes
        Retry_Limit       False
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      # The kubernetes filter needs get/list/watch on pods and namespaces;
      # this reuses the fluentd ServiceAccount and RBAC from the previous example
      serviceAccountName: fluentd
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:latest
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: config
              mountPath: /fluent-bit/etc/
          resources:
            limits:
              memory: 100Mi
            requests:
              cpu: 50m
              memory: 50Mi
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: config
          configMap:
            name: fluent-bit-config
```

Alerting Rules
PrometheusRule Examples
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: myapp-alerts
  namespace: production
spec:
  groups:
    - name: cleanstart.rules
      interval: 30s
      rules:
        # Image vulnerability alerts
        # vulnerabilities_found is a placeholder: substitute the metric your image
        # scanner exports (it is not a built-in Kubernetes metric)
        - alert: ImageVulnerabilityDetected
          expr: |
            vulnerabilities_found{pod=~"myapp.*"} > 0
          for: 1m
          labels:
            severity: critical
            component: security
          annotations:
            summary: "Vulnerability detected in running image"
            description: "{{ $value }} vulnerabilities found in {{ $labels.pod }}"
            runbook_url: "https://wiki.example.com/runbooks/image-vulnerability"

        # Container restart alerts (requires kube-state-metrics)
        - alert: ContainerRestartingTooOften
          expr: |
            increase(kube_pod_container_status_restarts_total{pod=~"myapp.*"}[5m]) > 1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Container {{ $labels.pod }} restarting frequently"
            description: "Pod restarted {{ $value | humanize }} times in 5 minutes"

        # Resource exhaustion
        - alert: HighMemoryUsage
          expr: |
            (container_memory_usage_bytes{pod=~"myapp.*", container!=""}
              / (container_spec_memory_limit_bytes{pod=~"myapp.*", container!=""} > 0)) > 0.9
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High memory usage in {{ $labels.pod }}"
            description: "Memory usage {{ $value | humanizePercentage }}"

        # CPU usage relative to the container's CPU limit (quota / period)
        - alert: HighCPUUsage
          expr: |
            (sum by (namespace, pod, container) (rate(container_cpu_usage_seconds_total{pod=~"myapp.*", container!=""}[5m]))
              / sum by (namespace, pod, container) ((container_spec_cpu_quota{pod=~"myapp.*", container!=""} > 0)
                / container_spec_cpu_period{pod=~"myapp.*", container!=""})) > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High CPU usage in {{ $labels.pod }}"
            description: "CPU usage {{ $value | humanizePercentage }} of limit"

        # Application performance
        - alert: HighErrorRate
          expr: |
            (sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
              / sum by (job) (rate(http_requests_total[5m]))) > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High error rate in {{ $labels.job }}"
            description: "Error rate is {{ $value | humanizePercentage }}"

        - alert: HighLatency
          expr: |
            histogram_quantile(0.95, sum by (le, job) (rate(http_request_duration_seconds_bucket[5m]))) > 1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High request latency in {{ $labels.job }}"
            description: "p95 latency is {{ $value }}s"

        # Registry/image issues (requires kube-state-metrics)
        - alert: ImagePullFailed
          expr: |
            kube_pod_container_status_waiting_reason{pod=~"myapp.*", reason=~"ErrImagePull|ImagePullBackOff"} == 1
          labels:
            severity: critical
          annotations:
            summary: "Failed to pull image for {{ $labels.pod }}"
            description: "Container {{ $labels.container }} is waiting with reason {{ $labels.reason }}"

        # Cluster health
        - alert: NodeNotReady
          expr: |
            kube_node_status_condition{condition="Ready",status="true"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.node }} is not ready"

        - alert: PersistentVolumeClaimPending
          expr: |
            kube_persistentvolumeclaim_status_phase{phase="Pending"} == 1
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "PVC {{ $labels.persistentvolumeclaim }} stuck pending"
```

Deploy alerts:
```bash
kubectl apply -f alerts.yaml

# Verify alert rules loaded
kubectl get prometheusrule -n production
```

Health Check Patterns
Kubernetes Health Probes
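The probe fields in a Deployment combine into concrete time windows, and it is worth checking those windows before tuning them. The rough model below assumes probes fire exactly every periodSeconds, which the kubelet does not guarantee precisely. It uses the Kubernetes defaults for omitted fields. `Probe`, `startup_budget`, and `detection_time` are illustrative names, not a Kubernetes API.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    # Defaults match the Kubernetes defaults for these probe fields
    initial_delay_seconds: int = 0
    period_seconds: int = 10
    timeout_seconds: int = 1
    failure_threshold: int = 3


def startup_budget(probe: Probe) -> int:
    """Approximate seconds a container may take to start before it is restarted."""
    return probe.initial_delay_seconds + probe.failure_threshold * probe.period_seconds


def detection_time(probe: Probe) -> int:
    """Approximate worst case from a hang (just after a passing check) to the kubelet acting."""
    return probe.failure_threshold * probe.period_seconds + probe.timeout_seconds


startup = Probe(initial_delay_seconds=5, period_seconds=10, timeout_seconds=5, failure_threshold=30)
liveness = Probe(initial_delay_seconds=30, period_seconds=10, timeout_seconds=3, failure_threshold=3)
readiness = Probe(initial_delay_seconds=10, period_seconds=5, timeout_seconds=3, failure_threshold=2)
print(startup_budget(startup), detection_time(liveness), detection_time(readiness))
```

With the values used in the manifest that follows, the startup probe allows about 305 seconds to start. Liveness detects a hang within roughly 33 seconds, and readiness removes an unhealthy pod from Service endpoints within roughly 13 seconds.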
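CleanStart containers run without a shell, so exec probes that rely on sh -c or on tools such as curl will fail. Use httpGet, tcpSocket, or grpc probes instead; the kubelet performs these from outside the container.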
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: registry.cleanstart.com/python3:3.12.5-prod
          ports:
            # The example apps serve /metrics on this same port, which is the
            # "metrics" port (8080) of the Service scraped by the ServiceMonitor
            - name: http
              containerPort: 8080
          # Startup probe: runs until it first succeeds; liveness and readiness
          # probes are held off until then
          startupProbe:
            httpGet:
              path: /startup
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 30  # 30 * 10 = 300 seconds max startup time
          # Readiness probe: ready to serve traffic
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 2  # If it fails: pod removed from Service endpoints, not restarted
          # Liveness probe: container alive
          livenessProbe:
            httpGet:
              path: /alive
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 3
            failureThreshold: 3  # If it fails: container is restarted
```

Health Check Endpoints
Implement health check endpoints in your application:
Python:
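```python
import asyncio
from contextlib import asynccontextmanager

from fastapi import FastAPI
from fastapi.responses import JSONResponse

# Global state: set once startup work has finished
initialized = False


async def initialize() -> None:
    global initialized
    # Perform heavy initialization (simulated with a sleep)
    await asyncio.sleep(2)
    initialized = True


async def check_dependencies() -> None:
    # Hypothetical placeholder: ping your database and cache here
    # (e.g. await db.ping(); await redis.ping()) and raise on failure
    return None


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Run initialization in the background so the server accepts requests
    # and /startup can report 503 until it completes
    task = asyncio.create_task(initialize())
    yield
    task.cancel()


app = FastAPI(lifespan=lifespan)


@app.get("/startup")
async def startup_probe():
    # Only passes once initialization is complete
    if not initialized:
        return JSONResponse({"status": "initializing"}, status_code=503)
    return {"status": "initialized"}


@app.get("/ready")
async def readiness_probe():
    # Check dependencies (DB, cache); a 503 removes the pod from Service endpoints
    try:
        await check_dependencies()
    except Exception as e:
        return JSONResponse({"status": "not_ready", "reason": str(e)}, status_code=503)
    return {"status": "ready"}


@app.get("/alive")
async def liveness_probe():
    # Keep liveness cheap and dependency-free so a DB outage does not restart pods
    return {"status": "alive"}
```

Node.js: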
```javascript
const express = require('express');
const app = express();

// Hypothetical dependency clients; replace with your real DB and cache connections
const db = { connected: true };
const redis = { connected: true };

let initialized = false;

// Startup: simulate initialization completing after 2 seconds
setTimeout(() => {
  initialized = true;
  console.log('Application initialized');
}, 2000);

app.get('/startup', (req, res) => {
  if (!initialized) {
    res.status(503).json({ status: 'initializing' });
  } else {
    res.status(200).json({ status: 'initialized' });
  }
});

app.get('/ready', (req, res) => {
  // Check dependencies
  if (!db.connected || !redis.connected) {
    res.status(503).json({ status: 'not_ready', reason: 'dependency_unavailable' });
  } else {
    res.status(200).json({ status: 'ready' });
  }
});

app.get('/alive', (req, res) => {
  res.status(200).json({ status: 'alive' });
});

app.listen(8080);
```

CleanSight Integration (Optional)
CleanSight is an optional multi-cloud discovery platform whose primary purpose is detecting outdated CleanStart images running in your production environments and recommending upgrades to newer, patched versions. It is not mandatory — organizations can connect their own repositories and monitoring tools instead.
What CleanSight Does
Core function: Find old CleanStart images in production → suggest newer ones.
CleanSight deploys lightweight discovery agents into your cloud environments. These agents scan your Kubernetes clusters, container registries, and container services to build an inventory of every CleanStart image running in production. CleanSight then compares each image against the latest versions available from registry.cleanstart.com. It flags images that are behind, whether they trail by a patch or minor version or are affected by newly disclosed CVEs.
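CleanSight's comparison logic is not documented here. The sketch below only illustrates the idea under an assumed tag scheme of `<major>.<minor>.<patch>-<variant>` (the tag part of `python3:3.12.5-prod`). The caller supplies the list of available tags. `parse_tag` and `recommend_upgrade` are hypothetical names.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumed tag scheme: <major>.<minor>.<patch>-<variant>, e.g. "3.12.5-prod"
_TAG = re.compile(r"^(\d+)\.(\d+)\.(\d+)-([a-z0-9]+)$")


def parse_tag(tag: str) -> Tuple[Tuple[int, int, int], str]:
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unrecognized tag: {tag!r}")
    major, minor, patch, variant = match.groups()
    return (int(major), int(minor), int(patch)), variant


def recommend_upgrade(running: str, available: Iterable[str],
                      allow_minor: bool = False) -> Optional[Tuple[str, str]]:
    """Return (target_tag, "patch" | "minor") for the newest eligible tag, else None."""
    current, variant = parse_tag(running)
    candidates = []
    for tag in available:
        try:
            version, tag_variant = parse_tag(tag)
        except ValueError:
            continue  # skip tags outside the assumed scheme, such as "latest"
        if allow_minor:
            in_scope = version[0] == current[0]    # same major version
        else:
            in_scope = version[:2] == current[:2]  # same major.minor
        if tag_variant == variant and in_scope and version > current:
            candidates.append((version, tag))
    if not candidates:
        return None
    best_version, best_tag = max(candidates)
    kind = "patch" if best_version[:2] == current[:2] else "minor"
    return best_tag, kind


tags = ["3.12.3-prod", "3.12.5-prod", "3.12.5-dev", "3.13.0-prod", "4.0.0-prod", "latest"]
print(recommend_upgrade("3.12.3-prod", tags))
```

By default, the sketch stays within the running minor version, which matches the "python3:3.12.3-prod → python3:3.12.5-prod" style of recommendation. Passing `allow_minor=True` widens the search to newer minor versions within the same major version.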
| Component | Purpose |
|---|---|
| Agent Manager | Orchestrates discovery agents across clouds, manages lifecycle |
| Discovery Agents | Cloud-specific agents for AWS (EKS/ECR/ECS), Azure (AKS/ACR), GCP (GKE/Artifact Registry/Cloud Run) |
| Image Inventory | Catalogs every CleanStart image running in your clusters |
| Version Comparison | Compares running images against the latest versions available from registry.cleanstart.com |
| Upgrade Recommendations | Identifies outdated images and recommends specific upgrade targets |
| SBOM Generation | Syft integration for Software Bill of Materials (supports compliance) |
| Vulnerability Context | Trivy/Grype scanning shows which CVEs are resolved by upgrading (added as a supplementary capability) |
| Web Dashboard | Unified view of image freshness across all clusters |
| REST/gRPC APIs | Programmatic access for CI/CD integration |
When to Use CleanSight
- You run CleanStart images across multiple clusters and need visibility into which ones are outdated.
- You want automated upgrade recommendations: CleanSight tells you exactly which image tag to pull.
- Multi-cloud environments: Single pane of glass across AWS, Azure, and GCP.
- Compliance requirements: Demonstrate that production images are current and patched.
- Not mandatory: Organizations can monitor image freshness through their own tooling, registry webhooks, or CI/CD checks instead.
Deploying CleanSight Agent Manager
Option 1: Docker Compose (Single Machine)
```yaml
# docker-compose.yml
version: '3.8'
services:
  cleansight-agent-manager:
    image: cleansight/agent-manager:latest
    ports:
      - "9090:9090"    # Web dashboard
      - "50051:50051"  # gRPC API
    environment:
      CLEANSIGHT_REGION: us-west-2
      CLEANSIGHT_LOG_LEVEL: info
      AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID}
      AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY}
      AZURE_CLIENT_ID: ${AZURE_CLIENT_ID}
      AZURE_CLIENT_SECRET: ${AZURE_CLIENT_SECRET}
      GCP_PROJECT_ID: ${GCP_PROJECT_ID}
      GCP_SERVICE_ACCOUNT_JSON: ${GCP_SERVICE_ACCOUNT_JSON}
    volumes:
      - ./config:/etc/cleansight
      - cleansight-data:/var/lib/cleansight
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9090/health"]
      interval: 30s
      timeout: 10s
      retries: 3

volumes:
  cleansight-data:
```

Start CleanSight:
```bash
docker-compose up -d

# Wait for initialization
sleep 10

# Access dashboard
open http://localhost:9090

# Logs
docker-compose logs -f cleansight-agent-manager
```

Option 2: Kubernetes
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cleansight-config
  namespace: cleansight
data:
  config.yaml: |
    agent_manager:
      web_port: 9090
      grpc_port: 50051
      log_level: info
    cloud_providers:
      aws:
        enabled: true
        regions: [us-west-2, us-east-1, eu-west-1]
      azure:
        enabled: true
        subscriptions: []  # Auto-discover if empty
      gcp:
        enabled: true
        projects: []
    scanning:
      schedule: "0 */6 * * *"  # Every 6 hours
      sbom_tool: syft
      vulnerability_tool: trivy
      timeout_seconds: 3600
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cleansight-agent-manager
  namespace: cleansight
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cleansight
  template:
    metadata:
      labels:
        app: cleansight
    spec:
      serviceAccountName: cleansight
      containers:
        - name: cleansight
          image: cleansight/agent-manager:latest
          ports:
            - name: web
              containerPort: 9090
            - name: grpc
              containerPort: 50051
          env:
            - name: CLEANSIGHT_CONFIG_PATH
              value: /etc/cleansight/config.yaml
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: cloud-credentials
                  key: aws-key
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: cloud-credentials
                  key: aws-secret
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
          livenessProbe:
            httpGet:
              path: /health
              port: web
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: web
            initialDelaySeconds: 10
            periodSeconds: 5
          volumeMounts:
            - name: config
              mountPath: /etc/cleansight
      volumes:
        - name: config
          configMap:
            name: cleansight-config
---
apiVersion: v1
kind: Service
metadata:
  name: cleansight-agent-manager
  namespace: cleansight
spec:
  selector:
    app: cleansight
  ports:
    - name: web
      port: 9090
      targetPort: web
    - name: grpc
      port: 50051
      targetPort: grpc
  type: ClusterIP
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cleansight
  namespace: cleansight
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cleansight
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get"]
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets", "daemonsets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["batch"]
    resources: ["jobs", "cronjobs"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cleansight
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cleansight
subjects:
  - kind: ServiceAccount
    name: cleansight
    namespace: cleansight
```

Deploy to Kubernetes:
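```bash
kubectl create namespace cleansight

# The Deployment reads AWS credentials from this Secret, so create it first
kubectl create secret generic cloud-credentials -n cleansight \
  --from-literal=aws-key=<AWS_ACCESS_KEY_ID> \
  --from-literal=aws-secret=<AWS_SECRET_ACCESS_KEY>

kubectl apply -f cleansight-deployment.yaml

# Check status
kubectl get pods -n cleansight
kubectl logs -f deployment/cleansight-agent-manager -n cleansight

# Port-forward to access the dashboard (runs in the foreground; open the URL from another terminal)
kubectl port-forward -n cleansight svc/cleansight-agent-manager 9090:9090
open http://localhost:9090
```

Connecting CleanSight to Your Cluster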
Configuration:
- Open CleanSight dashboard: http://localhost:9090
- Go to Settings → Cloud Providers
- For Kubernetes (AWS EKS):
  - Cluster Name: myapp-production
  - Region: us-west-2
  - OIDC Provider: oidc.eks.us-west-2.amazonaws.com/id/ABC123
  - Service Account Role: arn:aws:iam::123456789:role/cleansight-eks-role
- For Kubernetes (Azure AKS):
  - Cluster Name: myapp-production
  - Resource Group: myapp-rg
  - Subscription: my-subscription
- Click Test Connection → Save
CleanSight automatically:
- Discovers all running containers and identifies CleanStart images.
- Compares each image version against the latest available from registry.cleanstart.com.
- Flags outdated images and recommends specific upgrade targets (e.g., "python3:3.12.3-prod → python3:3.12.5-prod").
- Shows which CVEs are resolved by upgrading to the recommended version.
- Generates SBOMs for compliance and audit trails.
- Tracks upgrade adoption across your fleet over time.
Accessing CleanSight Data
Web Dashboard:
The CleanSight web dashboard is accessible at http://localhost:9090 and provides several key sections. The Image Inventory displays all CleanStart images running in your clusters, including version numbers, age of each image, and current status. The Upgrade Recommendations section flags outdated images and provides recommended target versions. The CVE Impact view shows which vulnerabilities are resolved by upgrading to recommended versions, along with severity levels and affected image counts.
The Fleet Overview provides fleet-wide metrics including upgrade adoption rate and image freshness trends across your clusters. The Reports section generates compliance documentation such as SBOMs, vulnerability scan history, and audit trails. Finally, the Settings area allows you to configure cloud provider integrations, upgrade scheduling policies, and notification webhooks for alerting.
REST API:
```bash
# List all CleanStart images discovered across clusters
curl http://localhost:9090/api/v1/images

# Get upgrade recommendations for outdated images
curl http://localhost:9090/api/v1/images/outdated

# Get specific image details and recommended upgrade target
curl http://localhost:9090/api/v1/images/sha256:abc123/upgrade-recommendation

# Get CVEs resolved by upgrading a specific image
curl http://localhost:9090/api/v1/images/sha256:abc123/cves-resolved

# Generate SBOM for compliance
curl http://localhost:9090/api/v1/images/sha256:abc123/sbom -o sbom.json

# Trigger a discovery scan across connected clusters
curl -X POST http://localhost:9090/api/v1/scans \
  -H "Content-Type: application/json" \
  -d '{"provider":"aws","region":"us-west-2"}'
```

gRPC API (for integrations):
```python
import grpc
from cleansight.v1 import container_service_pb2, container_service_pb2_grpc

channel = grpc.secure_channel(
    'cleansight-agent-manager:50051',
    grpc.ssl_channel_credentials(),
)
stub = container_service_pb2_grpc.ContainerServiceStub(channel)

# List containers
response = stub.ListContainers(container_service_pb2.ListContainersRequest())
for container in response.containers:
    print(f"{container.image}: {len(container.vulnerabilities)} vulns")
```

WebSocket Real-time Events:
```javascript
const ws = new WebSocket('ws://localhost:9090/api/v1/events');

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  console.log('Event:', message);
  // Example payload:
  // {
  //   "type": "vulnerability_found",
  //   "image": "registry.cleanstart.com/python3:3.12.5-prod",
  //   "vulnerability": { "id": "CVE-2024-1234", "severity": "HIGH" }
  // }
};
```

Bring Your Own Monitoring
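Third-party monitoring agents typically run as their own DaemonSets and observe workloads from the host, through mounts such as /proc and the container runtime socket. They therefore need nothing installed inside shell-less CleanStart images. CleanStart images are also glibc-based, so in-process instrumentation libraries that ship prebuilt glibc binaries (such as Datadog's ddtrace) install without modification.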
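The sketch below shows how a host-level agent can read a container's memory ratio directly from the kernel. It assumes a node using cgroup v2, where each container's cgroup directory exposes memory.current (bytes in use) and memory.max (a byte count, or the literal "max" when no limit is set). The result is roughly the ratio that the container_memory_usage_bytes / container_spec_memory_limit_bytes query computes. `memory_usage_percent` is a hypothetical helper. The path to a given container's cgroup depends on the runtime and cgroup driver.

```python
from pathlib import Path
from typing import Optional


def memory_usage_percent(cgroup_dir: Path) -> Optional[float]:
    """Memory usage as a percentage of the cgroup v2 limit, or None if unlimited."""
    current = int((cgroup_dir / "memory.current").read_text().strip())
    limit_text = (cgroup_dir / "memory.max").read_text().strip()
    if limit_text == "max":
        return None  # no memory limit configured for this container
    return current / int(limit_text) * 100
```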
Datadog Agent
```yaml
# Minimal illustration only: Datadog recommends installing the Agent with its
# Helm chart or the Datadog Operator, which also create the RBAC this DaemonSet needs
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: datadog-agent
  namespace: datadog
spec:
  selector:
    matchLabels:
      app: datadog-agent
  template:
    metadata:
      labels:
        app: datadog-agent
    spec:
      serviceAccountName: datadog-agent
      hostNetwork: true
      hostPID: true
      containers:
        - name: agent
          image: datadog/agent:latest
          env:
            - name: DD_API_KEY
              valueFrom:
                secretKeyRef:
                  name: datadog-api-key
                  key: api-key
            - name: DD_KUBERNETES_KUBELET_HOST
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: DD_APM_ENABLED
              value: "true"
            - name: DD_LOGS_ENABLED
              value: "true"
            # Collect stdout/stderr logs from all containers
            - name: DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL
              value: "true"
          resources:
            limits:
              memory: 256Mi
            requests:
              cpu: 100m
              memory: 128Mi
          volumeMounts:
            # Docker runtime socket; on containerd nodes mount
            # /var/run/containerd/containerd.sock instead
            - name: docker
              mountPath: /var/run/docker.sock
              readOnly: true
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: sys
              mountPath: /host/sys
              readOnly: true
      volumes:
        - name: docker
          hostPath:
            path: /var/run/docker.sock
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
```

Application instrumentation (Python):
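```python
from ddtrace import patch_all

# Patch supported libraries before importing them so their calls are traced
patch_all()

import logging

from fastapi import FastAPI

# JSON-formatted log lines for the Datadog log pipeline
# (messages containing double quotes will produce invalid JSON with this format)
logging.basicConfig(
    level=logging.INFO,
    format='{"timestamp": "%(asctime)s", "level": "%(levelname)s", "message": "%(message)s"}',
)

app = FastAPI()


# Application code (auto-traced by Datadog)
@app.get("/api/users/{user_id}")
def get_user(user_id: int):
    return {"id": user_id, "name": "Alice"}
```

New Relic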
```yaml
# Minimal illustration only: New Relic recommends the nri-bundle Helm chart,
# which also deploys kube-state-metrics and the required RBAC
apiVersion: v1
kind: ConfigMap
metadata:
  name: newrelic-config
  namespace: newrelic
data:
  newrelic-config.yaml: |
    integration_name: nri-kubernetes
    instances:
      - name: nri-kubernetes
        command: pod
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: newrelic-agent
  namespace: newrelic
spec:
  selector:
    matchLabels:
      app: newrelic-agent
  template:
    metadata:
      labels:
        app: newrelic-agent
    spec:
      serviceAccountName: newrelic-agent
      hostNetwork: true
      hostPID: true
      containers:
        - name: agent
          image: newrelic/infrastructure-k8s:latest
          env:
            # The infrastructure agent reads NRIA_-prefixed variables
            - name: NRIA_LICENSE_KEY
              valueFrom:
                secretKeyRef:
                  name: newrelic-license-key
                  key: license
            - name: NRIA_CUSTOM_ATTRIBUTES
              value: '{"environment":"production"}'
          resources:
            limits:
              memory: 512Mi
            requests:
              cpu: 100m
              memory: 256Mi
          volumeMounts:
            - name: docker
              mountPath: /var/run/docker.sock
            - name: proc
              mountPath: /host/proc
              readOnly: true
      volumes:
        - name: docker
          hostPath:
            path: /var/run/docker.sock
        - name: proc
          hostPath:
            path: /proc
```

Dynatrace
```bash
# Install the Dynatrace Operator
kubectl create namespace dynatrace
kubectl apply -f https://github.com/Dynatrace/dynatrace-operator/releases/latest/download/kubernetes.yaml

# Store the tokens in a Secret referenced by the DynaKube below
kubectl -n dynatrace create secret generic dynakube \
  --from-literal=apiToken=<API_TOKEN> \
  --from-literal=dataIngestToken=<DATA_INGEST_TOKEN>

# Create the DynaKube resource. The supported apiVersion depends on the operator
# release; check it with: kubectl get crd dynakubes.dynatrace.com
kubectl apply -f - <<EOF
apiVersion: dynatrace.com/v1beta1
kind: DynaKube
metadata:
  name: dynakube
  namespace: dynatrace
spec:
  apiUrl: https://YOUR-ENVIRONMENT-ID.live.dynatrace.com/api
  tokens: dynakube
  oneAgent:
    classicFullStack: {}
EOF
```

Runtime Security Monitoring
Monitor container behavior at runtime using eBPF-based tools.
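CleanStart images contain no shell, which makes a process-allowlist policy unusually tight: any sh or bash exec inside such a container is suspicious on its face. The sketch below models only the decision logic of such a rule over exec events that have already been collected. It does not use eBPF, and `ExecEvent`, `SHELLS`, and `suspicious_execs` are hypothetical names. Falco evaluates the equivalent condition against live syscall events captured by its kernel driver or eBPF probe.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

# Shell binaries that have no legitimate reason to run in a shell-less image
SHELLS: FrozenSet[str] = frozenset({"sh", "bash", "dash", "ash", "zsh", "busybox"})


@dataclass(frozen=True)
class ExecEvent:
    container_id: str
    image: str
    process: str  # executable name, as in Falco's proc.name


def suspicious_execs(events: Iterable[ExecEvent], allowed: FrozenSet[str]) -> List[ExecEvent]:
    """Flag exec events whose binary is a shell or is outside the allowlist."""
    return [e for e in events if e.process in SHELLS or e.process not in allowed]


image = "registry.cleanstart.com/python3:3.12.5-prod"
events = [ExecEvent("abc", image, "python3"), ExecEvent("abc", image, "sh")]
print([e.process for e in suspicious_execs(events, frozenset({"python3"}))])
```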
Falco (Container Runtime Security)
```yaml
# For production, the official falcosecurity Helm chart is the usual install path;
# it selects and loads the kernel driver and mounts the host paths Falco needs
apiVersion: v1
kind: ConfigMap
metadata:
  name: falco-config
  namespace: falco
data:
  falco.yaml: |
    rules_file:
      - /etc/falco/falco_rules.yaml
      - /etc/falco/rules.d
    plugins: []
    # Emit alerts as JSON
    json_output: true
    # Write alerts to stdout so the log collectors above can ship them
    stdout_output:
      enabled: true
    syslog_output:
      enabled: true
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: falco
  namespace: falco
spec:
  selector:
    matchLabels:
      app: falco
  template:
    metadata:
      labels:
        app: falco
    spec:
      hostNetwork: true
      hostPID: true
      containers:
        - name: falco
          image: falcosecurity/falco:latest
          securityContext:
            privileged: true
          volumeMounts:
            - name: config
              mountPath: /etc/falco/falco.yaml
              subPath: falco.yaml
            - name: docker
              mountPath: /var/run/docker.sock
            - name: cgroup
              mountPath: /host/sys/fs/cgroup
              readOnly: true
            - name: proc
              mountPath: /host/proc
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: falco-config
        - name: docker
          hostPath:
            path: /var/run/docker.sock
        - name: cgroup
          hostPath:
            path: /sys/fs/cgroup
        - name: proc
          hostPath:
            path: /proc
```

Custom Falco rules for CleanStart (for example, placed in /etc/falco/rules.d/):
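```yaml
# Hypothetical allowlist: list the entrypoint binaries your CleanStart images run
- list: cleanstart_allowed_binaries
  items: [python3, node, myapp]

- macro: allowed_process
  condition: proc.name in (cleanstart_allowed_binaries)

# Treat private (RFC 1918) destinations as trusted; adjust to your network
- macro: trusted_ip
  condition: fd.snet in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")

# spawned_process and outbound are macros from Falco's default ruleset
- rule: Unauthorized Process Execution
  desc: Detect process execution outside the allowlist in a container
  condition: spawned_process and container.id != host and not allowed_process
  output: "Process execution detected (user=%user.name process=%proc.name image=%container.image.repository)"
  priority: WARNING

- rule: Suspicious Network Connection
  desc: Detect unusual outbound connections from containers
  condition: outbound and container.id != host and not trusted_ip
  output: "Network connection (connection=%fd.name process=%proc.name image=%container.image.repository)"
  priority: WARNING
```

Summary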
Observability for CleanStart deployments requires:
- Metrics: Prometheus scraping the application /metrics endpoint, plus Kubernetes resource metrics
- Logs: Structured JSON to stdout/stderr, collected by Fluentd/Fluent Bit and shipped to Elasticsearch
- Traces: Distributed tracing with OpenTelemetry (instrumentation libraries + OTLP collector)
- Health Checks: Startup, readiness, and liveness probes without shell scripts
- Alerting: PrometheusRule definitions for critical conditions
- Optional CleanSight: Detects outdated CleanStart images in production and recommends upgrades
- Third-party Monitoring: Datadog, New Relic, and Dynatrace agents work natively on GLIBC
- Runtime Security: Falco monitors suspicious container behavior
Complete observability = Metrics + Logs + Traces + Health Checks + Alerting + Optional CleanSight for image freshness tracking.
