bpfman-agent runs as a hostNetwork: true DaemonSet to perform
eBPF operations. This introduces two key issues for exposing Prometheus
metrics on a TCP port.
Local Port Conflicts
Using hostNetwork: true means metrics listeners bind directly to the
node’s interface. This causes:
Port conflicts with host services
Inability to run multiple pods or containers that listen on the same port on one node
Errors like listen tcp :8443: bind: address already in use
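You can reproduce the bind failure without Kubernetes, because it is ordinary socket behaviour. Every hostNetwork pod shares the node's network namespace, so it competes for ports exactly as a host process would. The following minimal Go sketch shows the collision. It uses a loopback address with an OS-assigned port as a stand-in for the node interface and :8443, and `secondBindFails` is a hypothetical helper, not bpfman code:

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
)

// secondBindFails is a hypothetical demo helper (not part of bpfman). It opens
// a TCP listener on addr, then tries to open a second listener on the exact
// address the first one received. This mimics two processes sharing one
// network namespace, as every hostNetwork pod on a node does.
// It reports whether the second attempt failed with EADDRINUSE.
func secondBindFails(addr string) (bool, error) {
	first, err := net.Listen("tcp", addr)
	if err != nil {
		return false, fmt.Errorf("first listen: %w", err)
	}
	defer first.Close()

	// first.Addr() holds the concrete port, even when addr used port 0.
	second, err := net.Listen("tcp", first.Addr().String())
	if err == nil {
		second.Close()
		return false, nil
	}
	// The error reads like "listen tcp 127.0.0.1:NNNN: bind: address already in use".
	return errors.Is(err, syscall.EADDRINUSE), nil
}
```

On Linux, `secondBindFails("127.0.0.1:0")` reports true. A pod on the regular pod network would not hit this, because it gets its own network namespace and therefore its own port space.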
Cross-Node Networking Breakage
With hostNetwork: true, the pod’s IP becomes the node IP (e.g.,
an EC2 instance IP). As a result, cross-node metrics collection becomes
dependent on infrastructure-level routing:
Normal: Prometheus → Pod IP → Metrics
HostNetwork: Prometheus → Node IP → AWS SG → Node → Pod
Symptoms include:
Only same-node pods are scraped successfully
Prometheus target list shows 1/N pods up
Application port unreachable, even when ICMP ping works
Timeouts, depending on the listener configuration
Mitigation often involves infrastructure-specific workarounds, e.g.:
frobware changed the title from "Separate metrics collection to eliminate TCP port conflicts and cloud networking dependencies" to "Separate metrics collection in bpfman-agent to eliminate TCP port conflicts and cloud networking dependencies" on May 22, 2025.
This is brittle and undesirable in multi-tenant or cloud-agnostic deployments.
Solution: Two-Tier Metrics Architecture
Introduce a metrics proxy DaemonSet:
bpfman-agent (hostNetwork)
↳ Unix socket: /var/run/bpfman-metrics/metrics.sock
metrics-proxy (pod network)
↳ Listens on TCP 8443 inside pod network
↳ Proxies requests to agent's Unix socket
metrics-proxy runs without hostNetwork: true and exposes metrics over HTTPS on port 8443, preserving the current ServiceMonitor and TLS setup.
Operator Changes
Benefits
Trade-offs