Separate metrics collection in bpfman-agent to eliminate TCP port conflicts and cloud networking dependencies #442

Open
frobware opened this issue May 22, 2025 · 0 comments

frobware commented May 22, 2025

bpfman-agent runs as a hostNetwork: true DaemonSet to perform
eBPF operations. This causes two problems when exposing Prometheus
metrics on a TCP port.

  1. Local Port Conflicts

Using hostNetwork: true means metrics listeners bind directly in the
node's network namespace. This causes:

  • Port conflicts with services already listening on the host
  • Inability to run multiple hostNetwork containers that use the same port on one node
  • Errors such as: listen tcp :8443: bind: address already in use

  2. Cross-Node Networking Breakage

With hostNetwork: true, the pod's IP is the node's IP (e.g.,
an EC2 instance IP). As a result, cross-node metrics collection
depends on infrastructure-level routing and firewall rules:

Normal: Prometheus → Pod IP → Metrics
HostNetwork: Prometheus → Node IP → AWS SG → Node → Pod

Symptoms include:

  • Only same-node pods are scraped successfully
  • Prometheus target list shows 1/N pods up
  • Application port unreachable, even when ICMP ping works
  • Scrape timeouts, depending on listener configuration

Mitigation often involves infrastructure-specific workarounds, e.g.:

aws ec2 authorize-security-group-ingress \
  --group-id sg-node \
  --protocol tcp --port 8443 \
  --source-group sg-node

This is brittle and undesirable in multi-tenant or cloud-agnostic deployments.


Solution: Two-Tier Metrics Architecture

Introduce a metrics proxy DaemonSet:

bpfman-agent (hostNetwork)
↳ Unix socket: /var/run/bpfman-metrics/metrics.sock

metrics-proxy (pod network)
↳ Listens on TCP 8443 inside pod network
↳ Proxies requests to agent's Unix socket

metrics-proxy runs without hostNetwork: true and exposes metrics
over HTTPS on port 8443, preserving the current ServiceMonitor and
TLS setup.

Operator Changes

  • Remove TCP listener from bpfman-agent
  • Add and manage metrics-proxy DaemonSet
  • Mount the shared socket directory (/var/run/bpfman-metrics) into both DaemonSets via hostPath
  • Maintain compatibility with Prometheus configuration

Benefits

  • Avoids host-level port conflicts
  • No cloud-specific firewall/SG rules needed
  • Full cross-node scraping support without special infra
  • Cloud-agnostic; no AWS-only assumptions
  • Clean separation of eBPF operations and metrics serving

Trade-offs

  • One extra DaemonSet (~16–64Mi memory, 10–100m CPU)
  • metrics-proxy needs either privileged: true or a hostPath mount of /var/run/bpfman-metrics
  • Slight increase in complexity and maintenance surface
github-project-automation bot moved this to 🆕 New in bpfman May 22, 2025
frobware added a commit to frobware/bpfman-operator that referenced this issue May 22, 2025
frobware changed the title Separate metrics collection to eliminate TCP port conflicts and cloud networking dependencies Separate metrics collection in bpfman-agent to eliminate TCP port conflicts and cloud networking dependencies May 22, 2025