Skip to content

Replace kube-rbac-proxy with controller-runtime's built-in TLS capabilities #437

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 1, 2025

Conversation

frobware
Copy link
Contributor

@frobware frobware commented Apr 30, 2025

This PR removes the kube-rbac-proxy sidecar container and implements native TLS support for metrics endpoints.

Generic changes

  • Remove kube-rbac-proxy container and related configurations
  • Implement controller-runtime's certificate handling with rotation support
  • Configure TLS-based metrics endpoints on port 8443
  • Update generic RBAC controls for metrics authentication/authorization
  • Certificates are mounted at /tmp/k8s-webhook-server/serving-certs with the controller-runtime watching for changes
  • Metrics can now be scraped securely from both the agent and controller components

Generic RBAC Changes

  • auth_delegator_binding: Grants access to the system:auth-delegator ClusterRole, allowing our services to delegate authentication decisions to the Kubernetes API server. This binding is necessary because we've added FilterProvider: filters.WithAuthenticationAndAuthorization to the metrics server, which requires API server delegation for authentication validation.
  • metrics_reader_role: Provides minimal permissions (GET access to /metrics non-resource URL) for monitoring systems to scrape metrics. Follows principle of least privilege.

OpenShift-specific Changes

  • Service CA Integration: Leverage OpenShift's Service CA to automatically generate and manage certificates through service annotations (service.beta.openshift.io/serving-cert-secret-name)
  • TLS Verification: Enabled strict certificate verification (insecureSkipVerify: false) with proper CA paths and server name validation
  • ServiceMonitor Configuration: Updated to use the OpenShift CA path (/etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt)
  • prometheus-k8s Role/RoleBinding: Added specifically to allow the OpenShift monitoring stack to access bfpman's services and endpoints

@frobware frobware force-pushed the drop-kube-rbac-proxy branch from 0b9c314 to c3b8925 Compare April 30, 2025 12:58
@frobware
Copy link
Contributor Author

Test Script to verify bpfman-{agent, operator} metrics are being scraped
#!/usr/bin/env bash

set -euo pipefail

echo "=== Controller-manager ServiceMonitor endpoint ==="
oc get servicemonitor bpfman-controller-manager-metrics-monitor \
  -n bpfman -o yaml \
  | yq e '.spec.endpoints[0]' -

echo
echo "=== Agent ServiceMonitor endpoint ==="
oc get servicemonitor bpfman-agent-metrics-monitor \
  -n bpfman -o yaml \
  | yq e '.spec.endpoints[0]' -

echo
echo "=== Prometheus controller targets (bpfman namespace) ==="
oc exec -n openshift-monitoring prometheus-k8s-0 -c prometheus -- \
  curl -s 'http://localhost:9090/api/v1/targets?state=active' \
  | jq '
    .data.activeTargets[]
    | select(.discoveredLabels["__meta_kubernetes_namespace"] == "bpfman")
    | select(.scrapePool | test("controller-manager-metrics-monitor"))
    | { job: .scrapePool, url: .scrapeUrl, health: .health }
  '

echo
echo "=== Prometheus agent targets (bpfman namespace) ==="
oc exec -n openshift-monitoring prometheus-k8s-0 -c prometheus -- \
  curl -s 'http://localhost:9090/api/v1/targets?state=active' \
  | jq '
    .data.activeTargets[]
    | select(.discoveredLabels["__meta_kubernetes_namespace"] == "bpfman")
    | select(.scrapePool | test("agent-metrics-monitor"))
    | { job: .scrapePool, url: .scrapeUrl, health: .health }
  '

@frobware
Copy link
Contributor Author

% ~/verify-bpfman-metrics.sh
=== Controller-manager ServiceMonitor endpoint ===
bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
path: /metrics
port: https-metrics
scheme: https
tlsConfig:
  caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
  insecureSkipVerify: false
  serverName: bpfman-controller-manager-metrics-service.bpfman.svc

=== Agent ServiceMonitor endpoint ===
bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
path: /metrics
port: https-metrics
scheme: https
tlsConfig:
  caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
  insecureSkipVerify: false
  serverName: bpfman-agent-metrics-service.bpfman.svc

=== Prometheus controller targets (bpfman namespace) ===
{
  "job": "serviceMonitor/bpfman/bpfman-controller-manager-metrics-monitor/0",
  "url": "https://10.130.0.26:8443/metrics",
  "health": "up"
}

=== Prometheus agent targets (bpfman namespace) ===
{
  "job": "serviceMonitor/bpfman/bpfman-agent-metrics-monitor/0",
  "url": "https://192.168.7.108:8443/metrics",
  "health": "up"
}
{
  "job": "serviceMonitor/bpfman/bpfman-agent-metrics-monitor/0",
  "url": "https://192.168.7.109:8443/metrics",
  "health": "up"
}
{
  "job": "serviceMonitor/bpfman/bpfman-agent-metrics-monitor/0",
  "url": "https://192.168.7.110:8443/metrics",
  "health": "up"
}

@frobware
Copy link
Contributor Author

image

@frobware frobware closed this Apr 30, 2025
@frobware frobware reopened this Apr 30, 2025
Copy link

codecov bot commented Apr 30, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 0.00%. Comparing base (4e2b3e7) to head (0b6fa45).
Report is 6 commits behind head on main.

Additional details and impacted files
@@     Coverage Diff     @@
##   main   #437   +/-   ##
===========================
===========================

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@frobware frobware changed the title Drop kube-rbac-proxy Replace kube-rbac-proxy with controller-runtime's built-in TLS capabilities May 1, 2025
Copy link
Contributor

mergify bot commented May 1, 2025

@frobware, this pull request is now in conflict and requires a rebase.

@mergify mergify bot added the needs-rebase label May 1, 2025
frobware added 3 commits May 1, 2025 09:46
This commit removes the kube-rbac-proxy sidecar container that was
previously used to secure metrics. Metrics are temporarily disabled by
setting metrics-bind-address=0, which prevents the metrics server from
starting (as per controller-runtime implementation).

Future commits will reintroduce metrics with native TLS support from
controller-runtime, following the approach recommended in
kubernetes-sigs/kubebuilder#3907.

Removed kube-rbac-proxy component labels from metrics service
definition and ClusterRole. The app.kubernetes.io/component label will
be updated to 'metrics' in a subsequent commit that implements
TLS-enabled metrics with controller-runtime's built-in capabilities.

Signed-off-by: Andrew McDermott <[email protected]>
Add Go dependencies required for upcoming TLS metrics implementation:
- sigs.k8s.io/controller-runtime/pkg/certwatcher
- sigs.k8s.io/controller-runtime/pkg/metrics/filters

This commit includes only the vendored dependencies via:

1. go get sigs.k8s.io/controller-runtime/pkg/certwatcher
2. go get sigs.k8s.io/controller-runtime/pkg/metrics/filters
3. go mod tidy
4. go mod vendor

Upcoming changes will use these dependencies to implement TLS-based
metrics in the operator without requiring kube-rbac-proxy.

Signed-off-by: Andrew McDermott <[email protected]>
Implement controller-runtime's built-in TLS support for secure
metrics, following the earlier removal of kube-rbac-proxy:

1. Updated main.go with controller-runtime's TLS capabilities:
   - Added certificate watcher for rotation
   - Support for self-signed and provided certificates
   - RBAC authentication and authorization

2. Added RBAC resources:
   - metrics_reader_role with GET /metrics permission
   - auth_delegator_binding for authentication
   - metrics_reader_rolebinding for service accounts

3. Updated manifests:
   - Metrics service on port 8443
   - Updated metrics-bind-address configuration
   - Fixed namespace configuration

The implementation generates self-signed certificates when needed and
follows Kubebuilder's recommended approach without requiring a sidecar.

Signed-off-by: Andrew McDermott <[email protected]>
@frobware frobware force-pushed the drop-kube-rbac-proxy branch from c3b8925 to 963f233 Compare May 1, 2025 10:03
@mergify mergify bot removed the needs-rebase label May 1, 2025
@frobware frobware force-pushed the drop-kube-rbac-proxy branch 2 times, most recently from fdee32d to 2233100 Compare May 1, 2025 11:58
@frobware
Copy link
Contributor Author

frobware commented May 1, 2025

Testing cert rotation:

% oc logs -n bpfman bpfman-operator-74dbc7ddd8-hcskm
Default container name "manager" not found in pod bpfman-operator-74dbc7ddd8-hcskm
{"level":"info","ts":"2025-05-01T12:19:26Z","logger":"setup","msg":"metricsAddr","metricsAddr":":8443"}
{"level":"info","ts":"2025-05-01T12:19:26Z","logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":"2025-05-01T12:19:26Z","logger":"setup","msg":"Certificate watcher configured for metrics TLS","certPath":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
{"level":"info","ts":"2025-05-01T12:19:26Z","logger":"setup","msg":"Discovering APIs"}
{"level":"info","ts":"2025-05-01T12:19:26Z","logger":"setup","msg":"detected platform version","PlatformVersion":"v1.32.3"}
{"level":"info","ts":"2025-05-01T12:19:26Z","logger":"setup","msg":"route.openshift.io found in apis, platform is OpenShift"}
{"level":"info","ts":"2025-05-01T12:19:26Z","logger":"setup","msg":"starting manager"}
{"level":"info","ts":"2025-05-01T12:19:26Z","logger":"controller-runtime.metrics","msg":"Starting metrics server"}
{"level":"info","ts":"2025-05-01T12:19:26Z","msg":"starting server","name":"health probe","addr":"[::]:8175"}
{"level":"info","ts":"2025-05-01T12:19:26Z","logger":"controller-runtime.metrics","msg":"Serving metrics server","bindAddress":":8443","secure":true}
I0501 12:19:26.648819       1 leaderelection.go:257] attempting to acquire leader lease bpfman/8730d955.bpfman.io...
I0501 12:19:26.658727       1 leaderelection.go:271] successfully acquired lease bpfman/8730d955.bpfman.io
{"level":"info","ts":"2025-05-01T12:19:26Z","logger":"controller-runtime.certwatcher","msg":"Starting certificate poll+watcher","interval":10}
{"level":"info","ts":"2025-05-01T12:19:26Z","msg":"Starting EventSource","controller":"clusterbpfapplication","controllerGroup":"bpfman.io","controllerKind":"ClusterBpfApplication","source":"kind source: *v1alpha1.ClusterBpfApplicationState"}
{"level":"info","ts":"2025-05-01T12:19:26Z","msg":"Starting EventSource","controller":"configmap","controllerGroup":"","controllerKind":"ConfigMap","source":"kind source: *v1.DaemonSet"}
{"level":"info","ts":"2025-05-01T12:19:26Z","msg":"Starting EventSource","controller":"configmap","controllerGroup":"","controllerKind":"ConfigMap","source":"kind source: *v1.ConfigMap"}
{"level":"info","ts":"2025-05-01T12:19:26Z","msg":"Starting EventSource","controller":"bpfapplication","controllerGroup":"bpfman.io","controllerKind":"BpfApplication","source":"kind source: *v1alpha1.BpfApplicationState"}
{"level":"info","ts":"2025-05-01T12:19:26Z","msg":"Starting EventSource","controller":"clusterbpfapplication","controllerGroup":"bpfman.io","controllerKind":"ClusterBpfApplication","source":"kind source: *v1alpha1.ClusterBpfApplication"}
{"level":"info","ts":"2025-05-01T12:19:26Z","msg":"Starting EventSource","controller":"bpfapplication","controllerGroup":"bpfman.io","controllerKind":"BpfApplication","source":"kind source: *v1alpha1.BpfApplication"}
{"level":"info","ts":"2025-05-01T12:19:26Z","msg":"Starting Controller","controller":"configmap","controllerGroup":"","controllerKind":"ConfigMap"}
{"level":"info","ts":"2025-05-01T12:19:26Z","msg":"Starting workers","controller":"configmap","controllerGroup":"","controllerKind":"ConfigMap","worker count":1}
{"level":"info","ts":"2025-05-01T12:19:26Z","msg":"Starting Controller","controller":"bpfapplication","controllerGroup":"bpfman.io","controllerKind":"BpfApplication"}
{"level":"info","ts":"2025-05-01T12:19:26Z","msg":"Starting workers","controller":"bpfapplication","controllerGroup":"bpfman.io","controllerKind":"BpfApplication","worker count":1}
{"level":"info","ts":"2025-05-01T12:19:26Z","msg":"Starting Controller","controller":"clusterbpfapplication","controllerGroup":"bpfman.io","controllerKind":"ClusterBpfApplication"}
{"level":"info","ts":"2025-05-01T12:19:26Z","msg":"Starting workers","controller":"clusterbpfapplication","controllerGroup":"bpfman.io","controllerKind":"ClusterBpfApplication","worker count":1}
{"level":"info","ts":"2025-05-01T12:19:26Z","logger":"configMap","msg":"Creating Bpfman Daemon"}
{"level":"info","ts":"2025-05-01T12:19:26Z","logger":"configMap","msg":"Reconciling bpfman"}

Then delete the TLS secret:

% oc get secrets -n bpfman
NAME                              TYPE                      DATA   AGE
agent-metrics-tls                 kubernetes.io/tls         2      2m39s
bpfman-daemon-dockercfg-kbbbv     kubernetes.io/dockercfg   1      2m47s
bpfman-operator-dockercfg-ckd72   kubernetes.io/dockercfg   1      2m46s
builder-dockercfg-f7269           kubernetes.io/dockercfg   1      2m50s
controller-manager-metrics-tls    kubernetes.io/tls         2      2m38s
default-dockercfg-jnv89           kubernetes.io/dockercfg   1      2m50s
deployer-dockercfg-fn74p          kubernetes.io/dockercfg   1      2m50s
% oc delete secrets -n bpfman controller-manager-metrics-tls
secret "controller-manager-metrics-tls" deleted

And notice the cert-watcher inject a new TLS cert:


% oc logs -n bpfman bpfman-operator-74dbc7ddd8-hcskm -f
...
{"level":"info","ts":"2025-05-01T12:19:26Z","msg":"Starting 
{"level":"info","ts":"2025-05-01T12:19:26Z","logger":"configMap","msg":"Creating Bpfman Daemon"}
{"level":"info","ts":"2025-05-01T12:19:26Z","logger":"configMap","msg":"Reconciling bpfman"}
...

{"level":"info","ts":"2025-05-01T12:23:35Z","logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}

Signed-off-by: Andrew McDermott <[email protected]>
Copy link
Contributor

@dave-tucker dave-tucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@mergify mergify bot merged commit 2f9d586 into bpfman:main May 1, 2025
11 checks passed
frobware pushed a commit to frobware/bpfman-operator that referenced this pull request May 6, 2025
…s/component-update-ocp-bpfman-operator-bundle

chore(deps): update ocp-bpfman-operator-bundle to f56e565
@frobware frobware deleted the drop-kube-rbac-proxy branch May 16, 2025 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants