Prometheus example can't get metrics from the router instance #17685

Closed
simonpasquier opened this issue Dec 8, 2017 · 15 comments

Comments

@simonpasquier
Contributor

I've deployed the Prometheus application from the examples directory and I've noticed that Prometheus discovers the router instance but scraping fails.

Version

oc v3.7.0+7ed6862
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://127.0.0.1:8443
openshift v3.7.0+7ed6862
kubernetes v1.7.6+a08f5eeb62

Steps To Reproduce

  1. Instantiate the template from https://github.com/openshift/origin/tree/master/examples/prometheus
  2. Once ready, open the Prometheus UI and go to the Targets page.

Current Result

The router instance is reported as DOWN in the kubernetes-service-endpoints job. The associated error is "server returned HTTP status 500 Internal Server Error".

Expected Result

The router instance is UP and reports metrics in Prometheus.

Additional Information

image

@simonpasquier
Contributor Author

AFAIU this relates to prometheus/prometheus#2614. The 500 error is what the router returns for unauthenticated requests (tested manually from a browser). Contrary to what the example seems to imply, relabeling with the Kubernetes service discovery can't override the authentication parameters used during scraping.

@simonpasquier
Contributor Author

@pgier noted that there is a typo in the example template. The relabeling section expects the annotations on the router service to be prometheus.io/username and prometheus.io/password while they are named prometheus.openshift.io/username and prometheus.openshift.io/password.

Prometheus config

        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_username]
          action: replace
          target_label: __basic_auth_username__
          regex: (.+)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_password]
          action: replace
          target_label: __basic_auth_password__
          regex: (.+)

Router service

oc describe svc/router -n default
Name:                   router
Namespace:              default
Labels:                 router=router
Annotations:            prometheus.io/port=1936
                        prometheus.io/scrape=true
                        prometheus.openshift.io/password=MYbhzz3rYb
                        prometheus.openshift.io/username=admin
Selector:               router=router
Type:                   ClusterIP
IP:                     172.30.78.175
Port:                   80-tcp  80/TCP
Endpoints:              10.0.2.15:80
Port:                   443-tcp 443/TCP
Endpoints:              10.0.2.15:443
Port:                   1936-tcp        1936/TCP
Endpoints:              10.0.2.15:1936
Session Affinity:       None
Events:                 <none>

Nevertheless, I've fixed the typo in the relabel_configs section in my local environment and it doesn't change the outcome: as noted previously, the Prometheus service discovery can't override the authentication parameters via relabeling.
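
For reference, a sketch of the relabeling rules with the typo fixed, assuming Prometheus's usual mapping of annotation names to meta labels (characters such as `.` and `/` become `_`). This only corrects the annotation lookup; the scrape still fails for the reason given above.

```yaml
# Sketch only: matches the annotations actually set on the router service
# (prometheus.openshift.io/username and prometheus.openshift.io/password).
# Upstream Prometheus does not treat __basic_auth_*__ labels as credentials,
# so these rules alone do not make the scrape succeed.
relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_openshift_io_username]
    action: replace
    target_label: __basic_auth_username__
    regex: (.+)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_openshift_io_password]
    action: replace
    target_label: __basic_auth_password__
    regex: (.+)
```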

@simonpasquier
Contributor Author

I've also confirmed that when I configure an additional scrape job for the router that uses the annotated credentials, Prometheus can pull the metrics successfully.
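
A minimal sketch of such a job, assuming the router serves metrics over plain HTTP on port 1936 at the default /metrics path and is reachable through the service DNS name. The job name and target address are illustrative, and the credentials are the values from the service annotations.

```yaml
scrape_configs:
  - job_name: openshift-router            # hypothetical job name
    scheme: http                          # assumption: no TLS on the stats port
    metrics_path: /metrics                # Prometheus default, shown for clarity
    basic_auth:
      username: admin                     # value of prometheus.openshift.io/username
      password: "<router-stats-password>" # value of prometheus.openshift.io/password
    static_configs:
      - targets:
          - router.default.svc:1936       # assumption: router service in the default namespace
```

Because the credentials are set at the job level rather than through relabeling, this doesn't depend on Prometheus interpreting the annotations.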

@jameseck

I'm struggling with this exact problem. From the research I've done, it doesn't appear that Prometheus supports basic auth annotations in any form.
Can anyone confirm whether this is the case?
For anyone who has this working, could you please confirm the versions of OpenShift and Prometheus you are using? I'm using OpenShift 3.6.1 and Prometheus 2.1.0.

@ibotty
Contributor

ibotty commented Jan 22, 2018

It does not work with any upstream release. See prometheus/prometheus#1176.

@ghost

ghost commented Feb 17, 2018

Since this does not appear to work, why is it even in the config? This is very confusing: I have spent a lot of time thinking that I was doing something wrong, when in reality Prometheus doesn't support this option. It would be nice if it did, don't get me wrong. But it's very misleading and should be removed.

@caruccio

caruccio commented Apr 5, 2018

Maybe, as a temporary hac^^^fix you could remove basic auth from haproxy 1936 port while allowing only infra nodes to connect locally to this port.

@jkroepke

jkroepke commented Apr 9, 2018

> It does not work with any upstream release. See prometheus/prometheus#1176.

@ibotty Are the patches public?

@ibotty
Contributor

ibotty commented Apr 9, 2018

I don't know of any (public or not) patches.

@jkroepke

jkroepke commented Apr 9, 2018

So it won't work with Red Hat's Prometheus either?

@simonpasquier
Contributor Author

It may have been fixed in the master branch (I'm currently checking).

@simonpasquier
Contributor Author

Things have improved a bit on master but it is not quite there yet (at least when using oc cluster up, not sure for other deployment methods):

image

From what I can tell, #18254 isn't sufficient because while Prometheus authenticates with the prometheus-scraper token against the router's metrics endpoint, the default:router service account doesn't have the correct permissions to validate tokens.

With the router's log level increased:

I0410 13:46:04.072140       1 metrics.go:70] Unable to authenticate: tokenreviews.authentication.k8s.io is forbidden: User "system:serviceaccount:default:router" cannot create tokenreviews.authentication.k8s.io at the cluster scope: User "system:serviceaccount:default:router" cannot create tokenreviews.authentication.k8s.io at the cluster scope
I0410 13:47:04.061341       1 metrics.go:70] Unable to authenticate: tokenreviews.authentication.k8s.io is forbidden: User "system:serviceaccount:default:router" cannot create tokenreviews.authentication.k8s.io at the cluster scope: User "system:serviceaccount:default:router" cannot create tokenreviews.authentication.k8s.io at the cluster scope
I0410 13:48:04.072103       1 metrics.go:70] Unable to authenticate: tokenreviews.authentication.k8s.io is forbidden: User "system:serviceaccount:default:router" cannot create tokenreviews.authentication.k8s.io at the cluster scope: User "system:serviceaccount:default:router" cannot create tokenreviews.authentication.k8s.io at the cluster scope

As a workaround, I've granted the system:auth-delegator cluster role to the router service account:

oc adm policy add-cluster-role-to-user system:auth-delegator -z router -n default

After this, Prometheus was able to scrape the target correctly:

image

I'm currently testing a change locally and will send a PR if it is successful.
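
The `oc adm policy` workaround above could also be expressed declaratively. This is a sketch, not the change from any PR: the binding name is made up, and the RBAC apiVersion depends on the cluster version (older clusters may only serve `v1beta1`).

```yaml
# Sketch: binds the system:auth-delegator cluster role to the router
# service account in the default namespace, so it can create the
# tokenreviews that the log messages above report as forbidden.
apiVersion: rbac.authorization.k8s.io/v1   # assumption: may need v1beta1 on older clusters
kind: ClusterRoleBinding
metadata:
  name: router-auth-delegator              # hypothetical binding name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
  - kind: ServiceAccount
    name: router
    namespace: default
```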

@jkroepke

@simonpasquier I'm looking for the patches that add support for the prometheus.openshift.io/username and prometheus.openshift.io/password annotations.

Also see: https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/openshift_prometheus/templates/prometheus.yml.j2#L152
and https://docs.openshift.com/container-platform/3.9/install_config/router/default_haproxy_router.html#exposing-the-router-metrics

Does it work with auth basic? Or are the annotations for nothing?

I'm asking about a Prometheus deployment that lives outside the cluster.

@simonpasquier
Contributor Author

@jkroepke IIRC the patches were included in the openshift/prometheus image at some point but not anymore.

> Does it work with auth basic? Or are the annotations for nothing?

The annotations are useless from my POV because they can't be interpreted by Prometheus.

@simonpasquier
Contributor Author

Closed by #19318
