[release-4.19] OCPBUGS-56792: Fix CatalogSource image check when unauthorized #6198
base: release-4.19
This PR fixes 3 issues:
- Failing and blocking the HostedCluster provisioning when a needed image cannot be pulled because the request is unauthorized
- Overriding the registry when an entry matches just the registry root on the catalogSources
- Falling back to the original ImageReference when the registryOverrides do not work as expected

It also includes a test case covering the fallback after an unauthorized image pull, and a refactor of a test function to make it maintainable and readable.

Signed-off-by: Juan Manuel Parrilla Madrid <[email protected]>
@openshift-cherrypick-robot: Detected clone of Jira Issue OCPBUGS-56492 with correct target version. Will retitle the PR to link to the clone. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-56792, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
/hold Until the z release window is opened |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jparrill, openshift-cherrypick-robot The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
/retest |
/jira refresh |
/retest |
@jparrill: This pull request references Jira Issue OCPBUGS-56792, which is valid. The bug has been moved to the POST state. 7 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request. In response to this:
/hold cancel |
/lgtm |
@@ -110,31 +110,45 @@ func imageExistsFn(ctx context.Context, hcp *hyperv1.HostedControlPlane, pullSec
if err == nil {
return true, nil
}
if strings.Contains(err.Error(), "manifest unknown") {
if strings.Contains(err.Error(), "manifest unknown") || strings.Contains(err.Error(), "access to the requested resource is not authorized") {
This probably deserves a comment on why these errors are exceptional?
This new error was the main cause of the initial issue. For some reason the RH registry started to return it even when the auth was correct. I've added both errors to the check so that we can move on and not block the AWS instance generation.
My ask is that this should be articulated in a comment in code, so any reader can understand the rationale.
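A minimal sketch of what such an in-code rationale could look like, assuming the check is pulled into a small helper. The helper names `isImageUnavailableMessage` and `isImageUnavailableError` and the constant names are hypothetical and do not appear in the PR; only the two matched substrings come from the diff above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Substrings copied from the diff. The constant names are hypothetical.
const (
	msgManifestUnknown = "manifest unknown"
	msgNotAuthorized   = "access to the requested resource is not authorized"
)

// isImageUnavailableMessage reports whether a registry error message should be
// read as "the image is not available" instead of a hard failure.
//
// Rationale (as described in the review thread): the registry can answer with
// an authorization error even when the pull secret is correct, and failing on
// it would block HostedCluster provisioning. Both messages are therefore
// treated the same way as a missing manifest.
func isImageUnavailableMessage(msg string) bool {
	return strings.Contains(msg, msgManifestUnknown) ||
		strings.Contains(msg, msgNotAuthorized)
}

// isImageUnavailableError is the error-typed wrapper; a nil error never matches.
func isImageUnavailableError(err error) bool {
	if err == nil {
		return false
	}
	return isImageUnavailableMessage(err.Error())
}

func main() {
	samples := []error{
		errors.New("manifest unknown: manifest unknown"),
		errors.New("unauthorized: access to the requested resource is not authorized"),
		errors.New("dial tcp: connection refused"),
	}
	for _, err := range samples {
		fmt.Printf("%q -> unavailable=%v\n", err.Error(), isImageUnavailableError(err))
	}
}
```

Matching on message text mirrors what the diff does. It is fragile if the registry changes its wording, which is one more reason to document the intent next to the check.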
@@ -183,7 +182,10 @@ func (r *RegistryClientImageMetadataProvider) GetDigest(ctx context.Context, ima
case len(composedParsedRef.Tag) > 0:
desc, err := repo.Tags(ctx).Get(ctx, composedParsedRef.Tag)
if err != nil {
return "", nil, err
fmt.Printf("failed to get repository tags for %s composedParsedRef: %+v: %v. Falling back to the original imageRef %s.\n", composedParsedRef.Tag, composedParsedRef, err, imageRef)
Is this raw Printf a leftover? This should be using the logger.
Agreed on the logger; I will fix it in a follow-up PR.
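For illustration, the same message could be emitted as a structured log record instead of a raw `fmt.Printf`. This is only a sketch: it uses the standard library's `log/slog` to stay self-contained, while the logger the project actually uses at this call site is not shown in the conversation and may differ. `logFallback` and `newBufferLogger` are hypothetical names, and the image references are placeholders.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"log/slog"
)

// logFallback records the failed lookup as a warning with structured fields,
// so the references and the error can be filtered on instead of being parsed
// out of a formatted string.
func logFallback(logger *slog.Logger, composedRef, imageRef string, err error) {
	logger.Warn("failed to get repository tag for the overridden reference, falling back to the original image reference",
		"composedRef", composedRef,
		"imageRef", imageRef,
		"error", err,
	)
}

// newBufferLogger returns a text logger that writes into an in-memory buffer,
// which makes the output easy to inspect in this example.
func newBufferLogger() (*slog.Logger, *bytes.Buffer) {
	buf := &bytes.Buffer{}
	return slog.New(slog.NewTextHandler(buf, nil)), buf
}

func main() {
	logger, buf := newBufferLogger()
	logFallback(logger,
		"mirror.example.com/ns/catalog:v1",   // placeholder overridden reference
		"registry.example.com/ns/catalog:v1", // placeholder original reference
		errors.New("manifest unknown"),
	)
	fmt.Print(buf.String())
}
```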
@@ -183,7 +182,10 @@ func (r *RegistryClientImageMetadataProvider) GetDigest(ctx context.Context, ima
case len(composedParsedRef.Tag) > 0:
desc, err := repo.Tags(ctx).Get(ctx, composedParsedRef.Tag)
if err != nil {
return "", nil, err
fmt.Printf("failed to get repository tags for %s composedParsedRef: %+v: %v. Falling back to the original imageRef %s.\n", composedParsedRef.Tag, composedParsedRef, err, imageRef)
if desc, err = fallbackToOriginalImageRef(ctx, imageRef, pullSecret); err != nil {
Why do we need to handle the fallback at all? Shouldn't the container runtime handle that, as it does for any ICSP?
That's a fair question. We bake the different OLM catalogs on the MGMT side, using services as direct images, which later on, during the HC deployment, move to images stored in the OCP internal registry. This can only be checked during the deployment, but if you check the imageStreamTags you can see what address each catalog has initially (NAME):
NAME IMAGE REFERENCE UPDATED
catalogs:certified-operators image-registry.openshift-image-registry.svc:5000/clusters-jparrill-hosted/catalogs@sha256:e4c426f9729b7680dcfc0d0f1277f8584b45bea6d409718cbe814c706f976e1b 10 hours ago
catalogs:community-operators image-registry.openshift-image-registry.svc:5000/clusters-jparrill-hosted/catalogs@sha256:d2470f2e916d496ec278b06464724c951c0daa7388db78689b1a9f3e102f0526 11 hours ago
catalogs:redhat-marketplace image-registry.openshift-image-registry.svc:5000/clusters-jparrill-hosted/catalogs@sha256:7956cf99adce9563c1ce77079f1160197a38d63d2342c28aa0b5bd9e4065a9bc 46 hours ago
catalogs:redhat-operators image-registry.openshift-image-registry.svc:5000/clusters-jparrill-hosted/catalogs@sha256:2c928b35ad3e00f6c5724db7bd69af3e22e3324c1502dbcfedb0a3e648e9583b 13 hours ago
Once we process the image address of the catalog catalogs:certified-operators, an underlying process downloads the catalog image, extracts the catalog content and stores it in the OCP internal registry (e.g. image-registry.openshift-image-registry.svc:5000/clusters-jparrill-hosted/catalogs@sha256:2c928b35ad3e00f6c5724db7bd69af3e22e3324c1502dbcfedb0a3e648e9583b), and this URL is automatically set on the concrete catalog.
The problem was in that underlying process: we assumed that the image would always be at the address provided in the IDMS/ICSP, and that was not always the case. We added this manual fallback to the image metadata lookup for the case where the controller cannot recover the digest from the overridden image.
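The fallback described here can be sketched in isolation as "try the overridden reference first, then the original one". This is not the PR's implementation: `digestLookup`, `resolveDigest` and `fakeRegistry` are hypothetical names, the registry is faked with a map, and the references and digest are placeholders.

```go
package main

import (
	"errors"
	"fmt"
)

// digestLookup is a hypothetical stand-in for the registry call that resolves
// an image reference to a digest.
type digestLookup func(ref string) (string, error)

// resolveDigest tries the overridden (mirrored) reference first. If that lookup
// fails, it retries with the original reference and only returns an error when
// both lookups fail.
func resolveDigest(lookup digestLookup, overriddenRef, originalRef string) (string, error) {
	digest, err := lookup(overriddenRef)
	if err == nil {
		return digest, nil
	}
	digest, fallbackErr := lookup(originalRef)
	if fallbackErr != nil {
		return "", fmt.Errorf("lookup failed for %q (%v) and for fallback %q: %w",
			overriddenRef, err, originalRef, fallbackErr)
	}
	return digest, nil
}

// fakeRegistry builds a lookup backed by a map, so the example runs without
// network access. Unknown references fail with a "manifest unknown" error.
func fakeRegistry(known map[string]string) digestLookup {
	return func(ref string) (string, error) {
		if digest, ok := known[ref]; ok {
			return digest, nil
		}
		return "", errors.New("manifest unknown")
	}
}

func main() {
	// Only the original location holds the image; the mirror does not.
	lookup := fakeRegistry(map[string]string{
		"registry.example.com/ns/catalog:v1": "sha256:placeholder",
	})
	digest, err := resolveDigest(lookup,
		"mirror.example.com/ns/catalog:v1",
		"registry.example.com/ns/catalog:v1",
	)
	fmt.Println(digest, err)
}
```

Keeping the first error inside the final message preserves the reason the override failed, which helps when the mirror configuration itself is the problem.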
@@ -508,3 +510,18 @@ func seekOverride(ctx context.Context, openshiftImageRegistryOverrides map[strin
func buildComposedRef(registry, namespace, name string) string {
return fmt.Sprintf("%s/%s/%s", registry, namespace, name)
}

// fallbackToOriginalImageRef tries to get the repository tags for the original imageRef not having in mind the overrides.
func fallbackToOriginalImageRef(ctx context.Context, imageRef string, pullSecret []byte) (distribution.Descriptor, error) {
This func is named fallbackTo..., but that's just how you are using it in this concrete invocation, right? All it is doing is a get call, so an appropriate name should reflect that.
/retest |
/retest-required |
/tide refresh |
/test ci/prow/verify ci/prow/e2e-aws |
@sdodson: The specified target(s) for
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/test verify e2e-aws |
@openshift-cherrypick-robot: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
This is an automated cherry-pick of #6192
/assign openshift-ci-robot