[release-4.19] OCPBUGS-56792: Fix CatalogSource image check when unauthorized #6198

Open: wants to merge 1 commit into base: release-4.19

openshift-cherrypick-robot

This is an automated cherry-pick of #6192

/assign openshift-ci-robot

This PR fixes 3 issues:

- Failing and blocking HostedCluster provisioning when a required
  image cannot be pulled because access is unauthorized
- Overriding the registry when an entry on the CatalogSources matches
  only the registry root
- Falling back to the original ImageReference when the registryOverrides
  do not work as expected
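One reading of the second fix is that an override source entry may be just a registry root (for example `registry.redhat.io`) rather than a full repository path. A minimal sketch of that matching, assuming overrides are shaped as a map from source (registry root or repository path) to mirrors. `applyOverride` and `splitRef` are hypothetical names, not the PR's `seekOverride`, and the longest-match rule and "first mirror wins" policy are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRef separates "repo[:tag][@digest]" into the repository part and the
// tag/digest suffix. A ':' before the last '/' is a registry port, not a tag.
func splitRef(ref string) (repo, suffix string) {
	if i := strings.Index(ref, "@"); i >= 0 {
		ref, suffix = ref[:i], ref[i:]
	}
	lastSlash := strings.LastIndex(ref, "/")
	if i := strings.LastIndex(ref, ":"); i > lastSlash {
		ref, suffix = ref[:i], ref[i:]+suffix
	}
	return ref, suffix
}

// applyOverride (hypothetical) rewrites imageRef using the longest matching
// override source, which may be a full repository or only a registry root.
func applyOverride(imageRef string, overrides map[string][]string) (string, bool) {
	repo, suffix := splitRef(imageRef)
	best := ""
	for source := range overrides {
		// Match the whole repository or a prefix ending at a '/', so the root
		// "registry.redhat.io" matches "registry.redhat.io/redhat/x" but not
		// "registry.redhat.io2/x".
		if repo == source || strings.HasPrefix(repo, source+"/") {
			if len(source) > len(best) {
				best = source
			}
		}
	}
	mirrors := overrides[best]
	if best == "" || len(mirrors) == 0 {
		return imageRef, false
	}
	// Only the first mirror is used in this sketch.
	return mirrors[0] + strings.TrimPrefix(repo, best) + suffix, true
}

func main() {
	overrides := map[string][]string{
		"registry.redhat.io": {"mirror.example.com/rh"},
	}
	rewritten, ok := applyOverride("registry.redhat.io/redhat/redhat-operator-index:v4.19", overrides)
	fmt.Println(rewritten, ok) // mirror.example.com/rh/redhat/redhat-operator-index:v4.19 true
}
```

Preferring the longest source lets a repository-specific entry win over a registry-root entry when both are present.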

It also includes a test case covering the fallback for an unauthorized image pull,
and a refactor of a test function to make it more maintainable and readable.

Signed-off-by: Juan Manuel Parrilla Madrid <[email protected]>
@openshift-ci-robot

@openshift-cherrypick-robot: Detected clone of Jira Issue OCPBUGS-56492 with correct target version. Will retitle the PR to link to the clone.
/retitle [release-4.19] OCPBUGS-56792: Fix CatalogSource image check when unauthorized

In response to this:

This is an automated cherry-pick of #6192

/assign openshift-ci-robot

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot changed the title [release-4.19] OCPBUGS-56492: Fix CatalogSource image check when unauthorized [release-4.19] OCPBUGS-56792: Fix CatalogSource image check when unauthorized May 28, 2025
@openshift-ci-robot openshift-ci-robot added jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 28, 2025
@openshift-ci-robot

@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-56792, which is invalid:

  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

@openshift-ci openshift-ci bot requested review from hasueki and sjenning May 28, 2025 10:24
@openshift-ci openshift-ci bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release and removed do-not-merge/needs-area labels May 28, 2025
@jparrill
Contributor

/hold

Until the z-stream release window opens.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 28, 2025
@jparrill
Contributor

/approve

Contributor

openshift-ci bot commented May 28, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jparrill, openshift-cherrypick-robot

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 28, 2025
@jparrill
Contributor

/retest

@jparrill
Contributor

/retest

@jparrill
Contributor

/jira refresh

@jparrill
Contributor

/retest

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 29, 2025
@openshift-ci-robot

@jparrill: This pull request references Jira Issue OCPBUGS-56792, which is valid. The bug has been moved to the POST state.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.0) matches configured target version for branch (4.19.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note text is set and does not match the template
  • dependent bug Jira Issue OCPBUGS-56492 is in the state Verified, which is one of the valid states (MODIFIED, ON_QA, VERIFIED)
  • dependent Jira Issue OCPBUGS-56492 targets the "4.20.0" version, which is one of the valid target versions: 4.20.0
  • bug has dependents

No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.

In response to this:

/jira refresh

@jparrill
Contributor

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 29, 2025
@mgencur
Contributor

mgencur commented May 29, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 29, 2025
@@ -110,31 +110,45 @@ func imageExistsFn(ctx context.Context, hcp *hyperv1.HostedControlPlane, pullSec
if err == nil {
return true, nil
}
-if strings.Contains(err.Error(), "manifest unknown") {
+if strings.Contains(err.Error(), "manifest unknown") || strings.Contains(err.Error(), "access to the requested resource is not authorized") {
Member

This probably deserves a comment explaining why these errors are exceptional.

Contributor

This new error was the main cause of the initial issue: for some reason the RH registry started returning it, even when the auth was correct. I've added both errors to the check so that we can move on and not block the AWS instance generation.

Member

My ask is that this should be articulated in a comment in code, so any reader can understand the rationale.

@@ -183,7 +182,10 @@ func (r *RegistryClientImageMetadataProvider) GetDigest(ctx context.Context, ima
case len(composedParsedRef.Tag) > 0:
desc, err := repo.Tags(ctx).Get(ctx, composedParsedRef.Tag)
if err != nil {
-return "", nil, err
+fmt.Printf("failed to get repository tags for %s composedParsedRef: %+v: %v. Falling back to the original imageRef %s.\n", composedParsedRef.Tag, composedParsedRef, err, imageRef)
Member

Is this raw Printf a leftover? This should be using the logger.

Contributor

Agreed on the logger; I will fix it in a follow-up PR.

@@ -183,7 +182,10 @@ func (r *RegistryClientImageMetadataProvider) GetDigest(ctx context.Context, ima
case len(composedParsedRef.Tag) > 0:
desc, err := repo.Tags(ctx).Get(ctx, composedParsedRef.Tag)
if err != nil {
-return "", nil, err
+fmt.Printf("failed to get repository tags for %s composedParsedRef: %+v: %v. Falling back to the original imageRef %s.\n", composedParsedRef.Tag, composedParsedRef, err, imageRef)
+if desc, err = fallbackToOriginalImageRef(ctx, imageRef, pullSecret); err != nil {
Member

Why do we need to handle the fallback at all? Shouldn't the container runtime handle that, as it does for any ICSP?

Contributor

That's a fair question. We bake the different OLM catalogs on the management (MGMT) side, using services as direct images, which later, during the HostedCluster (HC) deployment, move to images stored in the OCP internal registry. This can only be observed during the deployment, but if you check the imageStreamTags you can see which address each catalog initially has (NAME):

NAME                           IMAGE REFERENCE                                                                                                                                              UPDATED
catalogs:certified-operators   image-registry.openshift-image-registry.svc:5000/clusters-jparrill-hosted/catalogs@sha256:e4c426f9729b7680dcfc0d0f1277f8584b45bea6d409718cbe814c706f976e1b   10 hours ago
catalogs:community-operators   image-registry.openshift-image-registry.svc:5000/clusters-jparrill-hosted/catalogs@sha256:d2470f2e916d496ec278b06464724c951c0daa7388db78689b1a9f3e102f0526   11 hours ago
catalogs:redhat-marketplace    image-registry.openshift-image-registry.svc:5000/clusters-jparrill-hosted/catalogs@sha256:7956cf99adce9563c1ce77079f1160197a38d63d2342c28aa0b5bd9e4065a9bc   46 hours ago
catalogs:redhat-operators      image-registry.openshift-image-registry.svc:5000/clusters-jparrill-hosted/catalogs@sha256:2c928b35ad3e00f6c5724db7bd69af3e22e3324c1502dbcfedb0a3e648e9583b   13 hours ago

Once we process the image address of the catalogs:certified-operators catalog, an underlying process downloads the catalog image, extracts the catalog content, and stores it in the OCP internal registry (e.g. image-registry.openshift-image-registry.svc:5000/clusters-jparrill-hosted/catalogs@sha256:2c928b35ad3e00f6c5724db7bd69af3e22e3324c1502dbcfedb0a3e648e9583b), and this URL is automatically set on the corresponding catalog.

The problem was in that underlying process: we assumed that the image would always be at the address provided in the IDMS/ICSP, and that was not always the case. We added this manual fallback to the image metadata lookup for when the controller cannot recover the digest from the overridden image.

@@ -508,3 +510,18 @@ func seekOverride(ctx context.Context, openshiftImageRegistryOverrides map[strin
func buildComposedRef(registry, namespace, name string) string {
return fmt.Sprintf("%s/%s/%s", registry, namespace, name)
}

+// fallbackToOriginalImageRef tries to get the repository tags for the original imageRef not having in mind the overrides.
+func fallbackToOriginalImageRef(ctx context.Context, imageRef string, pullSecret []byte) (distribution.Descriptor, error) {
Member

This func is named fallbackTo..., but that's just how you are using it in this particular invocation, right? All it does is a get call, so its name should reflect that.

@enxebre enxebre added cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. labels May 29, 2025
@jparrill
Contributor

/retest

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 934a6e1 and 2 for PR HEAD 2b3941e in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 44a177a and 2 for PR HEAD 2b3941e in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 44a177a and 2 for PR HEAD 2b3941e in total

@celebdor
Collaborator

/retest-required

@sdodson
Member

sdodson commented May 30, 2025

/tide refresh

@sdodson
Member

sdodson commented May 30, 2025

/test ci/prow/verify ci/prow/e2e-aws
The verify job has been running for 15 hours; hopefully this terminates it and starts a fresh one?

Contributor

openshift-ci bot commented May 30, 2025

@sdodson: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test e2e-aks
/test e2e-aws
/test e2e-aws-4-18
/test e2e-aws-override
/test e2e-aws-upgrade-hypershift-operator
/test e2e-kubevirt-aws-ovn-reduced
/test images
/test mce-images
/test security
/test unit
/test verify
/test verify-deps

The following commands are available to trigger optional jobs:

/test e2e-aws-karpenter-core
/test e2e-aws-metrics
/test e2e-aws-minimal
/test e2e-aws-techpreview
/test e2e-azure-aks-ovn-conformance
/test e2e-conformance
/test e2e-kubevirt-aws-ovn
/test e2e-kubevirt-azure-ovn
/test e2e-kubevirt-metal-conformance
/test e2e-openstack-aws
/test e2e-openstack-aws-conformance
/test e2e-openstack-aws-csi-cinder
/test e2e-openstack-aws-csi-manila
/test e2e-openstack-aws-nfv
/test okd-scos-e2e-aws-ovn
/test okd-scos-images

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-hypershift-release-4.19-e2e-aks
pull-ci-openshift-hypershift-release-4.19-e2e-aws
pull-ci-openshift-hypershift-release-4.19-e2e-aws-upgrade-hypershift-operator
pull-ci-openshift-hypershift-release-4.19-e2e-kubevirt-aws-ovn-reduced
pull-ci-openshift-hypershift-release-4.19-images
pull-ci-openshift-hypershift-release-4.19-mce-images
pull-ci-openshift-hypershift-release-4.19-okd-scos-e2e-aws-ovn
pull-ci-openshift-hypershift-release-4.19-security
pull-ci-openshift-hypershift-release-4.19-unit
pull-ci-openshift-hypershift-release-4.19-verify
pull-ci-openshift-hypershift-release-4.19-verify-deps

In response to this:

/test ci/prow/verify ci/prow/e2e-aws
The verify job has been running for 15 hours; hopefully this terminates it and starts a fresh one?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@sdodson
Member

sdodson commented May 30, 2025

/test verify e2e-aws

Contributor

openshift-ci bot commented May 30, 2025

@openshift-cherrypick-robot: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name        Commit   Details  Required  Rerun command
ci/prow/e2e-aws  2b3941e  link     true      /test e2e-aws

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 44a177a and 2 for PR HEAD 2b3941e in total

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

7 participants