Skip to content

baremetal-coco: tdx: Add DCAP deployment #493

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

fidencio
Copy link
Contributor

@fidencio fidencio commented Dec 18, 2024

Please, see each commit message for a better explanation of what's being done.

This is out for early testing, and there are a few things to consider:

  • We will NOT require any SELinux changes, as I was able to work that requirement around with an init container
  • Tests must be performed on multi-node setup, as so far it's only been tested on single node ones

I will give instructions to @bpradipt, and give him access to a clean SNO cluster, so he can go ahead and give it a try during the time I will be off.

KNOWN ISSUES:

  • Registration on a factory reset machine leads to the following error:
    • From the registration pods logs:
    Error: unexpected error occurred while sending data to cache server.
    Error: the data couldn't be sent to cache server!
    
    Intel(R) Software Guard Extensions PCK Cert ID Retrieval Tool Version 1.22.100.3
    
    Warning: platform manifest is not available or current platform is not multi-package platform.
    
    • From the pccs pod logs:
    2024-12-19 19:03:00.253 [info]: Client Request-ID : 27fdde775ad94d3e8274ed805ab4ba8b
    2024-12-19 19:03:00.257 [info]: Executing (default): SELECT `qe_id`, `pce_id`, `platform_manifest`, `enc_ppid`, `fmspc`, `ca`, `created_time`, `updated_time` FROM `platforms` AS `Platforms` WHERE `Platforms`.`qe_id` = '16D27A5828C08A20BAE67D9EA535B88B' AND `Platforms`.`pce_id` = '0000';
    2024-12-19 19:03:00.582 [info]: Request-ID is : 4b80d2c7d79b4357bda779a72422dc89
    2024-12-19 19:03:00.582 [debug]: Request URL https://api.trustedservices.intel.com/sgx/certification/v4/pckcerts
    2024-12-19 19:03:00.583 [error]: Intel PCS server returns error(404).
    2024-12-19 19:03:00.583 [error]: Error: No cache data for this platform.
      at Module.getPckCertFromPCS (file:///opt/intel/pccs/services/logic/commonCacheLogic.js:159:11)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
      at async LazyCachingMode.registerPlatforms (file:///opt/intel/pccs/services/caching_modes/cachingMode.js:163:7)
      at async Module.registerPlatforms (file:///opt/intel/pccs/services/platformsRegService.js:107:3)
      at async postPlatforms (file:///opt/intel/pccs/controllers/platformsController.js:52:5)
    2024-12-19 19:03:00.586 [info]: 10.128.0.40 - - [19/Dec/2024:19:03:00 +0000] "POST /sgx/certification/v4/platforms HTTP/1.1" 404 32 "-" "-"
    
  • registration pod may start before the pccs pod after a reboot

@openshift-ci openshift-ci bot requested review from bpradipt and snir911 December 18, 2024 21:17
@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 18, 2024
Copy link

openshift-ci bot commented Dec 18, 2024

Hi @fidencio. Thanks for your PR.

I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@fidencio
Copy link
Contributor Author

Please, pretty please, let's have it properly tested before merged. :-)

@bpradipt bpradipt requested a review from lmilleri December 19, 2024 05:41
Copy link

@mythi mythi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor suggestions

@fidencio fidencio force-pushed the topic/baremetal-coco-add-dcap-deployment branch 3 times, most recently from 414eca1 to 986a66b Compare December 19, 2024 12:10
@fidencio fidencio force-pushed the topic/baremetal-coco-add-dcap-deployment branch from 986a66b to 994ffb8 Compare January 2, 2025 16:53
@fidencio fidencio marked this pull request as draft January 6, 2025 08:43
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 6, 2025
@fidencio fidencio force-pushed the topic/baremetal-coco-add-dcap-deployment branch 2 times, most recently from 540b31a to be52ca5 Compare January 7, 2025 15:41
We just missed doing this when we added the deployment of the intel
device plugins operator.

Signed-off-by: Fabiano Fidêncio <[email protected]>
@fidencio fidencio force-pushed the topic/baremetal-coco-add-dcap-deployment branch 6 times, most recently from aa7df0f to f7c5567 Compare January 9, 2025 19:51
@fidencio fidencio marked this pull request as ready for review January 9, 2025 21:23
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 9, 2025
@openshift-ci openshift-ci bot requested review from gkurz and littlejawa January 9, 2025 21:24
@fidencio
Copy link
Contributor Author

fidencio commented Jan 9, 2025

@bpradipt, this one is finally ready to be reviewed (and hopefully, merged!)

@fidencio fidencio force-pushed the topic/baremetal-coco-add-dcap-deployment branch from f7c5567 to 7d291bf Compare January 10, 2025 19:38
@fidencio fidencio force-pushed the topic/baremetal-coco-add-dcap-deployment branch from f3315a1 to beae6fb Compare January 11, 2025 13:30
As we're relying on `jq` for a few cases, let's just make sure this is
required by moving its check to right after the `oc ` check.

Signed-off-by: Fabiano Fidêncio <[email protected]>
Rename the set_aa_kbc_params_for_kata_agent to
set_kernel_params_for_kata_agent, as later in this series we'll also use
this function to set the agent.https_proxy and agent.no_proxy
parameters, which are needed in order to properly pull images inside the
guest on a cluster running behind proxies.

Signed-off-by: Fabiano Fidêncio <[email protected]>
@fidencio fidencio force-pushed the topic/baremetal-coco-add-dcap-deployment branch from beae6fb to cd0a112 Compare January 13, 2025 09:40
Let's set up a per-cluster PCCS, meaning:
* a single service that chaces all the DCAP collateral requests
* "indirect" DCAP registration, which stores the encrypted platform keys
  on PCCS
* PCCS database on a dedicated node, where the admin knows what to back
  up (if needed)

This is required in order to have attestation working for TDX, be it
using DCAP directly or ITA.

Signed-off-by: Fabiano Fidêncio <[email protected]>
Signed-off-by: Mikko Ylinen <[email protected]>
@fidencio fidencio force-pushed the topic/baremetal-coco-add-dcap-deployment branch from cd0a112 to 921fe2f Compare January 13, 2025 10:55
This is needed as that's the only way for the agent running inside the
guest to know about proxies.

Signed-off-by: Fabiano Fidêncio <[email protected]>
@fidencio fidencio force-pushed the topic/baremetal-coco-add-dcap-deployment branch from 921fe2f to 5e82849 Compare January 13, 2025 11:34
oc apply -f ns.yaml || return 1

oc project intel-dcap
oc adm policy add-scc-to-user privileged -z default

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the dcap need to run in a privileged mode?

Copy link
Contributor Author

@fidencio fidencio Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it does, unfortunately.

For this, I think we can just leave it as it is for now, as soon enough we should not need to run DCAP / PCCS / QGS from inside a container, with everything being packaged as part of the virt sig.

@fidencio
Copy link
Contributor Author

I started testing this version of the PR, as it is ... let's see how it goes ...

Copy link
Member

@gkurz gkurz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Thanks @fidencio !

I started testing this version of the PR, as it is ... let's see how it goes ...

This PR could potentially skip CI since nothing calls this code but I have no control on that AFAIK. I'll thus start CI when you are done with your testing.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 13, 2025
Copy link
Contributor

@bpradipt bpradipt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@fidencio
Copy link
Contributor Author

I started testing this version of the PR, as it is ... let's see how it goes ...

Finished the tests on my side, on an environment that's not behind proxies and it works as expected.
I'd consider this good enough to be merged, and any other adjustment can come later on.

@gkurz gkurz added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Jan 13, 2025
Copy link

openshift-ci bot commented Jan 13, 2025

@fidencio: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@gkurz gkurz merged commit dbab600 into openshift:devel Jan 13, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm Indicates that a PR is ready to be merged. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. ok-to-test Indicates a non-member PR verified by an org member that is safe to test.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants