Add initialDelay #303
Conversation
Thanks for getting this proposed!
A few suggestions inline along with a recommendation to make the feature non-blocking.
@@ -44,6 +44,11 @@ spec:
       default: 900
       description: Time between individual aide scans
       type: integer
+    initialDelay:
+      description: InitalDelaySeconds is the number of seconds to wait
nit: InitialDelaySeconds
"first scan. It"
} else {
	// sleep for the initial delay
	reqLogger.Info("InitialDelaySeconds set, sleeping", "InitialDelaySeconds", instance.Spec.Config.InitialDelay)
	time.Sleep(time.Duration(instance.Spec.Config.InitialDelay) * time.Second)
Based on my testing, this is a blocking operation and could lead to confusion with multiple FileIntegrity resources.
I tested this by creating two different scans:
apiVersion: fileintegrity.openshift.io/v1alpha1
kind: FileIntegrity
metadata:
  name: master-fileintegrity
  namespace: openshift-file-integrity
spec:
  nodeSelector:
    node-role.kubernetes.io/master: ""
  config:
    name: master-aide-conf
    namespace: openshift-file-integrity
    initialDelay: 120
$ cat worker-fio.yaml
apiVersion: fileintegrity.openshift.io/v1alpha1
kind: FileIntegrity
metadata:
  name: worker-fileintegrity
  namespace: openshift-file-integrity
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  config:
    name: worker-aide-conf
    namespace: openshift-file-integrity
    initialDelay: 120
I created them at the same time and expected the daemonSets for each to become available at the same time (since they use the same initial delay and were created around the same time).
What I observed is that the timer for the second FileIntegrity starts only after the first daemonSet is created, because the reconcile loop is sleeping.
NAME                                       DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
daemonset.apps/aide-master-fileintegrity   3         3         3       3            3           node-role.kubernetes.io/master=   2m6s
daemonset.apps/aide-worker-fileintegrity   2         2         2       2            2           node-role.kubernetes.io/worker=   6s
Is it possible to re-queue the request and determine this based on the request timestamp?
I used the following to make this a non-blocking operation.
$ git d pkg/controller/
diff --git a/pkg/controller/fileintegrity/fileintegrity_controller.go b/pkg/controller/fileintegrity/fileintegrity_controller.go
index c18a49d8..37b7cb5a 100644
--- a/pkg/controller/fileintegrity/fileintegrity_controller.go
+++ b/pkg/controller/fileintegrity/fileintegrity_controller.go
@@ -440,13 +440,16 @@ func (r *FileIntegrityReconciler) FileIntegrityControllerReconcile(request recon
return reconcile.Result{}, legacyDeleteErr
}
-	// check if we have initialDelay set
-	if instance.Spec.Config.InitialDelay == 0 {
-		reqLogger.Info("InitialDelaySeconds not set, creating deamonset now")
-	} else {
-		// sleep for the initial delay
-		reqLogger.Info("InitialDelaySeconds set, sleeping", "InitialDelaySeconds", instance.Spec.Config.InitialDelay)
-		time.Sleep(time.Duration(instance.Spec.Config.InitialDelay) * time.Second)
+	// Check if we're past the initial delay timer by evaluating
+	// the time since creation.
+	n := time.Now()
+	d := n.Sub(instance.GetCreationTimestamp().Time)
+
+	if d.Seconds() < float64(instance.Spec.Config.InitialDelay) {
+		s := fmt.Sprintf("Re-queuing request for %s because elapsed time since creation (%f seconds) hasn't exceeded InitialDelay of %d seconds", instance.Name, d.Seconds(), instance.Spec.Config.InitialDelay)
+		reqLogger.Info(s)
+		// requeue the request
+		return reconcile.Result{Requeue: true}, nil
 	}
reqLogger.Info("Creating daemonSet", "DaemonSet", daemonSetName)
That's a good catch, thanks, this works now!
Thanks for the thorough review, let me address those issues.
(force-pushed from 3f7207d to ec0406c)
// the time since creation.
d := time.Since(instance.CreationTimestamp.Time)

if d.Seconds() < float64(instance.Spec.Config.InitialDelay) {
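Aside: time.Since(t) is shorthand for time.Now().Sub(t), so the updated code collapses the two lines from the earlier diff into one call; a quick standalone illustration (not PR code):

created := instance.CreationTimestamp.Time
d1 := time.Now().Sub(created) // earlier diff
d2 := time.Since(created)     // updated code, same elapsed duration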
I think it would be more idiomatic to use the time.Before() or time.After() methods.
Something like (note InitialDelay is an integer count of seconds, so it needs a time.Duration conversion):

shouldScheduleAt := instance.CreationTimestamp.Time.Add(time.Duration(instance.Spec.Config.InitialDelay) * time.Second)
if time.Now().Before(shouldScheduleAt) {
	// still inside the initial delay window; requeue here
}
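A minimal sketch of how that suggestion could slot into the reconcile path, pairing Time.Before with a RequeueAfter so the request comes back once the delay has elapsed (my combination, not code from the PR):

shouldScheduleAt := instance.CreationTimestamp.Time.Add(time.Duration(instance.Spec.Config.InitialDelay) * time.Second)
if time.Now().Before(shouldScheduleAt) {
	// requeue and try again once the initial delay has passed
	return reconcile.Result{RequeueAfter: time.Until(shouldScheduleAt)}, nil
}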
Thanks for the review, this makes sense; I just updated the code.
Had a successful e2e run on my local cluster; it looks like we have some issues with must-gather.
Code changes look good. Just have to figure out the e2e issues.
/retest
/retest
(force-pushed from 5381fe2 to 662a92b)
We added the initialDelay option to the FileIntegrity CRD to allow users to specify the initial delay before the first scan is run. This is useful for environments where the operator is deployed before the cluster is fully ready.
/retest
Verification passed with 4.13.0-0.nightly-2023-02-23-000625 and the code in the PR.
/label qe-app-approved
@Vincent056: all tests passed!
/lgtm
Thanks, Vincent!
[APPROVALNOTIFIER] This PR is APPROVED
This pull request has been approved by: rhmdnd, Vincent056
We added the initialDelay option to the FileIntegrity CRD to allow users to specify the initial delay before the first scan is run. This is useful for environments where the operator is deployed before the cluster is fully ready. The launch of the AIDE daemonSet will be delayed according to the value of initialDelay set in the FileIntegrity object.
To launch a FileIntegrity check for worker nodes with a delay of 100s, you can create the following object:
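(The example object itself is missing from the captured thread; the sketch below mirrors the worker example from the review above, with initialDelay set to 100 and the optional config name/namespace omitted.)

apiVersion: fileintegrity.openshift.io/v1alpha1
kind: FileIntegrity
metadata:
  name: worker-fileintegrity
  namespace: openshift-file-integrity
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  config:
    initialDelay: 100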