fix: get the right plan if there were several attempts #607
Signed-off-by: Michael Todorovic <[email protected]>
During your debugging, did you figure out why the initial behavior doesn't reliably work? Calling the datastore with an empty string as
@LucasMrqes @michael-todorovic I just checked the datastore code, when burrito/internal/datastore/storage/common.go Lines 123 to 127 in 847331e
A bug might be in this function 🤔
Indeed, it seems like the
Yes, exactly! And the number of keys is not the number of attempts: with 2 attempts, there may be more than 2 keys. Example on S3 with a first attempt that fails:
Here S3 returns 4 keys for 2 attempts, which ultimately results in the wrong subsequent call to S3: Burrito will try to fetch the short diff of the last plan for attempt 3 (keys-1)!
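To make the mismatch concrete, here is a minimal sketch of the difference between counting keys and reading the attempt number out of them. The `<attempt>/<file>` layout, the file names, and the `latestAttempt` helper are assumptions made up for this illustration; they are not Burrito's actual key layout or code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// latestAttempt returns the highest attempt number found in keys that are
// relative to a run prefix and shaped like "<attempt>/<file>". This layout is
// hypothetical and only serves the illustration. ok is false when no key
// starts with a numeric attempt.
func latestAttempt(keys []string) (latest int, ok bool) {
	for _, key := range keys {
		first := strings.SplitN(key, "/", 2)[0]
		n, err := strconv.Atoi(first)
		if err != nil {
			continue // not an attempt directory, skip it
		}
		if !ok || n > latest {
			latest, ok = n, true
		}
	}
	return latest, ok
}

func main() {
	// Two attempts, four keys: the failed attempt 0 left only a log behind.
	keys := []string{
		"0/run.log",
		"1/run.log",
		"1/plan.bin",
		"1/short.diff",
	}
	latest, _ := latestAttempt(keys)
	fmt.Println("len(keys)-1 =", len(keys)-1) // 3, an attempt that does not exist
	fmt.Println("latest attempt =", latest)   // 1, the attempt that holds the plan
}
```

Deriving the attempt from the key contents rather than from the key count stays correct however many files each attempt writes.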
Signed-off-by: Michael Todorovic <[email protected]>
Signed-off-by: Michael Todorovic <[email protected]>
Actually,
I changed the logic a bit to find and directly address the right latest attempt
Is it not recursive across all backend implementations (GCS, Azure & S3)?
You're right, Azure looks to be recursive by default, GCS isn't because of
Signed-off-by: Michael Todorovic <[email protected]>
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #607      +/-   ##
==========================================
+ Coverage   44.93%   45.13%   +0.20%
==========================================
  Files          79       79
  Lines        5759     5780      +21
==========================================
+ Hits         2588     2609      +21
  Misses       2956     2956
  Partials      215      215
internal/datastore/storage/common.go
Outdated
    // Azure returns the full path, so we need to split by "/"
    attemptId := strings.Split(attemptStr, "/")[0]
Instead of writing custom code for Azure here, why not change the List implementation of the Azure backend?
I've checked the GCS and S3 implementations: they seem to build a list of prefixes, whereas the Azure one (see below) seems to build a list of filenames 🤔
burrito/internal/datastore/storage/azure/azure.go
Lines 101 to 119 in 847331e
    func (a *Azure) List(prefix string) ([]string, error) {
        keys := []string{}
        marker := ""
        pager := a.Client.NewListBlobsFlatPager(a.Config.Container, &container.ListBlobsFlatOptions{
            Prefix: &prefix,
            Marker: &marker,
        })
        for pager.More() {
            resp, err := pager.NextPage(context.TODO())
            if err != nil {
                return nil, err
            }
            for _, blob := range resp.Segment.BlobItems {
                keys = append(keys, *blob.Name)
            }
        }
        return keys, nil
    }
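For comparison, a flat listing like the one above can be collapsed into first-level prefixes with a small, SDK-independent helper. This is only a sketch of the idea: `childPrefixes` and the sample paths are hypothetical, and this is not the change made in this PR nor Burrito's actual code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// childPrefixes collapses flat blob names into the distinct first-level
// "directories" under prefix, similar to what a delimiter-based listing
// returns. prefix is expected to end with "/". Blobs that sit directly
// under prefix (no further "/") are ignored.
func childPrefixes(prefix string, names []string) []string {
	seen := map[string]struct{}{}
	out := []string{}
	for _, name := range names {
		if !strings.HasPrefix(name, prefix) {
			continue
		}
		rest := strings.TrimPrefix(name, prefix)
		child, _, found := strings.Cut(rest, "/")
		if !found || child == "" {
			continue
		}
		if _, dup := seen[child]; dup {
			continue
		}
		seen[child] = struct{}{}
		out = append(out, prefix+child+"/")
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical blob names as a flat (recursive) listing would return them.
	names := []string{
		"layers/ns/app/run/0/run.log",
		"layers/ns/app/run/1/run.log",
		"layers/ns/app/run/1/plan.bin",
	}
	fmt.Println(childPrefixes("layers/ns/app/run/", names))
	// prints [layers/ns/app/run/0/ layers/ns/app/run/1/]
}
```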
I preferred a systemic fix for all bucket types to make sure we get the right result. For example, GCS could list recursively just like Azure if this line gets removed (during a future refactor)
burrito/internal/datastore/storage/gcs/gcs.go
Line 123 in 847331e
    Delimiter: "/",
I found it safer to deal with all cases in a single location and still avoid a regex to extract the attempt id :) Maybe the comment is misleading though and could be adjusted, which way would you prefer?
Okay, I agree that this systemic fix is safer. The comment is indeed misleading and should be generic, yes 👍
It should be better now.
I also changed GetAttempts to use []int{} internally so we can sort as integers (so 10 comes after 2) and deduplicate
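A minimal sketch of that idea, assuming keys may come back either as prefixes ("run/2/") or as full paths ("run/2/plan.bin") depending on the backend. The `attemptNumbers` helper and the sample keys are hypothetical and are not the actual GetAttempts implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// attemptNumbers extracts the attempt id that directly follows prefix in each
// key, whether the key is a prefix ("run/2/") or a full path
// ("run/2/plan.bin"), then deduplicates and sorts the ids numerically.
// Keys whose first segment is not an integer are skipped.
func attemptNumbers(prefix string, keys []string) []int {
	seen := map[int]struct{}{}
	attempts := []int{}
	for _, key := range keys {
		rest := strings.TrimPrefix(key, prefix)
		id := strings.Split(rest, "/")[0]
		n, err := strconv.Atoi(id)
		if err != nil {
			continue
		}
		if _, dup := seen[n]; dup {
			continue
		}
		seen[n] = struct{}{}
		attempts = append(attempts, n)
	}
	sort.Ints(attempts)
	return attempts
}

func main() {
	keys := []string{"run/10/plan.bin", "run/2/", "run/2/run.log", "run/0/"}
	// A string sort would order these as "0", "10", "2"; integers avoid that.
	fmt.Println(attemptNumbers("run/", keys)) // prints [0 2 10]
}
```

The latest attempt is then simply the last element of the returned slice, when it is non-empty.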
Signed-off-by: Michael Todorovic <[email protected]>
I see the issue, thanks @michael-todorovic for finding it. The function made a big assumption that all attempts would be stored in the datastore, and we can't rely on that. However, I feel we should put less logic in the generic function that interacts with the backend, and instead make providers compliant with the output we want to obtain from a List function.
@corrieriluca suggested it as well. I can adjust accordingly.
If that's OK with you, even if it's a bit of extra work, I'd advocate for that rather than keeping "sanitizing" logic in the generic GetAttempts. This makes the GetAttempts function more easily testable and delegates bugs to the underlying provider implementation.
Sure, we saw in #602 that it could be useful as well 😄
This PR fixes #606
I created 1000 random_pet layers, added them all at once to stress the system, and observed the behavior. To force retries, I randomly crashed running pods on purpose:
So I got my retries as expected
I checked the datastore logs and saw that reconciliation now autocorrects the attempt, so the status changes from 404 to 200 😄