-
Notifications
You must be signed in to change notification settings - Fork 692
Az implementation #4061
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: az-integration
Are you sure you want to change the base?
Az implementation #4061
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## az-integration #4061 +/- ##
==================================================
- Coverage 89.30% 89.04% -0.26%
==================================================
Files 264 264
Lines 14953 15004 +51
==================================================
+ Hits 13354 13361 +7
- Misses 1599 1643 +44 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
3763140
to
6295079
Compare
d7eb32d
to
94e31e3
Compare
# Conflicts: # src/daemon/daemon.cpp
# Conflicts: # tests/test_sshfs_mount_handler.cpp
94e31e3
to
d597d38
Compare
@@ -2028,6 +2030,12 @@ try // clang-format on | |||
continue; | |||
} | |||
|
|||
if (vm->current_state() == VirtualMachine::State::unavailable) | |||
{ | |||
mpl::log(mpl::Level::info, name, "Ignoring mount since instance unavailable."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... I think we need a better way to represent "unavailability". My main concern is that this approach is error-prone and repetitive. The problem is that "unavailable" instances are still a part of the operative instances, which is wrong. An unavailable instance is not operative since it can't be put into an operative state unless it's made available, so an unavailable instance shouldn't be a part of the "operative" set of VMs in the first place.
Right now, we have two sets of VMs: "operative" and "deleted". I think what we need here is a third category, "unavailable". This way, we can just update the error messages as does not exist or unavailable
and call it a day, until we overhaul the VM management logic. @ricab, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm good point. I haven't thought a lot about it, but I think I agree with you @xmkg. We need to define also what happens at the intersection of deleted/unavailable i.e., deleted instances that belong to an unavailable AZ. I guess we just declare "deleted" takes precedence over "unavailable"? We need to answer a few questions:
- does deleting and instance unsubscribe it from the AZ?
- does enabling/disabling an AZ affect deleted instances? (I hope not)
- what happens on recover of an instance whose AZ is unavailable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ricab, good questions. Regarding the deleted + unavailable at the same time, I think it's reasonable to make deleted take precedence over unavailable. Another avenue might be to consolidate all instances to a single container and represent the "availability" state in a bit flag. This way, an instance can be deleted & unavailable at the same time.
Consolidating all instances into a single container carries a cost: lookup and iteration speed. This can be troublesome for users with many instances. Otherwise, std::ranges filtering would work just fine. If iteration speed matters, another option might be using a multi-index container, which allows for different indices over a set of values instead of just one.
Regarding the last three questions, those should be answered in the AZ spec itself, if not already. @Sploder12 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xmkg I fully agree we need a better way to represent "unavailability", it is known that my current approach is terrible. But, my issue with using operative instances and a new unavailable table is that we'd be moving even more responsibility into the daemon. Also operative instances is the most unsafe thing in our entire code base at the moment. It guarantees race conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does deleting and instance unsubscribe it from the AZ?
Based on spec and implementation, no. Although it'd be nice if this were possible.
does enabling/disabling an AZ affect deleted instances? (I hope not)
Yes, it is necessary for recover in the current implementation :)
what happens on recover of an instance whose AZ is unavailable?
This is unaccounted for by the spec. Implementation recovers the instance in an unavailable state.
This PR adds implementations for making VMs unavailable and available. Also has unavailable VMs ignore commands that change anything related to that VM. There are a few known race conditions that are outside the scope of AZs to fix.
MULTI-1789