Az implementation #4061

Sploder12 · 2025-04-24T16:26:56Z

This PR adds implementations for making VMs unavailable and available. Also has unavailable VMs ignore commands that change anything related to that VM. There are a few known race conditions that are outside the scope of AZs to fix.

MULTI-1789

codecov · 2025-04-24T17:07:29Z

Codecov Report

Attention: Patch coverage is 23.72881% with 45 lines in your changes missing coverage. Please review.

Project coverage is 89.04%. Comparing base (a9be08a) to head (d597d38).

Files with missing lines	Patch %	Lines
src/daemon/daemon.cpp	23.33%	23 Missing ⚠️
.../platform/backends/shared/base_virtual_machine.cpp	13.33%	13 Missing ⚠️
...latform/backends/shared/base_availability_zone.cpp	18.18%	9 Missing ⚠️

Additional details and impacted files

@@                Coverage Diff                 @@
##           az-integration    #4061      +/-   ##
==================================================
- Coverage           89.30%   89.04%   -0.26%     
==================================================
  Files                 264      264              
  Lines               14953    15004      +51     
==================================================
+ Hits                13354    13361       +7     
- Misses               1599     1643      +44

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

src/daemon/daemon.cpp

# Conflicts: # src/daemon/daemon.cpp

# Conflicts: # tests/test_sshfs_mount_handler.cpp

xmkg · 2025-05-26T13:55:07Z

src/daemon/daemon.cpp

@@ -2028,6 +2030,12 @@ try // clang-format on
            continue;
        }

+        if (vm->current_state() == VirtualMachine::State::unavailable)
+        {
+            mpl::log(mpl::Level::info, name, "Ignoring mount since instance unavailable.");


Hmm... I think we need a better way to represent "unavailability". My main concern is that this approach is error-prone and repetitive. The problem is that "unavailable" instances are still a part of the operative instances, which is wrong. An unavailable instance is not operative since it can't be put into an operative state unless it's made available, so an unavailable instance shouldn't be a part of the "operative" set of VMs in the first place.

Right now, we have two sets of VMs: "operative" and "deleted". I think what we need here is a third category, "unavailable". This way, we can just update the error messages as does not exist or unavailable and call it a day, until we overhaul the VM management logic. @ricab, what do you think?

Hmm good point. I haven't thought a lot about it, but I think I agree with you @xmkg. We need to define also what happens at the intersection of deleted/unavailable i.e., deleted instances that belong to an unavailable AZ. I guess we just declare "deleted" takes precedence over "unavailable"? We need to answer a few questions:

does deleting and instance unsubscribe it from the AZ?

does enabling/disabling an AZ affect deleted instances? (I hope not)

what happens on recover of an instance whose AZ is unavailable?

Hi @ricab, good questions. Regarding the deleted + unavailable at the same time, I think it's reasonable to make deleted take precedence over unavailable. Another avenue might be to consolidate all instances to a single container and represent the "availability" state in a bit flag. This way, an instance can be deleted & unavailable at the same time.

Consolidating all instances into a single container carries a cost: lookup and iteration speed. This can be troublesome for users with many instances. Otherwise, std::ranges filtering would work just fine. If iteration speed matters, another option might be using a multi-index container, which allows for different indices over a set of values instead of just one.

Regarding the last three questions, those should be answered in the AZ spec itself, if not already. @Sploder12 😉

@xmkg I fully agree we need a better way to represent "unavailability", it is known that my current approach is terrible. But, my issue with using operative instances and a new unavailable table is that we'd be moving even more responsibility into the daemon. Also operative instances is the most unsafe thing in our entire code base at the moment. It guarantees race conditions.

does deleting and instance unsubscribe it from the AZ?

Based on spec and implementation, no. Although it'd be nice if this were possible.

does enabling/disabling an AZ affect deleted instances? (I hope not)

Yes, it is necessary for recover in the current implementation :)

what happens on recover of an instance whose AZ is unavailable?

This is unaccounted for by the spec. Implementation recovers the instance in an unavailable state.

Sploder12 marked this pull request as ready for review April 24, 2025 18:22

xmkg reviewed Apr 24, 2025

View reviewed changes

src/daemon/daemon.cpp Outdated Show resolved Hide resolved

Sploder12 force-pushed the az-implementation branch from 3763140 to 6295079 Compare April 25, 2025 04:13

Sploder12 force-pushed the az-integration branch from 80b16de to 8d42803 Compare May 8, 2025 20:35

Sploder12 force-pushed the az-implementation branch 2 times, most recently from d7eb32d to 94e31e3 Compare May 12, 2025 11:17

andrei-toterman and others added 9 commits May 14, 2025 04:29

[daemon] integrate zones in daemon

b8dc6e4

# Conflicts: # src/daemon/daemon.cpp

[tests] make stub az classes and put them where they are needed

2f16816

# Conflicts: # tests/test_sshfs_mount_handler.cpp

[az] inject az manager into vm factory constructors

b524fb5

[tests] make tests compile with azs

2810f81

[az] Implement set_available

e8fd002

[az] Handle unavailable state in daemon

a6d5de2

[az] Handle set_available failures

31991a6

[az] handle merger rebase

e8f846c

[az] fix on_restart tests

d597d38

Sploder12 force-pushed the az-implementation branch from 94e31e3 to d597d38 Compare May 14, 2025 09:09

xmkg reviewed May 26, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Az implementation #4061

Az implementation #4061

Uh oh!

Sploder12 commented Apr 24, 2025 •

edited

Loading

Uh oh!

codecov bot commented Apr 24, 2025 •

edited

Loading

Uh oh!

Uh oh!

xmkg May 26, 2025

Uh oh!

ricab May 26, 2025

Uh oh!

xmkg May 28, 2025

Uh oh!

Sploder12 May 28, 2025

Uh oh!

Sploder12 May 28, 2025

Uh oh!

Uh oh!

Az implementation #4061

Are you sure you want to change the base?

Az implementation #4061

Uh oh!

Conversation

Sploder12 commented Apr 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Apr 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

xmkg May 26, 2025

Choose a reason for hiding this comment

Uh oh!

ricab May 26, 2025

Choose a reason for hiding this comment

Uh oh!

xmkg May 28, 2025

Choose a reason for hiding this comment

Uh oh!

Sploder12 May 28, 2025

Choose a reason for hiding this comment

Uh oh!

Sploder12 May 28, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Sploder12 commented Apr 24, 2025 •

edited

Loading

codecov bot commented Apr 24, 2025 •

edited

Loading