Skip to content

Az implementation #4061

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: az-integration
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions include/multipass/virtual_machine.h
Original file line number Diff line number Diff line change
Expand Up @@ -136,6 +136,7 @@ class VirtualMachine : private DisabledCopyMove

protected:
const QDir instance_dir;
bool was_running{false};

VirtualMachine(VirtualMachine::State state, const std::string& vm_name, const Path& instance_dir)
: state{state}, vm_name{vm_name}, instance_dir{QDir{instance_dir}} {};
Expand Down
54 changes: 53 additions & 1 deletion src/daemon/daemon.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -994,6 +994,8 @@
return mp::InstanceStatus::SUSPENDING;
case mp::VirtualMachine::State::suspended:
return mp::InstanceStatus::SUSPENDED;
case mp::VirtualMachine::State::unavailable:
return mp::InstanceStatus::UNAVAILABLE;

Check warning on line 998 in src/daemon/daemon.cpp

View check run for this annotation

Codecov / codecov/patch

src/daemon/daemon.cpp#L997-L998

Added lines #L997 - L998 were not covered by tests
case mp::VirtualMachine::State::unknown:
default:
return mp::InstanceStatus::UNKNOWN;
Expand Down Expand Up @@ -2028,6 +2030,12 @@
continue;
}

if (vm->current_state() == VirtualMachine::State::unavailable)
{
mpl::log(mpl::Level::info, name, "Ignoring mount since instance unavailable.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I think we need a better way to represent "unavailability". My main concern is that this approach is error-prone and repetitive. The problem is that "unavailable" instances are still a part of the operative instances, which is wrong. An unavailable instance is not operative since it can't be put into an operative state unless it's made available, so an unavailable instance shouldn't be a part of the "operative" set of VMs in the first place.

Right now, we have two sets of VMs: "operative" and "deleted". I think what we need here is a third category, "unavailable". This way, we can just update the error messages as does not exist or unavailable and call it a day, until we overhaul the VM management logic. @ricab, what do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm good point. I haven't thought a lot about it, but I think I agree with you @xmkg. We need to define also what happens at the intersection of deleted/unavailable i.e., deleted instances that belong to an unavailable AZ. I guess we just declare "deleted" takes precedence over "unavailable"? We need to answer a few questions:

  • does deleting and instance unsubscribe it from the AZ?
  • does enabling/disabling an AZ affect deleted instances? (I hope not)
  • what happens on recover of an instance whose AZ is unavailable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ricab, good questions. Regarding the deleted + unavailable at the same time, I think it's reasonable to make deleted take precedence over unavailable. Another avenue might be to consolidate all instances to a single container and represent the "availability" state in a bit flag. This way, an instance can be deleted & unavailable at the same time.

Consolidating all instances into a single container carries a cost: lookup and iteration speed. This can be troublesome for users with many instances. Otherwise, std::ranges filtering would work just fine. If iteration speed matters, another option might be using a multi-index container, which allows for different indices over a set of values instead of just one.

Regarding the last three questions, those should be answered in the AZ spec itself, if not already. @Sploder12 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xmkg I fully agree we need a better way to represent "unavailability", it is known that my current approach is terrible. But, my issue with using operative instances and a new unavailable table is that we'd be moving even more responsibility into the daemon. Also operative instances is the most unsafe thing in our entire code base at the moment. It guarantees race conditions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does deleting and instance unsubscribe it from the AZ?

Based on spec and implementation, no. Although it'd be nice if this were possible.

does enabling/disabling an AZ affect deleted instances? (I hope not)

Yes, it is necessary for recover in the current implementation :)

what happens on recover of an instance whose AZ is unavailable?

This is unaccounted for by the spec. Implementation recovers the instance in an unavailable state.

continue;

Check warning on line 2036 in src/daemon/daemon.cpp

View check run for this annotation

Codecov / codecov/patch

src/daemon/daemon.cpp#L2035-L2036

Added lines #L2035 - L2036 were not covered by tests
}

auto& vm_mounts = mounts[name];
if (vm_mounts.find(target_path) != vm_mounts.end())
{
Expand Down Expand Up @@ -2182,6 +2190,9 @@
case VirtualMachine::State::suspending:
fmt::format_to(std::back_inserter(start_errors), "Cannot start the instance '{}' while suspending.", name);
continue;
case VirtualMachine::State::unavailable:
fmt::format_to(std::back_inserter(start_errors), "Cannot start the instance '{}' while unavailable.", name);
continue;

Check warning on line 2195 in src/daemon/daemon.cpp

View check run for this annotation

Codecov / codecov/patch

src/daemon/daemon.cpp#L2193-L2195

Added lines #L2193 - L2195 were not covered by tests
case VirtualMachine::State::delayed_shutdown:
delayed_shutdown_instances.erase(name);
continue;
Expand Down Expand Up @@ -2244,6 +2255,16 @@
return this->shutdown_vm(vm, delay_minutes);
};

operation = [op = std::move(operation)](VirtualMachine& vm) {
if (vm.current_state() == VirtualMachine::State::unavailable)

Check warning on line 2259 in src/daemon/daemon.cpp

View check run for this annotation

Codecov / codecov/patch

src/daemon/daemon.cpp#L2258-L2259

Added lines #L2258 - L2259 were not covered by tests
{
mpl::log(mpl::Level::info, vm.vm_name, "Ignoring stop since instance is unavailable.");
return grpc::Status::OK;

Check warning on line 2262 in src/daemon/daemon.cpp

View check run for this annotation

Codecov / codecov/patch

src/daemon/daemon.cpp#L2261-L2262

Added lines #L2261 - L2262 were not covered by tests
}

return op(vm);
};

Check warning on line 2266 in src/daemon/daemon.cpp

View check run for this annotation

Codecov / codecov/patch

src/daemon/daemon.cpp#L2265-L2266

Added lines #L2265 - L2266 were not covered by tests

status = cmd_vms(instance_selection.operative_selection, operation);
}

Expand Down Expand Up @@ -2274,6 +2295,12 @@
{
config->factory->require_suspend_support();
status = cmd_vms(instance_selection.operative_selection, [this](auto& vm) {
if (vm.current_state() == VirtualMachine::State::unavailable)
{
mpl::log(mpl::Level::info, vm.vm_name, "Ignoring suspend since instance is unavailable.");
return grpc::Status::OK;

Check warning on line 2301 in src/daemon/daemon.cpp

View check run for this annotation

Codecov / codecov/patch

src/daemon/daemon.cpp#L2300-L2301

Added lines #L2300 - L2301 were not covered by tests
}

stop_mounts(vm.vm_name);

vm.suspend();
Expand Down Expand Up @@ -2309,6 +2336,12 @@

const auto& instance_targets = instance_selection.operative_selection;
status = cmd_vms(instance_targets, [this](auto& vm) {
if (vm.current_state() == VirtualMachine::State::unavailable)

Check warning on line 2339 in src/daemon/daemon.cpp

View check run for this annotation

Codecov / codecov/patch

src/daemon/daemon.cpp#L2339

Added line #L2339 was not covered by tests
{
mpl::log(mpl::Level::info, vm.vm_name, "Ignoring restart since instance is unavailable.");
return grpc::Status::OK;

Check warning on line 2342 in src/daemon/daemon.cpp

View check run for this annotation

Codecov / codecov/patch

src/daemon/daemon.cpp#L2341-L2342

Added lines #L2341 - L2342 were not covered by tests
}

stop_mounts(vm.vm_name);

return reboot_vm(vm);
Expand Down Expand Up @@ -2382,6 +2415,12 @@
{
const auto& instance_name = vm_it->first;

if (vm_it->second->current_state() == VirtualMachine::State::unavailable)
{
mpl::log(mpl::Level::info, instance_name, "Ignoring delete since instance is unavailable.");
continue;

Check warning on line 2421 in src/daemon/daemon.cpp

View check run for this annotation

Codecov / codecov/patch

src/daemon/daemon.cpp#L2420-L2421

Added lines #L2420 - L2421 were not covered by tests
}

auto snapshot_pick_it = instance_snapshots_map.find(instance_name);
const auto& [pick, all] = snapshot_pick_it == instance_snapshots_map.end() ? SnapshotPick{{}, true}
: snapshot_pick_it->second;
Expand Down Expand Up @@ -2425,12 +2464,19 @@
const auto& name = path_entry.instance_name();
const auto target_path = QDir::cleanPath(QString::fromStdString(path_entry.target_path())).toStdString();

if (operative_instances.find(name) == operative_instances.end())
auto vm = operative_instances.find(name);
if (vm == operative_instances.end())
{
add_fmt_to(errors, "instance '{}' does not exist", name);
continue;
}

if (vm->second->current_state() == VirtualMachine::State::unavailable)
{
mpl::log(mpl::Level::info, name, "Ignoring umount since instance unavailable.");
continue;

Check warning on line 2477 in src/daemon/daemon.cpp

View check run for this annotation

Codecov / codecov/patch

src/daemon/daemon.cpp#L2476-L2477

Added lines #L2476 - L2477 were not covered by tests
}

auto& vm_spec_mounts = vm_instance_specs[name].mounts;
auto& vm_mounts = mounts[name];

Expand Down Expand Up @@ -3589,6 +3635,12 @@
return fmt::to_string(errors);
}
const auto vm = it->second;

if (vm->current_state() == VirtualMachine::State::unavailable)
{
return fmt::to_string(errors);

Check warning on line 3641 in src/daemon/daemon.cpp

View check run for this annotation

Codecov / codecov/patch

src/daemon/daemon.cpp#L3641

Added line #L3641 was not covered by tests
}

vm->wait_until_ssh_up(timeout);

if (std::is_same<Reply, LaunchReply>::value)
Expand Down
6 changes: 4 additions & 2 deletions src/platform/backends/qemu/qemu_virtual_machine.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -247,8 +247,10 @@ mp::QemuVirtualMachine::QemuVirtualMachine(const VirtualMachineDescription& desc
AvailabilityZone& zone,
const Path& instance_dir,
bool remove_snapshots)
: BaseVirtualMachine{mp::backend::instance_image_has_snapshot(desc.image.image_path, suspend_tag) ? State::suspended
: State::off,
: BaseVirtualMachine{!zone.is_available() ? State::unavailable
: mp::backend::instance_image_has_snapshot(desc.image.image_path, suspend_tag)
? State::suspended
: State::off,
desc.vm_name,
key_provider,
zone,
Expand Down
28 changes: 26 additions & 2 deletions src/platform/backends/shared/base_availability_zone.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -117,8 +117,32 @@
m.available = new_available;
serialize();

for (auto& vm : m.vms)
vm.get().set_available(m.available);
try
{
for (auto& vm : m.vms)
vm.get().set_available(new_available);
}
catch (...)

Check warning on line 125 in src/platform/backends/shared/base_availability_zone.cpp

View check run for this annotation

Codecov / codecov/patch

src/platform/backends/shared/base_availability_zone.cpp#L125

Added line #L125 was not covered by tests
{
// if an error occurs fallback to available.
m.available = true;
serialize();

Check warning on line 129 in src/platform/backends/shared/base_availability_zone.cpp

View check run for this annotation

Codecov / codecov/patch

src/platform/backends/shared/base_availability_zone.cpp#L128-L129

Added lines #L128 - L129 were not covered by tests

// make sure nothing is still unavailable.
for (auto& vm : m.vms)

Check warning on line 132 in src/platform/backends/shared/base_availability_zone.cpp

View check run for this annotation

Codecov / codecov/patch

src/platform/backends/shared/base_availability_zone.cpp#L132

Added line #L132 was not covered by tests
{
// setting the state here breaks encapsulation, but it's already broken.
std::unique_lock vm_lock{vm.get().state_mutex};
if (vm.get().current_state() == VirtualMachine::State::unavailable)

Check warning on line 136 in src/platform/backends/shared/base_availability_zone.cpp

View check run for this annotation

Codecov / codecov/patch

src/platform/backends/shared/base_availability_zone.cpp#L135-L136

Added lines #L135 - L136 were not covered by tests
{
vm.get().state = VirtualMachine::State::off;
vm.get().update_state();

Check warning on line 139 in src/platform/backends/shared/base_availability_zone.cpp

View check run for this annotation

Codecov / codecov/patch

src/platform/backends/shared/base_availability_zone.cpp#L138-L139

Added lines #L138 - L139 were not covered by tests
}
}

// rethrow the error so something else can deal with it.
throw;

Check warning on line 144 in src/platform/backends/shared/base_availability_zone.cpp

View check run for this annotation

Codecov / codecov/patch

src/platform/backends/shared/base_availability_zone.cpp#L144

Added line #L144 was not covered by tests
}
}

void BaseAvailabilityZone::add_vm(VirtualMachine& vm)
Expand Down
27 changes: 25 additions & 2 deletions src/platform/backends/shared/base_virtual_machine.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@

void assert_vm_stopped(St state)
{
assert(state == St::off || state == St::stopped);
assert(state == St::off || state == St::stopped || state == St::unavailable);
}

mp::Path derive_head_path(const QDir& snapshot_dir)
Expand Down Expand Up @@ -199,7 +199,7 @@
void mp::BaseVirtualMachine::check_state_for_shutdown(ShutdownPolicy shutdown_policy)
{
// A mutex should already be locked by the caller here
if (state == State::off || state == State::stopped)
if (state == State::off || state == State::stopped || state == State::unavailable)
{
throw VMStateIdempotentException{"Ignoring shutdown since instance is already stopped."};
}
Expand Down Expand Up @@ -232,6 +232,29 @@
}
}

void mp::BaseVirtualMachine::set_available(bool available)

Check warning on line 235 in src/platform/backends/shared/base_virtual_machine.cpp

View check run for this annotation

Codecov / codecov/patch

src/platform/backends/shared/base_virtual_machine.cpp#L235

Added line #L235 was not covered by tests
{
if (available)

Check warning on line 237 in src/platform/backends/shared/base_virtual_machine.cpp

View check run for this annotation

Codecov / codecov/patch

src/platform/backends/shared/base_virtual_machine.cpp#L237

Added line #L237 was not covered by tests
{
state = State::off;
update_state();
if (was_running)

Check warning on line 241 in src/platform/backends/shared/base_virtual_machine.cpp

View check run for this annotation

Codecov / codecov/patch

src/platform/backends/shared/base_virtual_machine.cpp#L239-L241

Added lines #L239 - L241 were not covered by tests
{
start();

Check warning on line 243 in src/platform/backends/shared/base_virtual_machine.cpp

View check run for this annotation

Codecov / codecov/patch

src/platform/backends/shared/base_virtual_machine.cpp#L243

Added line #L243 was not covered by tests

// normally the daemon sets the state to running...
state = State::running;
update_state();

Check warning on line 247 in src/platform/backends/shared/base_virtual_machine.cpp

View check run for this annotation

Codecov / codecov/patch

src/platform/backends/shared/base_virtual_machine.cpp#L246-L247

Added lines #L246 - L247 were not covered by tests
}
return;

Check warning on line 249 in src/platform/backends/shared/base_virtual_machine.cpp

View check run for this annotation

Codecov / codecov/patch

src/platform/backends/shared/base_virtual_machine.cpp#L249

Added line #L249 was not covered by tests
}

was_running = state == State::running || state == State::starting;
shutdown(ShutdownPolicy::Poweroff);
state = State::unavailable;
update_state();

Check warning on line 255 in src/platform/backends/shared/base_virtual_machine.cpp

View check run for this annotation

Codecov / codecov/patch

src/platform/backends/shared/base_virtual_machine.cpp#L252-L255

Added lines #L252 - L255 were not covered by tests
}

std::string mp::BaseVirtualMachine::ssh_exec(const std::string& cmd, bool whisper)
{
std::unique_lock lock{state_mutex};
Expand Down
6 changes: 1 addition & 5 deletions src/platform/backends/shared/base_virtual_machine.h
Original file line number Diff line number Diff line change
Expand Up @@ -55,11 +55,7 @@ class BaseVirtualMachine : public VirtualMachine

virtual std::string ssh_exec(const std::string& cmd, bool whisper = false) override;

void set_available(bool available) override
{
// TODO make vm unavailable by force stopping if running or available by starting again if it was running
throw NotImplementedOnThisBackendException("unavailability");
}
void set_available(bool available) override;

void wait_until_ssh_up(std::chrono::milliseconds timeout) override;
void wait_for_cloud_init(std::chrono::milliseconds timeout) override;
Expand Down
2 changes: 1 addition & 1 deletion src/platform/platform_linux.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -424,7 +424,7 @@ mp::VirtualMachineFactory::UPtr mp::platform::vm_backend(const mp::Path& data_di

#if VIRTUALBOX_ENABLED
if (driver == QStringLiteral("virtualbox"))
return std::make_unique<VirtualBoxVirtualMachineFactory>(data_dir);
return std::make_unique<VirtualBoxVirtualMachineFactory>(data_dir, az_manager);
#endif

throw std::runtime_error(fmt::format("Unsupported virtualization driver: {}", driver));
Expand Down
1 change: 1 addition & 0 deletions src/rpc/multipass.proto
Original file line number Diff line number Diff line change
Expand Up @@ -211,6 +211,7 @@ message InstanceStatus {
DELAYED_SHUTDOWN = 6;
SUSPENDING = 7;
SUSPENDED = 8;
UNAVAILABLE = 9;
}
Status status = 1;
}
Expand Down
8 changes: 4 additions & 4 deletions tests/test_daemon.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -380,7 +380,7 @@ TEST_F(Daemon, ensure_that_on_restart_future_completes)

// This VM was running before, but not now.
auto mock_vm = std::make_unique<NiceMock<mpt::MockVirtualMachine>>("yakety-yak");
EXPECT_CALL(*mock_vm, current_state).WillOnce(Return(mp::VirtualMachine::State::stopped));
EXPECT_CALL(*mock_vm, current_state).Times(2).WillRepeatedly(Return(mp::VirtualMachine::State::stopped));
EXPECT_CALL(*mock_vm, start).Times(1);

mp::Signal sig;
Expand Down Expand Up @@ -413,7 +413,7 @@ TEST_F(Daemon, starts_previously_running_vms_back)

// This VM was running before, but not now.
auto mock_vm = std::make_unique<NiceMock<mpt::MockVirtualMachine>>(vm_props.name);
EXPECT_CALL(*mock_vm, current_state).WillOnce(Return(mp::VirtualMachine::State::stopped));
EXPECT_CALL(*mock_vm, current_state).Times(2).WillRepeatedly(Return(mp::VirtualMachine::State::stopped));
EXPECT_CALL(*mock_vm, start).Times(1);
EXPECT_CALL(*mock_vm, update_state).Times(1);
EXPECT_CALL(*mock_vm, wait_until_ssh_up).Times(1);
Expand All @@ -434,7 +434,7 @@ TEST_F(Daemon, calls_on_restart_for_already_running_vms_on_construction)

// This VM was running before, but not now.
auto mock_vm = std::make_unique<NiceMock<mpt::MockVirtualMachine>>(vm_props.name);
EXPECT_CALL(*mock_vm, current_state).WillOnce(Return(mp::VirtualMachine::State::running));
EXPECT_CALL(*mock_vm, current_state).Times(2).WillRepeatedly(Return(mp::VirtualMachine::State::running));
EXPECT_CALL(*mock_vm, start).Times(0);
EXPECT_CALL(*mock_vm, update_state).Times(1);
EXPECT_CALL(*mock_vm, wait_until_ssh_up).Times(1);
Expand All @@ -455,7 +455,7 @@ TEST_F(Daemon, calls_on_restart_for_already_starting_vms_on_construction)

// This VM was running before, but not now.
auto mock_vm = std::make_unique<NiceMock<mpt::MockVirtualMachine>>(vm_props.name);
EXPECT_CALL(*mock_vm, current_state).WillOnce(Return(mp::VirtualMachine::State::starting));
EXPECT_CALL(*mock_vm, current_state).Times(2).WillRepeatedly(Return(mp::VirtualMachine::State::starting));
EXPECT_CALL(*mock_vm, start).Times(0);
EXPECT_CALL(*mock_vm, update_state).Times(1);
EXPECT_CALL(*mock_vm, wait_until_ssh_up).Times(1);
Expand Down
Loading