roachtest: move task termination #147206
Expose cancel on the task manager to allow the test framework to cancel tasks without having to terminate the manager.
Epic: None
Release note: None
If it's a timeout, the return escapes; i.e., when |
Argh, you're right, good catch. Tempted to move it to the defer that closes the channel. Will have a look again tomorrow to make sure this executes after test code.
14fbb06 to 86f7837
@@ -2296,6 +2296,15 @@ func monitorTasks(ctx context.Context, taskManager task.Manager, t test.Test, l
}
}
}()

return func() {
Moved task termination to be returned as a function that is deferred to invoke after the test returns.
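For readers following along, here is a minimal sketch of that pattern, not the actual roachtest code. `fakeManager`, its `Terminate` and `Terminated` methods, and this simplified `runTest` are hypothetical stand-ins; only the shape (a monitor function returning a cleanup closure that the caller defers) mirrors the change.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// fakeManager is a hypothetical stand-in for task.Manager. It only records
// whether it has been terminated.
type fakeManager struct {
	mu         sync.Mutex
	terminated bool
}

func (m *fakeManager) Terminate() {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.terminated = true
}

func (m *fakeManager) Terminated() bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.terminated
}

// monitorTasks starts a background monitor and returns a cleanup function.
// It does not terminate the manager itself; the caller decides when to do so.
func monitorTasks(ctx context.Context, m *fakeManager) func() {
	go func() {
		<-ctx.Done() // placeholder for watching task events
	}()
	return func() { m.Terminate() }
}

// runTest defers the cleanup so it runs only after testBody has returned.
// Deferred calls run last-in, first-out: terminate() first, then cancel().
func runTest(m *fakeManager, testBody func()) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	terminate := monitorTasks(ctx, m)
	defer terminate()
	testBody()
}

func main() {
	m := &fakeManager{}
	runTest(m, func() {
		fmt.Println("terminated during test:", m.Terminated())
	})
	fmt.Println("terminated after test:", m.Terminated())
}
```

Because the closure is deferred by the caller, termination cannot run while the test body is still executing, regardless of what the monitor goroutine observes.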
86f7837 to e540a26
pkg/cmd/roachtest/test_runner.go
Outdated
@@ -1402,6 +1404,10 @@ func (r *testRunner) runTest(
// We suppress other failures from being surfaced to the top as the timeout is always going
// to be the main error and subsequent errors (i.e. context cancelled) add noise.
t.suppressFailures()

// Cancel tasks to ensure that any stray tasks are cleaned up.
t.taskManager.Cancel()
Above we add the timeout failure intentionally without cancelling the context, so why don't we do something similar here? I.e., why not call t.taskManager.Cancel() in teardownTest in the timeout case:
cockroach/pkg/cmd/roachtest/test_runner.go
Lines 1631 to 1634 in e540a26
// We previously added a timeout failure without cancellation, so we cancel here.
if t.mu.cancel != nil {
	t.mu.cancel()
}
Good point, I'll move it.
e540a26 to 972732a
LGTM
Previously, the task manager was terminated during test teardown. The teardown
would happen directly after a test timeout as well. At this point the test code
could still be running, and new tasks could be initiated. This could result in
undefined behavior. This change moves the task termination to after the test has
returned.
Even though it's still possible to start new tasks after a test has timed out,
these tasks should be short-lived and should not cause any issues. When the test
code returns, the task manager is terminated and any stray tasks are cleaned up.
Fixes: #143973
Epic: None
Release note: None