Skip to content

fix(cli): increase size of blocking task threadpool on windows #26465

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 22, 2024

Conversation

nathanwhit
Copy link
Member

@nathanwhit nathanwhit commented Oct 22, 2024

Fixes #26179.

The original error reported in that issue is fixed on canary, but in local testing on my windows machine, next build would just hang forever.

After some digging, what happens is that at some point in next build, readFile promises (from fs/promises ) just never resolve, and so next hangs.

It turns out the issue is saturating tokio's blocking task thread pool. We previously limited the number of blocking threads to 32, and at some point those threads are all in use and there's no thread available for the file reads.

What's taking up all of those threads? The answer turns out to be tokio::process. On windows, child process stdio uses the blocking threadpool: tokio-rs/tokio#4824. When you poll the child's stdio on windows, it spawns a blocking task per poll, and calls std::io::Read::read in the blocking context. That call can block until data is available.
Putting it all together, what happens is that Next.js spawns 2 * the number of CPU cores deno child subprocesses to do work. We implement child_process with tokio::process. When the child processes' stdio get polled, blocking tasks get spawned, and those blocking tasks might block until data is available. So if you have 16 cores (as I do), there are going to be potentially >32 blocking task threadpool threads taken just by the child processes. That leaves no room for other tasks to make progress


To fix this, for now, increase the size of the blocking threadpool on windows. 4 * the number of CPU cores should be enough to leave room for other tasks to make progress.

Longer term, this can be fixed more properly when we handroll our own subprocess code (needed for detached processes and additional pipes on windows).

@nathanwhit nathanwhit merged commit 8282c38 into denoland:main Oct 22, 2024
17 checks passed
@nathanwhit nathanwhit deleted the windows-blocking-threadpool branch October 22, 2024 19:52
bartlomieju pushed a commit that referenced this pull request Oct 25, 2024
Fixes #26179.

The original error reported in that issue is fixed on canary, but in
local testing on my windows machine, `next build` would just hang
forever.

After some digging, what happens is that at some point in next build,
readFile promises (from `fs/promises` ) just never resolve, and so next
hangs.

It turns out the issue is saturating tokio's blocking task thread pool.
We previously limited the number of blocking threads to 32, and at some
point those threads are all in use and there's no thread available for
the file reads.

What's taking up all of those threads? The answer turns out to be
`tokio::process`. On windows, child process stdio uses the blocking
threadpool: tokio-rs/tokio#4824. When you poll
the child's stdio on windows, it spawns a blocking task per poll, and
calls `std::io::Read::read` in the blocking context. That call can block
until data is available.
Putting it all together, what happens is that Next.js spawns `2 * the
number of CPU cores` deno child subprocesses to do work. We implement
`child_process` with `tokio::process`. When the child processes' stdio
get polled, blocking tasks get spawned, and those blocking tasks might
block until data is available. So if you have 16 cores (as I do), there
are going to be potentially >32 blocking task threadpool threads taken
just by the child processes. That leaves no room for other tasks to make
progress

---

To fix this, for now, increase the size of the blocking threadpool on
windows. 4 * the number of CPU cores should be enough to leave room for
other tasks to make progress.

Longer term, this can be fixed more properly when we handroll our own
subprocess code (needed for detached processes and additional pipes on
windows).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Building a NextJS Project in windows fails
2 participants