[Bug]: rspack build gets stuck at ci #9665

Open
uladzimirdev opened this issue Mar 13, 2025 · 35 comments

@uladzimirdev

uladzimirdev commented Mar 13, 2025

System Info

  Binaries:
    Node: 22.13.1 - ~/.volta/tools/image/node/22.13.1/bin/node
    Yarn: 1.22.19 - ~/.volta/tools/image/yarn/1.22.19/bin/yarn
    npm: 10.8.2 - ~/.volta/tools/image/npm/10.8.2/bin/npm
    Watchman: 2024.08.12.00 - /opt/homebrew/bin/watchman
  npmPackages:
    @rspack/cli: ^1.2.8 => 1.2.8
    @rspack/core: ^1.2.8 => 1.2.8
    @rspack/plugin-react-refresh: ^1.0.1 => 1.0.1

Details

To create a prod build I run WEBPACK_BUNDLE=production NODE_OPTIONS=--max-old-space-size=8196 rspack build, which basically just runs rspack build. After upgrading from v1.1.6 to 1.2.8 (it actually started happening with 1.2.5), this command sometimes gets stuck and produces no additional output for 20 minutes, after which CI fails by timeout. The usual time to build the static resources in CI is about 100s.

I don't have a reproduction link, as it only happens from time to time. But here is a link to the GHA job

rspack config

[screenshot: rspack config]

Reproduce link

No response

Reproduce Steps

WEBPACK_BUNDLE=production NODE_OPTIONS=--max-old-space-size=8196 rspack build

@uladzimirdev added the pending triage label Mar 13, 2025
@JSerFeng
Contributor

There are many possible reasons for a build getting stuck; if you can provide a repro, I believe we can help solve it in a few days.

Contributor

Hello @uladzimirdev, sorry, we can't investigate the problem further without a reproduction demo. Please provide a repro demo by forking rspack-repro, or provide a minimal GitHub repository yourself. Issues labeled need reproduction will be closed if there is no activity for 14 days.

@uladzimirdev
Author

@JSerFeng understood, I'll try to put one together, but no luck so far. Any tips on how to collect debug info?

@hardfist
Contributor

hardfist commented Mar 14, 2025

@uladzimirdev there are some known potential deadlock bugs we're fixing; we'll release a canary version so you can try it and see whether it fixes your problem.

@uladzimirdev
Author

uladzimirdev commented Mar 14, 2025

Thanks @hardfist. I'm trying to find the prerequisites to repro this possible deadlock. We run 2 jobs in parallel at CI (only the env variables differ) and the second job gets stuck maybe 3/100 times, so it's really hard to verify atm.

It's not reproducible locally, no matter how many resources I provide or how many other tasks I run to keep the CPU busy.
I've created CPU profiles, but nothing caught my eye. If it were a circular dep problem, I'd be able to reproduce it locally.

I had to upgrade the swc packages together with rspack; maybe the problem lies there.

Rsdoctor didn't show any specific problem

CPU Profile

[screenshot: CPU profile]

@hardfist
Contributor

hardfist commented Mar 14, 2025

Normally a deadlock is caused on the Rust side, so you need a Rust-side profile to debug it. You can generate a CPU profile by following this guide: https://rspack.dev/contribute/development/profiling#samply

you can try @rspack-canary/[email protected] by following this guide to see whether it solves your deadlock problem

@uladzimirdev
Author

I've been able to gather some logs from GitHub Actions using the TRACE level. The file is huge, so I uploaded it to Google Drive.

Please let me know if I need to upload it somewhere else so you have access to it

@hardfist
Contributor

Hmm, it seems to be stuck in the emit_assets phase (which is the last phase of rspack).

@hardfist
Contributor

hardfist commented Mar 28, 2025

@uladzimirdev this may be caused by https://github.com/web-infra-dev/rspack/pull/9587/files#diff-3fbb9f7dfbadceab7b0c89038c54de9851f25ae957a1f91aea05b0e46da2b209L367, which causes a block_on on a JS function call; it should be fixed in 1.3.0. Can you help test whether it still gets stuck with 1.3.0?

@slorber

slorber commented Mar 28, 2025

On the Docusaurus repo (currently Rspack 1.2.5) we also encounter this.

Example CI job timeout: https://github.com/facebook/docusaurus/actions/runs/13945874964/job/39032844416

I'm not sure when it started to happen, but probably after 1.2.x, when we also turned on persistent cache.


I've also encountered it locally. I'm not 100% sure but I think it also happened with the dev server.

From what I remember, restarting the dev server or the prod build would consistently lead to the bundling process getting stuck again at the end (100% progress bar; I think the status was "sealing" or "emitting" or something)

I also think that cleaning the node_modules/.cache folder allowed us to restart the dev server / prod build and have it complete successfully. So I assume it might be related to some kind of corrupted persistent cache?

I'm not 100% sure of all this. Will try to investigate more the next time I see this issue locally, but I don't know exactly how to trigger it, unfortunately.
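
If anyone wants to rule the cache in or out without deleting node_modules/.cache by hand, toggling the persistent cache off in the config is a quick test. A minimal sketch, assuming the experiments.cache option used for the persistent cache in recent Rspack releases and the Configuration type exported by @rspack/core; the exact shape may differ between versions:

// rspack.config.ts - minimal sketch, assuming experiments.cache controls the persistent cache
import type { Configuration } from '@rspack/core';

const config: Configuration = {
  experiments: {
    // `false` disables the on-disk cache kept under node_modules/.cache;
    // switch back to something like { type: 'persistent' } once the hang is ruled out.
    cache: false,
  },
};

export default config;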

@uladzimirdev
Author

@hardfist @h-a-n-a FYI I haven't had any deadlock since upgrading to 1.3.0. Maybe it's too early to tell, but my issue seems to be resolved.

@benjdlambert

@slorber I think that we're also running into this when testing this out with our Backstage microsite and docusaurus: backstage/backstage#29413

Can't seem to reproduce this locally, but it fails consistently in CI.

Tried a lot of things to get something working, including trying 1.3.0.

Happy to get some debug logs to help with working out what's wrong here 🙏

@hardfist
Contributor

hardfist commented Apr 1, 2025

Since Backstage is OSS, @benjdlambert, if you hit the deadlock issue in your CI please ping me so I can investigate it in your repo.

@benjdlambert

@hardfist the PR above is using rspack with Docusaurus. Feel free to dig around on that branch and fork it to run some tests. The logs with the failures are there too.

backstage/backstage#29413

@slorber

slorber commented Apr 1, 2025

@benjdlambert in my case it doesn't reproduce consistently, so it might be something else.

[screenshot]

Maybe try using Rspack 1.1 and see if it improves; I don't remember having issues with that version.

@benjdlambert

@slorber I actually tried that in a previous commit and got the same issue. So to be honest I'm not sure at this point what's causing it, or whether it actually is a deadlock or not. The symptoms are the same as this issue though.

@rrussell0

I'm having similar problems, but while using rsbuild build. Our pipelines started randomly timing out about 3 weeks ago when building multiple projects. It's like rsbuild isn't properly exiting when it completes a build. I can rerun the exact same code on the same agent and it'll succeed 95% of the time. It's not project-specific (I'm running nx run-many to build projects in a monorepo, and I've seen our task time out between different projects). It just hangs and doesn't move on to the next build, eventually timing out. I've turned on all the verbose logs that I can and still can't pinpoint an exact cause.

@hardfist
Contributor

hardfist commented Apr 4, 2025

@rrussell0 most of the deadlock issues are solved in 1.3.0; can you try upgrading to see whether that fixes it?

@jtsorlinis

jtsorlinis commented Apr 16, 2025

We have been having a similar issue: it randomly hangs in CI (GitHub Actions) without any errors or anything.

I tried adding a progressHandler to get some logs with:

import { rspack } from '@rspack/core';

// Log every progress update so the last message before the hang is visible in the CI output.
const handler = (percentage: any, message: any, ...args: any) => {
  console.info(percentage);
  console.log(message);
  console.log(args);
  console.log('----------------------------------');
};
new rspack.ProgressPlugin(handler);
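
For the handler to produce any output, the plugin has to be registered in the config's plugins array. A minimal sketch, assuming your rspack version's ProgressPlugin accepts a handler function (as in the snippet above) and the Configuration type exported by @rspack/core:

// rspack.config.ts - minimal sketch of wiring a progress handler into the build
import { rspack } from '@rspack/core';
import type { Configuration } from '@rspack/core';

const config: Configuration = {
  plugins: [
    // Print every progress update so the last message before a hang shows up in the CI log.
    new rspack.ProgressPlugin((percentage, message, ...args) => {
      console.log(percentage, message, ...args);
    }),
  ],
};

export default config;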

And all we get is the following; it seems to be a different random file each time.

[screenshot: progress output]

Is there a way to enable Rust-side logging in CI?

@hardfist
Contributor

@jtsorlinis try RSPACK_PROFILE=TRACE=layer=logger rspack build

@benjdlambert

benjdlambert commented Apr 16, 2025

@hardfist I tried to do this on the backstage repo for the docusaurus build but didn't seem to get any additional logs. I guess it's the rspack build command that picks up on these env vars.

Is there anything I can add for those builds to get more logs?

https://github.com/backstage/backstage/blob/28228d3623f5f05a1fa49e977476ce0df8792a21/.github/workflows/verify_microsite.yml#L275-L279
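
One way to at least confirm that the variables reach the process that ends up invoking rspack, and to keep the potentially huge output out of the job log, is to wrap the build in a small script that sets the env and tees stdout/stderr to a file. Whether the rspack embedded in the Docusaurus build honours these variables is exactly the open question; the build command and log path below are placeholders:

// run-with-trace.ts - rough sketch; the build command and log path are placeholders
import { spawn } from 'node:child_process';
import { createWriteStream } from 'node:fs';

const log = createWriteStream('rspack-trace.log');

// Forward the tracing variables to whatever process ends up calling rspack.
const child = spawn('yarn', ['build'], {
  env: { ...process.env, RSPACK_PROFILE: 'TRACE=layer=logger' },
});

// Mirror output to the console and to the log file so a hang still leaves a trail.
child.stdout.pipe(process.stdout);
child.stdout.pipe(log);
child.stderr.pipe(process.stderr);
child.stderr.pipe(log);

child.on('exit', (code) => process.exit(code ?? 1));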

@rrussell0


After upgrading Rsbuild in our project, the deadlocks seem to have stopped - at least none in the last week. Thanks!

@jtsorlinis

jtsorlinis commented Apr 16, 2025

Can't see any errors even with trace enabled; it seems to just stop and then time out. Not sure if the logs will help.

I've attached the last 750 lines or so because the whole log was ~2GB

truncated-logs.txt

Just to confirm this is happening on both 1.2.8 and 1.3.5, and seems to happen at random (80% of builds succeed)

@hardfist
Contributor

hardfist commented Apr 24, 2025


@benjdlambert @jtsorlinis RSPACK_TRACE_LAYER=logger RSPACK_PROFILE=OVERVIEW rspack build can generate a much smaller log in version 1.3.6.

@jtsorlinis

Hi @hardfist,

Trying with RSPACK_PROFILE=OVERVIEW seems to produce very similar levels of logging to RSPACK_PROFILE=TRACE.

In other news, the build now hangs and times out consistently in GitHub Actions but works fine locally.

@hardfist
Contributor

hardfist commented Apr 28, 2025


Can you upload the trace.json to GitHub artifacts and share it with me?

@jtsorlinis

@hardfist sorry for not getting back to you; as this was holding up our development, I ended up just moving us back to webpack + swc-loader for the time being.

I will try to provide you with a trace when I get some spare time.

@benjdlambert

@hardfist I would try and help here with a trace.json, as our builds fail all of the time and it feels like a pretty good test environment. However, from what I can tell at the moment, the runner just totally crashes when running, so I'm not 100% sure how we're going to be able to get a dump from a host that's unresponsive.

I'm using rspack through the Docusaurus build, so is there a way I can get a trace.json written to disk easily? @slorber maybe you also know of any props that get passed through to rspack that could help debug this?

@hardfist
Contributor

hardfist commented May 12, 2025

@benjdlambert does the deadlock happen in Backstage? If it's open source I can debug it in your repo. Honestly, it's not easy to debug deadlock problems right now; we're still investigating better solutions for deadlock detection (#10327)

@benjdlambert


I'm not sure it's a deadlock, but it consistently fails every build and it seems like the agent dies. backstage/backstage#29413 is the PR and branch. If I can help, please let me know 🙏

@slorber

slorber commented May 12, 2025

@benjdlambert I'd suggest resolving to the very latest version of Rspack instead of ^1.3.0 (your site is resolving to 1.3.4). I don't have any issues anymore. But maybe that's a different problem.

Your logs do not clearly show that it's a problem related to Rspack, so I think it's too early to involve the Rspack team. Let's discuss that on your own repo instead. Here's a PR where I can help you figure it out: backstage/backstage#29905

@hardfist
Contributor

If anyone has a reproducible deadlock repo, please let me know; it would be very helpful for debugging the cause of the deadlock.

@hardfist
Contributor

hardfist commented May 13, 2025

@benjdlambert @jtsorlinis @uladzimirdev can you share your rspack config file with me? I notice there may be some deadlock risk if you use a function in your rspack config (related to #10341).
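
To illustrate what "a function in your rspack config" means here: options whose value is a JS callback that the build has to call back into. Which specific options carry the deadlock risk isn't stated in this thread; the two below are just common function-valued options, shown as a rough sketch:

// rspack.config.ts - illustrative sketch of function-valued config options
import type { Configuration } from '@rspack/core';

const config: Configuration = {
  externals: [
    // Function-valued external: rspack calls back into JS for every module request.
    ({ request }, callback) => {
      if (request && request.startsWith('lodash')) {
        return callback(undefined, 'commonjs ' + request);
      }
      callback();
    },
  ],
  output: {
    // Function-valued filename: also resolved through a JS callback per chunk.
    filename: (pathData) =>
      pathData.chunk?.name === 'main' ? '[name].js' : '[name].bundle.js',
  },
};

export default config;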

@uladzimirdev
Author

Contributor

Since the issue was labeled need reproduction but there was no response in 14 days, this issue will be closed. Feel free to comment and reopen it if you have any further questions.

@github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) May 28, 2025
@hardfist reopened this May 28, 2025