
jdk_lang_j9_1 java/lang/Thread/virtual/Starvation.java Unexpected exit from test [exit code: 137] #21957


Open
JasonFengJ9 opened this issue May 27, 2025 · 6 comments

Comments

@JasonFengJ9
Member

JasonFengJ9 commented May 27, 2025

Failure link

From internal Test_openjdk24_j9_sanity.openjdk_ppc64le_linux_testList_0 (rtj-ubu24le-rtp-test-932vb-1)

openjdk version "24.0.1-beta" 2025-04-15
IBM Semeru Runtime Open Edition 24.0.1+9-202505241603 (build 24.0.1-beta+9-202505241603)
Eclipse OpenJ9 VM 24.0.1+9-202505241603 (build master-7bed029788, JRE 24 Linux ppc64le-64-Bit Compressed References 20250524_67 (JIT enabled, AOT enabled)
OpenJ9   - 7bed029788
OMR      - 988ae21a0
JCL      - 66c0fb7ed based on jdk-24.0.1+9)

Rerun in Grinder - Change TARGET to run only the failed test targets

Optional info

Failure output (captured from console output)

[2025-05-24T18:25:33.319Z] variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode501 -XXgc:fvtest_forceCopyForwardHybridMarkCompactRatio=10
[2025-05-24T18:25:33.319Z] JVM_OPTIONS:  -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -Xjit -Xgcpolicy:balanced -Xnocompressedrefs -XXgc:fvtest_forceCopyForwardHybridMarkCompactRatio=10 -Xverbosegclog 

[2025-05-24T18:40:01.076Z] TEST: java/lang/Thread/virtual/Starvation.java

[2025-05-24T18:40:01.078Z] TEST RESULT: Failed. Unexpected exit from test [exit code: 137]
[2025-05-24T18:40:01.078Z] --------------------------------------------------
[2025-05-24T18:41:40.455Z] Test results: passed: 919; failed: 1

[2025-05-24T18:41:47.153Z] jdk_lang_j9_1_FAILED

50x internal Grinder - failed at rtj-ubu22le-rtp-test-lac2i and rhel10le-rtbeta-2, passed at rhel8le-rtp-rt5-1 and rhel8le-svl-rt1-1.

@tajila
Contributor

tajila commented May 27, 2025

@JasonFengJ9 Can you see if this also fails with -XX:-YieldPinnedVirtualThreads ? We saw failures like this in the past with stress tests due to running out of sub4g memory.

@JasonFengJ9
Member Author

JasonFengJ9 commented May 27, 2025

50x ppc64le_linux jdk_lang_j9_1 -XX:-YieldPinnedVirtualThreads grinder - passed at ubu22lert-3, prhel247, rtj-rhel8le-rtp-test-g01cq-1, rtj-sles15le-svl-test-tar80-1 and sles15le-svl-rt6-1.

The initial 50x grinder with the default option is still in progress; so far it has failed at rtj-ubu22le-rtp-test-lac2i and rhel10le-rtbeta-2 and passed at rhel8le-rtp-rt5-1 and rhel8le-svl-rt1-1.

3x grinder with -XX:-YieldPinnedVirtualThreads at rtj-ubu22le-rtp-test-lac2i - passed
3x grinder with -XX:-YieldPinnedVirtualThreads at rhel10le-rtbeta-2 - passed
Note: with the default option, the test failed 10/10 on these two machines (rtj-ubu22le-rtp-test-lac2i and rhel10le-rtbeta-2).

@babsingh
Contributor

Exit code 137 (128 + 9, i.e. SIGKILL) indicates that the process was killed; on Linux this is typically the kernel terminating the process to free up memory. Similar issues have been observed in the past with Skynet, which is a known stress test.
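As a minimal sketch of how a harness could interpret this status: by shell convention, an exit status above 128 means the process was terminated by signal (status - 128), so 137 maps to signal 9 (SIGKILL). The class and method names below are hypothetical. SIGKILL is what the Linux OOM killer sends, but other senders are possible, so the kernel log is where an OOM kill would actually be confirmed.

```java
/**
 * Hypothetical helper: decodes a shell-style exit status.
 * By convention, a status above 128 means "killed by signal (status - 128)".
 */
public class ExitCodeDecoder {
    static final int SIGKILL = 9; // signal number for SIGKILL on Linux

    /** Returns the terminating signal number, or -1 for a normal exit code. */
    static int signalFromExitStatus(int status) {
        return status > 128 ? status - 128 : -1;
    }

    public static void main(String[] args) {
        int status = 137; // exit code reported by the test harness in this issue
        int signal = signalFromExitStatus(status);
        if (signal == SIGKILL) {
            // SIGKILL is consistent with the OOM killer, but not proof of it.
            System.out.println("Killed by SIGKILL (9)");
        } else if (signal > 0) {
            System.out.println("Killed by signal " + signal);
        } else {
            System.out.println("Normal exit with code " + status);
        }
    }
}
```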

Starvation.java launches 100,000 virtual threads and exhibits characteristics of a stress test.

On machines with limited resources, the kernel is likely to terminate such memory-intensive processes with exit code 137 to reclaim memory.

This is likely not a functional issue, but rather a potential mismatch between the test and the machine; specifically, the test may be running on a machine with insufficient resources. @JasonFengJ9, based on your grinder runs, does this hypothesis seem to hold?

@JasonFengJ9
Member Author

JasonFengJ9 commented May 28, 2025

This is likely not a functional issue, but rather a potential mismatch between the test and machine; specifically, the test may be running on a machine with insufficient resources.

As per grinder result #21957 (comment), the test failed with exit code 137 in 10/10 runs with the default option on rtj-ubu22le-rtp-test-lac2i and rhel10le-rtbeta-2, while runs with -XX:-YieldPinnedVirtualThreads passed 3/3 on both machines.

Does the default (-XX:+YieldPinnedVirtualThreads) require significantly more memory than -XX:-YieldPinnedVirtualThreads? If that's expected, I think we can move this out of the milestone.

@babsingh
Contributor

babsingh commented May 28, 2025

Does the default (-XX:+YieldPinnedVirtualThreads) require significantly more memory than -XX:-YieldPinnedVirtualThreads?

Yes, the justification is below.

The test runs 100,000 iterations. In each iteration, a virtual thread is created that launches nproc - 1 virtual threads. These nproc - 1 virtual threads attempt to acquire a monitor that is already owned. With nproc carrier threads, only nproc virtual threads can run at a time.
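The structure described above can be sketched roughly as follows. This is a hypothetical, scaled-down illustration, not the actual Starvation.java source: the class and method names are invented, the iteration count is a parameter instead of 100,000, and joining each parent thread before starting the next iteration is an assumption. It requires Java 21 or later for `Thread.ofVirtual()`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the described pattern; NOT the real test source.
 * Each iteration starts one parent virtual thread, which holds a monitor
 * while starting nproc - 1 child virtual threads that contend for it.
 */
public class StarvationSketch {

    static void runIteration(int nproc) throws InterruptedException {
        Object lock = new Object();
        List<Thread> children = new ArrayList<>();
        synchronized (lock) {
            for (int i = 0; i < nproc - 1; i++) {
                // A child may block here while the parent still owns the monitor.
                children.add(Thread.ofVirtual().start(() -> {
                    synchronized (lock) {
                        // empty critical section: only acquiring the monitor matters
                    }
                }));
            }
        } // parent releases the monitor; children acquire it one at a time
        for (Thread child : children) {
            child.join();
        }
    }

    /** Runs the given number of iterations and returns how many completed. */
    static int run(int iterations, int nproc) throws InterruptedException {
        int completed = 0;
        for (int i = 0; i < iterations; i++) {
            Thread parent = Thread.ofVirtual().start(() -> {
                try {
                    runIteration(nproc);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            parent.join(); // assumption: iterations do not overlap in this sketch
            completed++;
        }
        return completed;
    }

    public static void main(String[] args) throws InterruptedException {
        int nproc = Runtime.getRuntime().availableProcessors();
        // Small default count keeps the sketch cheap compared with 100,000.
        int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 1_000;
        System.out.println("completed " + run(iterations, nproc) + " iterations");
    }
}
```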

With -XX:-YieldPinnedVirtualThreads, these virtual threads remain pinned to their carriers while blocked on the monitor, so other virtual threads cannot use those carriers until they complete. This limits concurrency and keeps memory usage low (a few MB), since only about nproc virtual threads are alive at a time.

With -XX:+YieldPinnedVirtualThreads (JEP 491), blocked virtual threads yield and unmount immediately, freeing their carriers for other threads. This allows rapid scaling but significantly increases memory usage (several GB), as thousands of virtual threads may be alive concurrently. In the worst case (nproc = 4), 4 × 100,000 virtual threads could be alive at once; if each virtual thread needs 256 KB, this becomes 400,000 × 256 KB ≈ 97.7 GiB.
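The arithmetic above can be reproduced with a short back-of-the-envelope calculation. The 256 KB per-virtual-thread footprint is the hypothetical figure from the comment, not a measured OpenJ9 value, and the class and method names are illustrative.

```java
import java.util.Locale;

/** Back-of-the-envelope estimate; 256 KB per virtual thread is an assumed figure. */
public class VThreadMemoryEstimate {

    static final long KB = 1024L;

    /** Bytes needed when liveThreads virtual threads each use kbPerThread KB. */
    static long bytesFor(long liveThreads, long kbPerThread) {
        return liveThreads * kbPerThread * KB;
    }

    static double toMiB(long bytes) { return bytes / (double) (KB * KB); }

    static double toGiB(long bytes) { return bytes / (double) (KB * KB * KB); }

    public static void main(String[] args) {
        int nproc = 4;            // carrier threads in the example
        int iterations = 100_000; // iterations run by the test
        long kbPerThread = 256;   // assumed per-virtual-thread footprint

        // -XX:-YieldPinnedVirtualThreads: only about nproc virtual threads alive at once.
        long pinned = bytesFor(nproc, kbPerThread);
        // -XX:+YieldPinnedVirtualThreads worst case: nproc threads per iteration, all alive.
        long yielding = bytesFor((long) nproc * iterations, kbPerThread);

        System.out.printf(Locale.ROOT, "pinned:   %.1f MiB%n", toMiB(pinned));   // 1.0 MiB
        System.out.printf(Locale.ROOT, "yielding: %.2f GiB%n", toGiB(yielding)); // 97.66 GiB
    }
}
```

The gap between the two cases comes entirely from how many virtual threads can be alive at once, not from the per-thread cost.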

@JasonFengJ9
Member Author

Removing this from the milestone plan as per #21957 (comment)
