ThreadsafeFunction in worker_threads causes random segfaults #58484

Open
Brooooooklyn opened this issue May 27, 2025 · 0 comments
Version

v22.16.0

Platform

Linux ubuntu-22.04 6.13.7-orbstack-00283-g9d1400e7e9c6 #104 SMP Mon Mar 17 06:15:48 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

Steps

  • Clone https://github.com/napi-rs/napi-rs
  • Check out the 05-25-test_stress_test_on_aarch64_linux_gnu_platform branch
  • Install the latest Node.js and Rust
  • Run yarn install
  • Run yarn build:test
  • Run yarn workspace @examples/napi test tests/worker-thread.spec.ts --match '*worker_threads'

Summary

NAPI-RS hides a lot behind its abstractions, so here is the shortest description I can give of the scenario in which I hit the segfault.

Here is a simple async function in Rust:

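use napi::bindgen_prelude::Buffer;
use napi_derive::napi;

// Async pass-through: resolves the returned Promise with the same buffer.
#[napi]
pub async fn buffer_pass_through(buffer: Buffer) -> Buffer {
  buffer
}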

Under the hood, NAPI-RS does the following:

  • Call napi_create_promise to get a deferred and a promise; the promise is returned to JavaScript immediately.
  • Call napi_create_threadsafe_function, passing the deferred as the ThreadsafeFunction context.
  • Use tokio::spawn to run the async function, and call the ThreadsafeFunction once the async function has produced a value.
  • Call napi_release_threadsafe_function after the ThreadsafeFunction has been called.
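
The four steps above can be sketched as a toy model that uses only the Rust standard library. This is not NAPI-RS or Node.js code: Deferred, Promise, Tsfn, and buffer_pass_through_model are hypothetical names made up for this illustration, an mpsc channel stands in for the deferred/promise pair, and std::thread::spawn stands in for tokio::spawn. It only shows the order of operations and which thread performs the release.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical stand-in for the `deferred` from napi_create_promise.
struct Deferred(Sender<Vec<u8>>);

/// Hypothetical stand-in for the `promise`; the caller gets it immediately.
struct Promise(Receiver<Vec<u8>>);

/// Hypothetical stand-in for a thread-safe function whose context is the deferred.
struct Tsfn {
    context: Deferred,
    thread_count: AtomicUsize, // models the acquire/release count
}

impl Tsfn {
    /// Models calling the thread-safe function: hand the value to the "JS side".
    fn call(&self, value: Vec<u8>) {
        self.context.0.send(value).expect("receiving side is gone");
    }

    /// Models napi_release_threadsafe_function: returns the remaining count.
    fn release(&self) -> usize {
        self.thread_count.fetch_sub(1, Ordering::SeqCst) - 1
    }
}

fn buffer_pass_through_model(buffer: Vec<u8>) -> (Promise, thread::JoinHandle<usize>) {
    // Step 1: create the deferred/promise pair; the promise is returned right away.
    let (tx, rx) = channel();
    // Step 2: create the thread-safe function with the deferred as its context.
    let tsfn = Tsfn {
        context: Deferred(tx),
        thread_count: AtomicUsize::new(1),
    };
    // Step 3: run the "async function" on another thread and call the
    // thread-safe function once it has a value.
    let worker = thread::spawn(move || {
        let result = buffer; // body of the pass-through function
        tsfn.call(result);
        // Step 4: release after the call, from the background thread.
        tsfn.release()
    });
    (Promise(rx), worker)
}
```

In this model the release in step 4 runs on the background thread, just as frame #7 of the backtrace shows napi_release_threadsafe_function running on a tokio worker thread rather than on the thread that created the function.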

If this function is not called in worker_threads, there's no problem. I tried writing a loop that calls it hundreds of thousands of times, and didn't find any issues.

However, when this function is called in worker_threads, segfaults occasionally occur, which has been observed both in CI and in feedback from my users.

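Backtrace from lldb (the stop reason is SIGABRT, raised from abort() inside uv_mutex_lock during napi_release_threadsafe_function on a tokio worker thread):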

* thread #26, name = 'tokio-runtime-w', stop reason = signal SIGABRT
  * frame #0: 0x0000fffff7b07608 libc.so.6`__pthread_kill_implementation(threadid=281472292351904, signo=6, no_tid=<unavailable>) at pthread_kill.c:44:76
    frame #1: 0x0000fffff7abcb3c libc.so.6`__GI_raise(sig=6) at raise.c:26:13
    frame #2: 0x0000fffff7aa7e00 libc.so.6`__GI_abort at abort.c:79:7
    frame #3: 0x0000aaaaad7dfb38 node`uv_mutex_lock(mutex=0x0000fffdf816b038) at thread.c:345:5
    frame #4: 0x0000aaaaabeab7c4 node`node::LibuvMutexTraits::mutex_lock(mutex=0x0000fffdf816b038) at node_mutex.h:183:18
    frame #5: 0x0000aaaaabead48c node`node::MutexBase<node::LibuvMutexTraits>::ScopedLock::ScopedLock(this=0x0000ffff5fffc370, mutex=0x0000fffdf816b038) at node_mutex.h:285:21
    frame #6: 0x0000aaaaac03f4d4 node`v8impl::(anonymous namespace)::ThreadSafeFunction::Release(this=0x0000fffdf816b010, mode=napi_tsfn_release) const at node_api.cc:276:45
    frame #7: 0x0000aaaaac043160 node`napi_release_threadsafe_function(func=0x0000fffdf816b010, mode=napi_tsfn_release) at node_api.cc:1411:70
    frame #8: 0x0000ffffddc45c78 example.linux-arm64-gnu.node`napi::js_values::deferred::JsDeferred$LT$Data$C$Resolver$GT$::call_tsfn::h488d0dbaa4a26cb5(self=JsDeferred<napi::js_values::unknown::Unknown, napi::tokio_runtime::execute_tokio_future::{async_block#0}::{closure_env#0}<napi::bindgen_runtime::js_values::arraybuffer::Uint8Array, napi_examples::typed_array::_napi_internal_register_array_buffer_pass_through::{closure#0}::{async_block_env#1}, napi_examples::typed_array::_napi_internal_register_array_buffer_pass_through::{closure#0}::{closure_env#2}, napi::error::Error<napi::status::Status>>> @ 0x0000ffff5fffc490, result=<unavailable>) at deferred.rs:183:7
    frame #9: 0x0000ffffddc44d40 example.linux-arm64-gnu.node`napi::js_values::deferred::JsDeferred$LT$Data$C$Resolver$GT$::resolve::h3d2884560cd31212(self=JsDeferred<napi::js_values::unknown::Unknown, napi::tokio_runtime::execute_tokio_future::{async_block#0}::{closure_env#0}<napi::bindgen_runtime::js_values::arraybuffer::Uint8Array, napi_examples::typed_array::_napi_internal_register_array_buffer_pass_through::{closure#0}::{async_block_env#1}, napi_examples::typed_array::_napi_internal_register_array_buffer_pass_through::{closure#0}::{closure_env#2}, napi::error::Error<napi::status::Status>>> @ 0x0000ffff5fffc520, resolver=<unavailable>) at deferred.rs:154:5
    frame #10: 0x0000ffffddb57d00 example.linux-arm64-gnu.node`napi::tokio_runtime::execute_tokio_future::_$u7b$$u7b$closure$u7d$$u7d$::ha7f1ee1bf2582723((null)=0x0000ffff5fffc9b0) at tokio_runtime.rs:233:16
    frame #11: 0x0000ffffdd9158fc example.linux-arm64-gnu.node`tokio::runtime::task::core::Core$LT$T$C$S$GT$::poll::_$u7b$$u7b$closure$u7d$$u7d$::hdd5c00f1ec17be65(ptr=0x0000fffdf815abb0) at core.rs:331:17
    frame #12: 0x0000ffffdd8fed1c example.linux-arm64-gnu.node`tokio::runtime::task::core::Core$LT$T$C$S$GT$::poll::h2a9ed49810ab1294 [inlined] tokio::loom::std::unsafe_cell::UnsafeCell$LT$T$GT$::with_mut::h21552376c10d6f31(self=0x0000fffdf815abb0, f={closure_env#0}<napi::tokio_runtime::execute_tokio_future::{async_block_env#0}<napi::bindgen_runtime::js_values::arraybuffer::Uint8Array, napi_examples::typed_array::_napi_internal_register_array_buffer_pass_through::{closure#0}::{async_block_env#1}, napi_examples::typed_array::_napi_internal_register_array_buffer_pass_through::{closure#0}::{closure_env#2}, napi::error::Error<napi::status::Status>>, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>> @ 0x0000ffff5fffc968) at unsafe_cell.rs:16:9
    frame #13: 0x0000ffffdd8fed00 example.linux-arm64-gnu.node`tokio::runtime::task::core::Core$LT$T$C$S$GT$::poll::h2a9ed49810ab1294(self=0x0000fffdf815aba0, cx=<unavailable>) at core.rs:320:13

How often does it reproduce? Is there a required condition?

No special condition is required. Run the test 3-5 times and the crash appears in some of the runs.

What is the expected behavior? Why is that the expected behavior?

No segfault

What do you see instead?

Segfault

Additional information

Maybe related: #55706
