net: UDPConn.WriteToUDPAddrPort sometimes blocks forever on darwin/arm64 #73919
Labels
BugReport
Issues describing a possible bug in the Go implementation.
NeedsInvestigation
Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Go version
go version go1.24.1 darwin/arm64
Output of go env in your module/workspace:

What did you do?
We've had a few recent reports of our UDP-heavy application (github.com/Nebula) locking up on Darwin hosts. Users who hit the issue can reproduce it reliably, though some of them have since lost the ability to reproduce it. I've been unable to reproduce the issue directly myself.
What did you see happen?
When the process locks up, we see this goroutine in a stack trace:
Other goroutines attempting to write to the same UDP socket block forever on lock acquisition (as the lock is held by goroutine 22 above):
Some earlier issues that are possibly related are #61555 and a comment on #45211.
The comment linked above indicated that the socket write resulted in an EAGAIN return value. The goroutine is then parked, and for some reason the WRITE event from the kevent() call never fires.
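To make that failure mode concrete, here is a minimal standalone sketch of the same pattern (using golang.org/x/sys/unix directly, not the actual runtime code; the destination address and port are placeholders, and you'd point it at whatever peer produces enough backpressure to hit EAGAIN):

```go
//go:build darwin

// Sketch: non-blocking sendto, then wait on kqueue for EVFILT_WRITE after EAGAIN.
package main

import (
	"log"
	"net/netip"

	"golang.org/x/sys/unix"
)

func main() {
	dst := netip.MustParseAddrPort("[fc00::1]:4242") // hypothetical peer
	sa := &unix.SockaddrInet6{Port: int(dst.Port()), Addr: dst.Addr().As16()}

	fd, err := unix.Socket(unix.AF_INET6, unix.SOCK_DGRAM, 0)
	if err != nil {
		log.Fatal(err)
	}
	defer unix.Close(fd)
	if err := unix.SetNonblock(fd, true); err != nil {
		log.Fatal(err)
	}

	kq, err := unix.Kqueue()
	if err != nil {
		log.Fatal(err)
	}
	defer unix.Close(kq)

	payload := make([]byte, 1200)
	for i := 0; ; i++ {
		err := unix.Sendto(fd, payload, 0, sa)
		if err == nil {
			continue
		}
		if err != unix.EAGAIN {
			log.Fatalf("sendto: %v", err)
		}
		log.Printf("sendto returned EAGAIN after %d packets; waiting for EVFILT_WRITE", i)
		changes := []unix.Kevent_t{{
			Ident:  uint64(fd),
			Filter: unix.EVFILT_WRITE,
			Flags:  unix.EV_ADD | unix.EV_ONESHOT,
		}}
		events := make([]unix.Kevent_t, 1)
		// A nil timeout blocks indefinitely. If the kernel never posts the
		// WRITE event, this hangs: the standalone analogue of the parked
		// goroutine in the stack above.
		if _, err := unix.Kevent(kq, changes, events, nil); err != nil {
			log.Fatalf("kevent: %v", err)
		}
		log.Print("EVFILT_WRITE delivered; socket writable again")
		return
	}
}
```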
I was suspicious that either the expected kevent wasn't firing, or there was a subtle race in the lock-free kevent handler.
To suss this out, I instrumented the Go runtime to include some extra debug information around the sendto() call and the kevent() loop:
go version go1.25-devel_7f806c1052 Wed May 21 00:07:41 2025 -0700 darwin/arm64
I asked a user who could reproduce the issue to run our program built with this amended toolchain, and got this output (file descriptor 8 is the UDP socket):
I've snipped out some IP addresses from the user's trace, but I believe all relevant info is included. Interpreting this trace, it appears to me that the UDP socket starts out fine and shows kevents happening as expected. There is a kevent WRITE event fired after the first batch of UDP packets is sent off. (This is also the last WRITE event I see in the log for file descriptor 8.) Later, this user sees an EAGAIN returned by sendto (BRAD: sendto returned EAGAIN). After that, the log shows some kevents occurring against file descriptor 8, but they're all _EVFILT_READ, never _EVFILT_WRITE.

With some further debugging, we found that (at least for this user) the issue did not manifest when we prevented the program from attempting any UDP writes to the ULA fc00::/7 range.
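For reference, that mitigation boils down to skipping any send whose destination falls in the unique local range. A sketch of that kind of guard (hypothetical helper name, not Nebula's actual change):

```go
package main

import (
	"fmt"
	"net/netip"
)

// ulaPrefix is the IPv6 unique local address range from RFC 4193.
var ulaPrefix = netip.MustParsePrefix("fc00::/7")

// skipULA reports whether a destination falls inside fc00::/7.
func skipULA(dst netip.AddrPort) bool {
	a := dst.Addr().Unmap()
	return a.Is6() && ulaPrefix.Contains(a)
}

func main() {
	fmt.Println(skipULA(netip.MustParseAddrPort("[fd00::1]:4242")))     // true
	fmt.Println(skipULA(netip.MustParseAddrPort("[2001:db8::1]:4242"))) // false
}
```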
In an earlier incarnation of a similar issue (linked above), the program Little Snitch was installed on the failing machines, and those machines failed with the same stack. In attempting to root-cause this issue, I re-installed Little Snitch and successfully reproduced the lockup, seeing the same behavior as in the instrumented logs above: a sendto call returns EAGAIN, but no WRITE kevent ever arrives to wake the goroutine again.
In this case, the users encountering the issue do not have Little Snitch installed.
What did you expect to see?
I expect the UDP write to not block forever.
If it's guaranteed that kevent() fires a WRITE event for a file descriptor in every scenario in which sendto returns EAGAIN, then the behavior I'm seeing would appear to be a Darwin bug. I've now seen this behavior twice, on different OS versions: once with Little Snitch installed, and now this new incarnation reported by a few users (apparently related to UDP writes to ULA IPv6 addresses).
Even if it turns out Darwin is to blame, I would love to have a way to work around this behavior. One thought is to make UDP writes non-blocking from Go's perspective. That is, the socket itself is already non-blocking, but Go's net.(*UDPConn).WriteToUDPAddrPort call blocks until sendto returns something other than EINTR or EAGAIN. If I had a way to call WriteToUDPAddrPort (or another UDP send function) such that at least the EAGAIN is immediately bubbled up the stack instead of blocking on a kevent that never happens, then I could work around this kevent behavior without locking up the process.
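In the meantime, the closest approximation I can see from user code is to bypass the blocking write path via (*net.UDPConn).SyscallConn and do the sendto inside RawConn.Write, returning true from the callback so the netpoller is never consulted. A sketch, assuming an IPv6 (udp6) socket and golang.org/x/sys/unix; writeToNonblock is a hypothetical helper, not an existing API:

```go
package main

import (
	"log"
	"net"
	"net/netip"

	"golang.org/x/sys/unix"
)

// writeToNonblock performs a single sendto on conn's file descriptor and
// returns the raw error (including unix.EAGAIN) to the caller instead of
// waiting on the netpoller. Hypothetical helper; assumes a udp6 socket.
func writeToNonblock(conn *net.UDPConn, p []byte, dst netip.AddrPort) (int, error) {
	rc, err := conn.SyscallConn()
	if err != nil {
		return 0, err
	}
	sa := &unix.SockaddrInet6{Port: int(dst.Port()), Addr: dst.Addr().As16()}
	var n int
	var sendErr error
	// RawConn.Write re-invokes the callback until it returns true; returning
	// true unconditionally gives exactly one sendto attempt with no wait for
	// the fd to become writable.
	err = rc.Write(func(fd uintptr) bool {
		sendErr = unix.Sendto(int(fd), p, 0, sa)
		if sendErr == nil {
			n = len(p)
		}
		return true
	})
	if err != nil {
		return 0, err
	}
	return n, sendErr
}

func main() {
	conn, err := net.ListenUDP("udp6", &net.UDPAddr{IP: net.IPv6unspecified})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	dst := netip.MustParseAddrPort("[fd00::1]:4242") // hypothetical peer
	n, err := writeToNonblock(conn, []byte("ping"), dst)
	log.Printf("wrote %d bytes, err=%v", n, err) // err may be unix.EAGAIN
}
```

With something like this, EAGAIN reaches the caller immediately (to be dropped or retried later) instead of the fd's write lock being parked on a kevent that may never arrive.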