Bug #19700
TestProcess#test_execopts_redirect_open_fifo_interrupt_print is flaky on macOS
Status: Closed
Description
The test TestProcess#test_execopts_redirect_open_fifo_interrupt_print in test_process.rb is flaky on macOS for me: sometimes it just hangs forever.
This test checks what happens when:
- You have two processes
- One is blocked opening a FIFO for reading (to redirect to a child process)
- The other one sends a signal to that process
- And then opens the FIFO for writing (which should unblock the child process)
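A minimal Ruby sketch of that scenario, for illustration only. It is not the actual test code, and the helper name fifo_interrupt_roundtrip, the readiness pipe and the 0.2-second sleeps are assumptions. Also, per the description, the real test opens the FIFO in order to redirect it to a child process, whereas this sketch simply reads it with File.open. A forked child installs a no-op SIGUSR1 handler and blocks opening the FIFO for reading; the parent interrupts it with SIGUSR1, then opens the FIFO for writing, which should only succeed once the child holds the read end.

```ruby
require "tmpdir"

# Hypothetical helper sketching the scenario; returns what the child read from the FIFO.
def fifo_interrupt_roundtrip(payload = "ok\n")
  Dir.mktmpdir do |dir|
    fifo = File.join(dir, "fifo")
    File.mkfifo(fifo)
    ready_r, ready_w = IO.pipe   # child -> parent: "signal handler installed"
    out_r, out_w = IO.pipe       # child -> parent: data read from the FIFO

    pid = fork do
      ready_r.close
      out_r.close
      trap(:USR1) {}             # no-op handler, so SIGUSR1 interrupts instead of killing
      ready_w.puts("ready")
      ready_w.close
      # Blocks in open(2) until some process opens the FIFO for writing.
      # If SIGUSR1 interrupts the open (EINTR), Ruby is expected to run the
      # handler and retry the open.
      data = File.open(fifo, "r") { |f| f.read }
      out_w.write(data)
      out_w.close
      exit!(0)                   # skip at_exit handlers and ensure blocks (keeps the tmpdir)
    end

    ready_w.close
    out_w.close
    ready_r.gets                 # wait until the child's handler is installed
    ready_r.close
    sleep 0.2                    # give the child time to block in open(2)
    Process.kill(:USR1, pid)     # interrupt the child's blocked open
    sleep 0.2
    # Opening for writing should block until the child has the read end open.
    File.open(fifo, "w") { |f| f.write(payload) }
    result = out_r.read          # EOF once the child has written and exited
    out_r.close
    Process.wait(pid)
    result
  end
end
```

On a kernel with correct FIFO semantics, fifo_interrupt_roundtrip should return "ok\n". On an affected machine, this sketch would presumably either raise Errno::EPIPE from the write (Ruby ignores SIGPIPE by default) or hang with the child still blocked in open(2), matching the two symptoms described in this report.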
When this test hangs for me:
- The child process is blocked opening the FIFO (i.e. it's waiting for a writer)
- But the parent process has somehow already written data to the FIFO successfully. This shouldn't be possible: the parent's open of the FIFO for writing should have blocked until the child's open for reading succeeded.
Actually, I believe this is a bug in macOS. The following program fails with "write (parent): Broken pipe" on macOS when run in a loop, but works correctly on Linux: https://gist.github.com/KJTsanaktsidis/fc84b006cfff1bb0b55a2571df825d80
I'm going to open a PR to skip this test on macOS, because I believe the operating system is broken here and, as far as I can tell, Ruby is doing the correct thing.
Updated by kjtsanaktsidis (KJ Tsanaktsidis) about 2 years ago
I opened https://github.com/ruby/ruby/pull/7876 to skip this.
Updated by nobu (Nobuyoshi Nakada) about 2 years ago
- Status changed from Open to Feedback
I haven't seen that failure on macOS.
ProductName: macOS
ProductVersion: 13.4
BuildVersion: 22F66
Also your test program runs fine.
Updated by kjtsanaktsidis (KJ Tsanaktsidis) about 2 years ago
I think it's the same failure as these:
- http://rubyci.s3.amazonaws.com/osx1200arm/ruby-master/log/20230601T215005Z.fail.html.gz
- http://rubyci.s3.amazonaws.com/osx1200arm/ruby-master/log/20230531T165005Z.fail.html.gz
Is it an ARM thing, perhaps? My MacBook is an M1 on 13.4 (22F66) as well.
I was going to suggest that it might be related to the CrowdStrike security stack on my Mac (it's a work machine), which definitely hooks into open(2) system calls. However, RubyCI shows a similar failure, and I assume it doesn't run anything like that. Perhaps CrowdStrike makes the flakiness more likely, but it's present regardless?
This week I'll survey the Macs around the office with that test program and see if I can find a differentiating factor.
Updated by kjtsanaktsidis (KJ Tsanaktsidis) about 2 years ago
From my survey around the office, it seems that my test program works on Intel Macs and crashes on ARM ones. I've opened a bug report with Apple about this (FB12251512).