[CRITICAL] waitpid on non-child processes in stop_and_wait #17
Problem
stop_and_wait in api.rs calls waitpid(Pid::from_raw(pid as i32), WNOHANG) on processes that were not spawned by the current server instance. This happens for services restored after prepare_restart. waitpid only works on direct children of the calling process. For non-child processes, it fails immediately with ECHILD. The code treats ECHILD as "process already reaped" (sets reaped = true), which means stop_and_wait skips SIGKILL and returns success for a still-running process.

Impact
During cascade stops for remove_service or similar operations, restored services that are actually running are incorrectly reported as stopped. The calling code proceeds to delete the service while the process is still alive, leaving an orphan.

Files
crates/my_init_server/src/supervisor/api.rs -- stop_and_wait method

Suggested Fix
Use process_exists(pid) polling instead of waitpid for non-child processes. Alternatively, track which processes were spawned by the current instance vs restored, and use different wait strategies.

Confirmed by code inspection at crates/my_init_server/src/supervisor/api.rs:327-346. stop_and_wait calls waitpid(WNOHANG) on any process, including restored (non-child) ones. waitpid on a non-child fails with ECHILD, which is treated as "already reaped" at lines 336-338, so SIGKILL is skipped. The process continues running while the supervisor believes it is stopped. Only affects restored services after prepare_restart.
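A minimal sketch of the polling approach suggested above, for discussion only. The issue names process_exists(pid) but does not show its body, so the implementation here is an assumption: it probes the PID with kill(pid, 0), which performs the existence check without delivering a signal. wait_for_exit and stop_non_child are hypothetical names, the signal and errno numbers are hard-coded for Linux, and the sketch declares kill by hand rather than using the nix crate so it stays self-contained.

```rust
use std::io;
use std::thread::sleep;
use std::time::{Duration, Instant};

// Declared by hand so the sketch needs no external crate; on Unix the Rust
// standard library already links against libc, which provides kill(2).
unsafe extern "C" {
    fn kill(pid: i32, sig: i32) -> i32;
}

// Assumption: Linux values.
const ESRCH: i32 = 3;
const SIGKILL: i32 = 9;
const SIGTERM: i32 = 15;

/// Assumed implementation: true if a process with this PID currently exists.
fn process_exists(pid: i32) -> bool {
    // 0 and negative values address process groups, never a single process.
    if pid <= 0 {
        return false;
    }
    // Signal 0 runs the existence and permission checks but sends nothing.
    if unsafe { kill(pid, 0) } == 0 {
        return true;
    }
    // EPERM means the process exists but may not be signalled by us;
    // only ESRCH means there is no such process.
    io::Error::last_os_error().raw_os_error() != Some(ESRCH)
}

/// Hypothetical helper: poll until the process is gone or the timeout expires.
/// Returns true if the process exited in time.
fn wait_for_exit(pid: i32, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    while process_exists(pid) {
        if Instant::now() >= deadline {
            return false;
        }
        sleep(Duration::from_millis(20));
    }
    true
}

/// Hypothetical stop path for restored (non-child) processes:
/// SIGTERM, wait for the grace period, then escalate to SIGKILL.
fn stop_non_child(pid: i32, grace: Duration) -> bool {
    if pid <= 0 {
        return false;
    }
    unsafe { kill(pid, SIGTERM) };
    if wait_for_exit(pid, grace) {
        return true;
    }
    unsafe { kill(pid, SIGKILL) };
    wait_for_exit(pid, grace)
}
```

Two caveats apply to any PID-polling scheme like this one. A PID can be reused by an unrelated process between polls, and a process that has exited but not yet been reaped by its parent still counts as existing. Processes spawned by the current instance should therefore probably keep using waitpid, with polling reserved for restored ones.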