Skip to content

fix: notify and cancel windows#245

Merged
edwards-aws merged 2 commits intoOpenJobDescription:mainlinefrom
edwards-aws:notify_cancel
Jul 4, 2025
Merged

fix: notify and cancel windows#245
edwards-aws merged 2 commits intoOpenJobDescription:mainlinefrom
edwards-aws:notify_cancel

Conversation

@edwards-aws
Copy link
Contributor

Fixes: aws-deadline/deadline-cloud-worker-agent#490

What was the problem/requirement? (What/Why)

The notify and cancel workflow was failing on Windows because the process sending the signal couldn't attach to the main process console in Session 0

What was the solution? (How)

Create the signal process using LoggingSubprocess, mostly so the signal process can run as the user of the process being cancelled.

What is the impact of this change?

Tasks on Windows can now properly clean up after they've been notified instead of crashing.

How was this change tested?

Ran the cross user tests on Windows in and out of Session 0 (using the service script)
Ran the sudo_test.sh and sudo_test.sh --ldap tests on Ubuntu.
Also ran the tests normally on Windows and Ubuntu.

Ran a worker agent locally in session zero using my changes. Created a long sleep job, cancelled it, noticed that it cancelled correctly.

2025/06/11 17:06:59-05:00 openjd_progress: 8.799999999999985
2025/06/11 17:07:00-05:00 openjd_progress: 8.899999999999984
2025/06/11 17:07:01-05:00 openjd_progress: 8.999999999999984
2025/06/11 17:07:02-05:00 openjd_progress: 9.099999999999984
2025/06/11 17:07:03-05:00 Canceling subprocess 11100 via notify then terminate method at 2025-06-11T22:07:03Z.
2025/06/11 17:07:03-05:00 Grace period ends at 2025-06-11T22:07:25Z
2025/06/11 17:07:03-05:00 INTERRUPT: Sending CTRL_BREAK_EVENT to 11100
2025/06/11 17:07:03-05:00 Running command "c:\program files\python312\python.exe" "C:\Program Files\Python312\Lib\site-packages\openjd\sessions\_scripts\_windows\_signal_win_subprocess.py" 11100
2025/06/11 17:07:03-05:00 Command started as pid: 4120
2025/06/11 17:07:03-05:00 Output:
2025/06/11 17:07:03-05:00 Process pid 4120 exited with code: 0 (unsigned) / 0x0 (hex)
2025/06/11 17:07:03-05:00 Trapped signal 21
2025/06/11 17:07:03-05:00 CANCELING IN 2s...
2025/06/11 17:07:05-05:00 Process pid 11100 exited with code: 1 (unsigned) / 0x1 (hex)
  • Have you run the unit tests?
    Yes

I'll note that I had to refactor the service tests a bit. I had to call pytest as a module instead of a subprocess. This is because when it ran as subprocess it was running as python.exe and not pythonservicemanager.exe. This seems to mess with how permissions are inherited. But the tests also now more closely emulate how the library would run as a service in the real world.

An unfortunate side effect is that when running as a service sys.executable evaluates to pythonservicemanager.exe instead of python.exe. We use sys.executable a lot in the tests expecting a regular python runtime. So I had to do a refactor so that pythonservicemanager gets swapped for python where appropriate.

Was this change documented?

  • Are relevant docstrings in the code base updated?

Is this a breaking change?

Nope

Does this change impact security?

No


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Signed-off-by: Cody Edwards <edwards@amazon.com>
@edwards-aws edwards-aws requested a review from a team as a code owner June 11, 2025 22:31
@jusiskin
Copy link
Contributor

An unfortunate side effect is that when running as a service sys.executable evaluates to pythonservicemanager.exe instead of python.exe. We use sys.executable a lot in the tests expecting a regular python runtime. So I had to do a refactor so that pythonservicemanager gets swapped for python where appropriate.

There is precedent for this already in the repository, so this is fine. The most well-proven way we know of to run Python as a service is using pywin32, so I think we need to accommodate this use-case.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Jul 4, 2025

@edwards-aws edwards-aws merged commit 77e5e2b into OpenJobDescription:mainline Jul 4, 2025
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: NOTIFY_THEN_TERMINATE cancelation mode on Windows fails to send CTRL_BREAK_EVENT

3 participants