Win32 Service Gotcha: Recovery Actions
Win32 Services and Recovery Actions
This blog post doesn’t have to do with BackgroundService specifically, but it is an issue that can come up with .NET Core workers that are run as Win32 Services. In some ways, this blog post has more to do with managed services, but I decided to put it with the BackgroundService series because it is a problem with BackgroundServices run as Win32 services.
Background: Recovery Actions
The Win32 Service Control Manager (SCM) is responsible for starting and stopping services on Windows machines. It’s also responsible for restarting Win32 services when they fail, according to the recovery actions configured for each service.
However, it can be a bit confusing to think about what “fail” actually means.
Background: Win32 Service Failure
It’s pretty clear that if a Win32 application crashes, that indicates “failure”. Normally, Win32 services communicate with the SCM and let it know what their state is. The most common states are “stopped” and “running”, along with transitional states like “stopping” and “starting”. So, if a Win32 application exits (or crashes) without telling the SCM it is “stopped”, then the SCM treats that as a failure.
What’s much less clear is how exit codes are handled.
The first thing to keep in mind is that each Win32 service has its own exit code. A single Win32 process can contain multiple different Win32 services within that single process, and each of those Win32 services has its own exit code. As far as I can tell, the exit code of the process itself is completely ignored.
What’s more, if the Win32 service does report that it is “stopped” to the SCM, the SCM will ignore the Win32 service exit code, too! The SCM assumes that if the service has reported it is “stopped”, then the service has stopped successfully, and there is no need to restart the service.
This means that if you have a Win32 service and it reports a non-zero exit code (either for the process exit code or the Win32 service exit code), and if that Win32 service exits cleanly after setting its non-zero exit code, then that exit code will be ignored and the service will not be restarted.
Tip: Honoring Win32 Service Exit Codes
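There is a flag you can set that will cause the SCM to honor the Win32 service exit code, treating a non-zero code as a “failure” and running its recovery actions. You can turn this flag on at the command line with the failureflag command of sc.exe, passing the service name and a value of 1 (for example, sc failureflag MyService 1, where MyService is a placeholder for your service name).
Setting that flag checks the checkbox on the Recovery tab of the service’s properties dialog that has the rather difficult-to-understand wording of “Enable actions for stops with errors.”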
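The rules from the last two sections can be condensed into a small decision function. This is only a model of the behavior described in this post, not the SCM’s actual logic; the ScmModel type and its parameter names are made up for illustration, and it assumes recovery actions have been configured for the service in the first place.

```csharp
using System;

public static class ScmModel
{
    // Hypothetical model of the rules described above; not real SCM code.
    public static bool TreatsAsFailure(bool reportedStopped, int serviceExitCode, bool failureFlag)
    {
        // Exiting or crashing without reporting "stopped" is always a failure.
        if (!reportedStopped)
            return true;

        // A clean stop only counts as a failure when the flag is set
        // and the Win32 service exit code is non-zero.
        return failureFlag && serviceExitCode != 0;
    }

    public static void Main()
    {
        Console.WriteLine(TreatsAsFailure(reportedStopped: false, serviceExitCode: 0, failureFlag: false)); // True
        Console.WriteLine(TreatsAsFailure(reportedStopped: true, serviceExitCode: 1, failureFlag: false));  // False
        Console.WriteLine(TreatsAsFailure(reportedStopped: true, serviceExitCode: 1, failureFlag: true));   // True
    }
}
```

Note that the process exit code does not appear as an input at all, which matches the observation above that it seems to be ignored.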
Win32 Service Exit Codes and ServiceBase
For .NET applications, the Environment.ExitCode property manages the exit code for the process. As far as I know, this value is always ignored when the process is run as a service. The Win32 service exit code is managed by the ServiceBase.ExitCode property. Remember, there can be multiple Win32 services in a single process.
Win32 Service Exit Codes and WindowsServiceLifetime
In .NET Core applications that are run as Win32 services, it’s normal to call IHostBuilder.UseWindowsService(), which installs a WindowsServiceLifetime as the IHostLifetime, instead of the default ConsoleLifetime.
WindowsServiceLifetime lets the SCM control the starting and stopping of the .NET Core application. It derives from ServiceBase. Since multiple IHostLifetime instances aren’t supported, this means that .NET Core workers do not naturally support multiple Win32 services in a single process. It may be possible to support that by creating a new type that derives from ServiceBase and implements IHostedService, along with some kind of coordinating IHostLifetime implementation, but I’m not aware of anyone doing that yet. For now, all the .NET Core Win32 services I know of use WindowsServiceLifetime.
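For reference, a typical worker set up this way looks roughly like the sketch below. It assumes the Microsoft.Extensions.Hosting.WindowsServices package is referenced; the Worker type and its one-second loop are placeholders, not code from any particular project.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

public static class Program
{
    public static void Main(string[] args) =>
        Host.CreateDefaultBuilder(args)
            // Installs WindowsServiceLifetime as the IHostLifetime,
            // but only when the process is actually running as a Win32 service.
            .UseWindowsService()
            .ConfigureServices(services => services.AddHostedService<Worker>())
            .Build()
            .Run();
}

// Placeholder worker: idles until the host asks it to stop.
public sealed class Worker : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
            await Task.Delay(TimeSpan.FromSeconds(1), stoppingToken);
    }
}
```

When the same executable is started from a console, UseWindowsService leaves the default lifetime in place, so the worker can still be debugged interactively.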
One important note is that WindowsServiceLifetime stops cleanly. If the .NET Core application is shut down, then WindowsServiceLifetime reports to the SCM that the service is “stopped”, and this means that the SCM will not restart the service.
You can write code that will set the Win32 service exit code on failure by accessing ServiceBase.ExitCode.
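One way to do that is sketched below: since WindowsServiceLifetime derives from ServiceBase, a worker can take the IHostLifetime as a dependency and cast it. The MyWorker type, the DoWorkAsync placeholder, and the exit code value of 1 are assumptions made for this example.

```csharp
using System;
using System.ServiceProcess;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

public sealed class MyWorker : BackgroundService
{
    private const int FailureExitCode = 1; // arbitrary non-zero value for this sketch

    private readonly IHostLifetime _hostLifetime;
    private readonly IHostApplicationLifetime _applicationLifetime;

    public MyWorker(IHostLifetime hostLifetime, IHostApplicationLifetime applicationLifetime)
    {
        _hostLifetime = hostLifetime;
        _applicationLifetime = applicationLifetime;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        try
        {
            await DoWorkAsync(stoppingToken);
        }
        catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
        {
            // Normal shutdown requested by the host; not a failure.
        }
        catch (Exception)
        {
            // Running as a Win32 service: set the service exit code.
            // Otherwise (e.g. console): fall back to the process exit code.
            if (_hostLifetime is ServiceBase serviceBase)
                serviceBase.ExitCode = FailureExitCode;
            else
                Environment.ExitCode = FailureExitCode;

            // Ask the host to shut down so the exit code is reported.
            _applicationLifetime.StopApplication();
        }
    }

    // Placeholder for the real work; waits until cancellation.
    private static Task DoWorkAsync(CancellationToken cancellationToken) =>
        Task.Delay(Timeout.Infinite, cancellationToken);
}
```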
However, as noted above, you have to also set the failureflag on the service, or else the SCM will ignore the non-zero exit code.
If desired, you can write a custom type derived from WindowsServiceLifetime that will treat non-zero Environment.ExitCode values as non-zero Win32 service exit codes.
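A minimal sketch of such a lifetime follows. It assumes the four-argument WindowsServiceLifetime constructor and relies on ServiceBase reporting its ExitCode when it marks the service as “stopped” after OnStop returns; treat it as a starting point rather than a finished implementation.

```csharp
using System;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Hosting.WindowsServices;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;

public class MyWindowsServiceLifetime : WindowsServiceLifetime
{
    public MyWindowsServiceLifetime(
        IHostEnvironment environment,
        IHostApplicationLifetime applicationLifetime,
        ILoggerFactory loggerFactory,
        IOptions<HostOptions> optionsAccessor)
        : base(environment, applicationLifetime, loggerFactory, optionsAccessor)
    {
    }

    protected override void OnStop()
    {
        // Let the base type shut down the host first, so that any code
        // setting Environment.ExitCode during shutdown has had a chance to run.
        base.OnStop();

        // Copy a non-zero process exit code to the Win32 service exit code.
        if (Environment.ExitCode != 0)
            ExitCode = Environment.ExitCode;
    }
}
```

With this in place, application code only has to set Environment.ExitCode, which also keeps it working unchanged when the application is run from a console.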
This can be installed by adding this service (services.AddSingleton<IHostLifetime, MyWindowsServiceLifetime>()) after calling UseWindowsService; the .NET Core dependency injection will just take the last registered IHostLifetime.
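The registration might look like the sketch below. The IsWindowsService guard is my own addition rather than something required by the text above: without it, the custom lifetime would also replace the console lifetime when the application is run interactively. The MyWindowsServiceLifetime class here is an empty stand-in for whichever custom lifetime you are using.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Hosting.WindowsServices;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;

public static class Program
{
    public static void Main(string[] args) =>
        Host.CreateDefaultBuilder(args)
            .UseWindowsService()
            .ConfigureServices(services =>
            {
                // Registered after UseWindowsService, so this registration wins.
                if (WindowsServiceHelpers.IsWindowsService())
                    services.AddSingleton<IHostLifetime, MyWindowsServiceLifetime>();
            })
            .Build()
            .Run();
}

// Stand-in for the custom lifetime; overrides would go here.
public class MyWindowsServiceLifetime : WindowsServiceLifetime
{
    public MyWindowsServiceLifetime(
        IHostEnvironment environment,
        IHostApplicationLifetime applicationLifetime,
        ILoggerFactory loggerFactory,
        IOptions<HostOptions> optionsAccessor)
        : base(environment, applicationLifetime, loggerFactory, optionsAccessor)
    {
    }
}
```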
It is also possible to create a custom type derived from WindowsServiceLifetime that can detect application failures and will prevent ServiceBase from sending the “stopped” message to the SCM. That way, your service will be restarted regardless of the failureflag setting, because any failure will cause the process to terminate without reporting “stopped”, which the SCM treats the same as a crash.
We don’t want to crash the process immediately; instead, we want to shut down the .NET Core host and then terminate, so that the SCM knows the process failed. The WindowsServiceLifetime type has some similar behavior: if the SCM requests the service to stop, then it will shut down the .NET Core host and wait for it to stop. We can do the same thing in our custom lifetime type.
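The sketch below shows one possible shape for such a lifetime; it is not a tested implementation. It makes several assumptions: that the base StopAsync is the only place where a host-initiated shutdown reports “stopped” (this may differ between versions of the hosting package), that every shutdown not requested by the SCM counts as a failure, and that Main returns once the host has stopped so the process then exits. The fallback exit code of 1 is arbitrary.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Hosting.WindowsServices;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;

// Listing IHostLifetime again re-implements the interface, so the host calls
// the StopAsync declared here rather than the non-virtual base method.
public class MyWindowsServiceLifetime : WindowsServiceLifetime, IHostLifetime
{
    private int _stopRequestedByScm; // 0 = no, 1 = yes

    public MyWindowsServiceLifetime(
        IHostEnvironment environment,
        IHostApplicationLifetime applicationLifetime,
        ILoggerFactory loggerFactory,
        IOptions<HostOptions> optionsAccessor)
        : base(environment, applicationLifetime, loggerFactory, optionsAccessor)
    {
    }

    // The SCM asked the service to stop: remember that, then let the base
    // type shut down the host and wait for it as usual.
    protected override void OnStop()
    {
        Interlocked.Exchange(ref _stopRequestedByScm, 1);
        base.OnStop();
    }

    // A system shutdown is also a legitimate stop, not a failure.
    protected override void OnShutdown()
    {
        Interlocked.Exchange(ref _stopRequestedByScm, 1);
        base.OnShutdown();
    }

    public new Task StopAsync(CancellationToken cancellationToken)
    {
        // Stop requested by the SCM: report "stopped" normally.
        if (Volatile.Read(ref _stopRequestedByScm) == 1)
            return base.StopAsync(cancellationToken);

        // The host is stopping on its own, which this sketch treats as a
        // failure. Skip the base call so "stopped" is never reported; when
        // the process then exits, the SCM sees an unexpected termination.
        if (Environment.ExitCode == 0)
            Environment.ExitCode = 1;
        return Task.CompletedTask;
    }
}
```

Because the host has already finished shutting down by the time the process exits, hosted services still get their normal StopAsync calls; only the final “stopped” report to the SCM is withheld.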
This kind of MyWindowsServiceLifetime implementation will work whether or not you set the failureflag on your service.