Resource leak if a service spontaneously stops #81

Open
Hawk777 opened this issue Sep 25, 2022 · 7 comments

@Hawk777

Hawk777 commented Sep 25, 2022

According to the Microsoft documentation regarding valid state transitions in services, the transition from RUNNING to STOP_PENDING and the transition from STOP_PENDING to STOPPED can both be initiated by the service. That is to say, it appears to be legal for a service to spontaneously decide to stop, rather than stopping in response to a control from the SCM (this makes sense if it fails to initialize, for example).

Should a service do this when using the windows-service crate, by calling set_service_status with the Stopped state and then returning from its ServiceMain function, the control handler will not be invoked with ServiceControl::Stop; therefore the let _: Box<F> = unsafe { Box::from_raw(context as *mut F) }; line will never execute and the context object will never be dropped.

I don’t think it’s possible to fix this without resorting to a global variable, unfortunately (it’s not safe to drop the context object in ServiceMain because a race condition could see the control handler being invoked at the same time and accessing the dropped object; I have experimented and determined that there is no mutex in the Windows API between SetServiceStatus and invocations of the control handler that could potentially prevent this from happening). IMO it is reasonable to keep the current behaviour, but I think it should be documented somewhere. For example, the documentation for windows_service::service_control_handler::register could mention that, should the service stop spontaneously (rather than in response to a ServiceControl::Stop, Shutdown, or Preshutdown), the FnMut (and anything captured in it, assuming it’s a closure) might be leaked instead of dropped. This is unlikely to be a problem in most cases, but if this crate ever expands to include multi-service processes, it might be a bit more important since we can’t rely on process death to clean up resources (and in any case it’s worth documenting since people might have Drop impls that do something important that they’re expecting to run).
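
For illustration, here is a minimal sketch of the pattern being described, with simplified, hypothetical names rather than the crate’s actual code: the closure is boxed, turned into a raw context pointer for the OS, and only reclaimed on the Stop/Shutdown/Preshutdown path.

use std::ffi::c_void;

// Hypothetical stand-ins for the crate's register / control-handler pair.
fn register_sketch<F>(event_handler: F) -> *mut c_void
where
    F: FnMut(u32) -> u32 + Send + 'static,
{
    // The closure is boxed and leaked into a raw pointer that gets handed to
    // RegisterServiceCtrlHandlerExW as the context argument.
    Box::into_raw(Box::new(event_handler)) as *mut c_void
}

unsafe fn control_handler_stop_path<F>(context: *mut c_void)
where
    F: FnMut(u32) -> u32 + Send + 'static,
{
    // The only place the box is reclaimed. If the service stops spontaneously,
    // this path never runs and the closure (plus everything it captured) leaks.
    let _: Box<F> = Box::from_raw(context as *mut F);
}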

@pronebird
Contributor

pronebird commented Sep 26, 2022

That's a valid point.

I re-read the documentation for SetServiceStatus today and it makes this one statement:

If the status is SERVICE_STOPPED, perform all necessary cleanup and call SetServiceStatus one time only. This function makes an LRPC call to the SCM. The first call to the function in the SERVICE_STOPPED state closes the RPC context handle and any subsequent calls can cause the process to crash.

This hints that it should be safe to release the context once SERVICE_STOPPED is passed to SetServiceStatus. Also, once that happens, dwControlsAccepted should indicate that no further control events can be received, so the SCM should not attempt to dispatch any further events.

So you reckon that a race between SetServiceStatus(/* stopped */) and the control handler is possible, and hence some sort of mediator is needed to manage the context? I imagine we could pass an index instead of a boxed closure and then use some global object to access that box by index. This is not very pretty though.

Sorry if I have to ask twice, but it seems like a fundamental flaw in the Windows services API if that's the case.

@Hawk777
Author

Hawk777 commented Sep 26, 2022

I re-read the documentation for SetServiceStatus today and it makes this one statement:

Yes, I read that paragraph too. However, the problem is that the control handler might already have been called by the time the ServiceMain thread calls SetServiceStatus(SERVICE_STOPPED). The ServiceMain thread can’t magically stop an in-progress function execution on another thread. This also reveals why it is impossible to solve the problem by adding a mutex on the user side in the context object: there will always be a time window after the control handler is called and before it gets around to taking the mutex where the ServiceMain thread could stop the service and deallocate the context, and then the control handler would try to lock a mutex that has been destroyed.

What I would hope would happen is that the Windows API has something like a recursive mutex between SetServiceStatus and invocations of the control handler, so that SetServiceStatus(SERVICE_STOPPED) won’t return until the control handler is no longer executing. Unfortunately, by experimentation, that is not the case—it’s easy to make a control handler that sleeps for a while, and then arrange for the service to decide to spontaneously stop while the control handler is sleeping. SetServiceStatus(SERVICE_STOPPED) in the ServiceMain thread due to spontaneous termination completes immediately, even while the control handler is in the middle of executing (sleeping).

Sorry if I have to ask twice, but it seems like a fundamental flaw in the Windows services API if that's the case.

Yes, I know it’s surprising, and AFAICT it is a fundamental flaw in the API. I’ve experimented a bit and can’t figure out a way around it. Of course there’s no problem if the service doesn’t spontaneously terminate (i.e. if it only ever terminates in response to SERVICE_CONTROL_STOP), because then the control handler knows it will be invoked and can clean up the callback.

The examples in MSDN aren’t very helpful, because they all show simple services that just hold HANDLEs to event objects in global variables, which obviously doesn’t expose this problem.

@pronebird
Contributor

pronebird commented Sep 28, 2022

Thanks for the thorough investigation of this matter. I am still wrapping my head around it.

Ideally we'd probably want to rely on Drop semantics and release the context automatically on service shutdown. But the lifetime requirement is a pickle.

The control handler function could perhaps rely on a global object to synchronize access, look up and execute the FnMut by index (which could be ever-incrementing), and do nothing when the context has already been released.

@Hawk777
Author

Hawk777 commented Sep 28, 2022

Yes, a global holding something like a Mutex<Option<Box<dyn FnMut>>> would do the job. You could fill it in register and then drop it in the control handler on SERVICE_CONTROL_STOP and co, just like now, but also drop it when ServiceMain returns if it hasn’t been already. It’s an ugly global though, and would have to get even more complicated for processes that host multiple services since you’d need a table of them with a separate one for each service.
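
A rough sketch of that idea, with hypothetical names, a plain u32 standing in for the crate's ServiceControl and return types, and assuming a Rust version where Mutex::new is usable in a static (otherwise something like once_cell would be needed):

use std::sync::Mutex;

type Handler = Box<dyn FnMut(u32) -> u32 + Send>;

// Hypothetical single-service global; None means no handler is registered or
// it has already been dropped.
static HANDLER: Mutex<Option<Handler>> = Mutex::new(None);

// Called from register: install the closure.
fn install_handler(handler: Handler) {
    *HANDLER.lock().unwrap() = Some(handler);
}

// Called from the control handler on Stop/Shutdown/Preshutdown and again when
// ServiceMain returns; the second call simply finds None and does nothing.
fn drop_handler() {
    *HANDLER.lock().unwrap() = None;
}

// Called from the raw control handler: run the closure while holding the lock,
// so dropping the handler cannot race with an in-progress call.
fn dispatch(control: u32) -> Option<u32> {
    HANDLER.lock().unwrap().as_mut().map(|handler| handler(control))
}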

The index is a good idea; I hadn’t thought of that, but now I have a slightly different idea about how you could use it: the context pointer passed to the control handler could be an integer cast to pointer rather than an actual pointer. That could be used as a table index for a multi-service process.

I don’t know whether an ever-incrementing index is needed. I assume the reason for that would be to prevent a second run of a service from using the first run’s control handler? I think that’s not really a problem; when the new service starts, it will allocate a fresh shared state (probably Arc<Mutex<Whatever>>) and capture it in the closure for register. Before the call to register the old control handler could still be invoked but it would use the first run’s Arc<Mutex<Whatever>>, not the second’s, so it wouldn’t interfere with the second run. If register takes the mutex in the global, then you can guarantee that (1) once register returns, the first run’s control handler is no longer in progress (via the mutex, assuming you hold it for the duration of the control handler closure), and (2) once register returns, the first run’s control handler won’t be invoked again (because it’s not there any more, the table entry now points at the second control handler). That might be sufficient.

@pronebird
Contributor

pronebird commented Sep 29, 2022

Yes, a global holding something like a Mutex<Option<Box<dyn FnMut>>> would do the job. You could fill it in register and then drop it in the control handler on SERVICE_CONTROL_STOP and co, just like now, but also drop it when ServiceMain returns if it hasn’t been already. It’s an ugly global though, and would have to get even more complicated for processes that host multiple services since you’d need a table of them with a separate one for each service.

Correct me if I am wrong, but I don't think ServiceMain is obliged to be blocking. The service could spawn a secondary thread and do all processing there and return from ServiceMain. In that regard dropping the control event handler before it returns is probably wrong. StartServiceCtrlDispatcherW is blocking the current thread though, but that's different.

The index is a good idea; I hadn’t thought of that, but now I have a slightly different idea about how you could use it: the context pointer passed to the control handler could be an integer cast to pointer rather than an actual pointer. That could be used as a table index for a multi-service process.

Yeah that's exactly my idea. There is no memory management issue in that case.

I don’t know whether an ever-incrementing index is needed. I assume the reason for that would be to prevent a second run of a service from using the first run’s control handler?

Exactly.

I think that’s not really a problem; when the new service starts, it will allocate a fresh shared state (probably Arc<Mutex<Whatever>>) and capture it in the closure for register. Before the call to register the old control handler could still be invoked but it would use the first run’s Arc<Mutex<Whatever>>, not the second’s, so it wouldn’t interfere with the second run. If register takes the mutex in the global, then you can guarantee that (1) once register returns, the first run’s control handler is no longer in progress (via the mutex, assuming you hold it for the duration of the control handler closure), and (2) once register returns, the first run’s control handler won’t be invoked again (because it’s not there any more, the table entry now points at the second control handler). That might be sufficient.

Not sure I follow you here. If the context passed to the control event handler is an index into a global array of Box<FnMut>, then I suppose in a shared-process service we have to account for situations where the service goes down and then comes back up again while the process remains the same (i.e. not restarted). In that case I assume that ServiceMain will be called twice. I was thinking of something along these lines, which is similar to what you described:

// Sketch only: assumes the same imports as the crate's existing
// service_control_handler module (ServiceControl, ServiceControlHandlerResult,
// ServiceStatusHandle, ErrorKind, Result, WideCString, winsvc, c_void), plus:
use std::sync::Mutex;
use once_cell::sync::Lazy;

type ControlHandler =
    Box<dyn FnMut(ServiceControl) -> ServiceControlHandlerResult + Send + 'static>;

// Global table of registered control handlers, addressed by the index that is
// smuggled through the context pointer.
static STORE: Lazy<Mutex<Vec<Option<ControlHandler>>>> =
    Lazy::new(|| Mutex::new(Vec::new()));

fn store_control_handler(event_handler: ControlHandler) -> usize {
    let mut data = STORE.lock().unwrap();
    data.push(Some(event_handler));
    data.len() - 1
}

fn reset_control_handler(index: usize) {
    let mut data = STORE.lock().unwrap();
    data[index] = None;
}

fn call_control_handler(index: usize, code: ServiceControl) -> ServiceControlHandlerResult {
    let mut data = STORE.lock().unwrap();
    match data[index].as_mut() {
        Some(event_handler) => event_handler(code),
        None => ServiceControlHandlerResult::NoError,
    }
}

pub fn register<S, F>(service_name: S, event_handler: F) -> Result<ServiceStatusHandle>
where
    S: AsRef<OsStr>,
    F: FnMut(ServiceControl) -> ServiceControlHandlerResult + Send + 'static,
{
    let event_handler_index = store_control_handler(Box::new(event_handler));

    let service_name =
        WideCString::from_str(service_name).chain_err(|| ErrorKind::InvalidServiceName)?;
    let status_handle = unsafe {
        winsvc::RegisterServiceCtrlHandlerExW(
            service_name.as_ptr(),
            Some(service_control_handler),
            event_handler_index as *mut c_void,
        )
    };

    if status_handle.is_null() {
        reset_control_handler(event_handler_index);
        Err(io::Error::last_os_error().into())
    } else {
        Ok(ServiceStatusHandle::from_handle(status_handle))
    }
}

/// Static service control handler
extern "system" fn service_control_handler(
    control: u32,
    _event_type: u32,
    _event_data: *mut c_void,
    context: *mut c_void,
) -> u32 {
    // Important: the context is not a real pointer, just the table index cast
    // to a pointer-sized value, so nothing here owns heap memory.
    let event_handler_index = context as usize;

    match ServiceControl::from_raw(control) {
        Ok(service_control) => {
            let return_code = call_control_handler(event_handler_index, service_control).to_raw();

            // Important: release the stored handler upon Stop, Shutdown or Preshutdown
            // at the end of the service lifecycle.
            match service_control {
                ServiceControl::Stop | ServiceControl::Shutdown | ServiceControl::Preshutdown => {
                    reset_control_handler(event_handler_index)
                }
                _ => (),
            };

            return_code
        }

        // Report all unknown control commands as unimplemented
        Err(_) => ServiceControlHandlerResult::NotImplemented.to_raw(),
    }
}

But I haven't touched Rust in a while, so disregard me if I am talking nonsense. :-)

@Hawk777
Author

Hawk777 commented Oct 1, 2022

Correct me if I am wrong, but I don't think ServiceMain is obliged to be blocking. The service could spawn a secondary thread and do all processing there and return from ServiceMain. In that regard dropping the control event handler before it returns is probably wrong. StartServiceCtrlDispatcherW is blocking the current thread though, but that's different.

You’re correct, of course. ServiceMain can return any time it wants, even when the service isn’t stopping. So that doesn’t help.

Not sure I follow you here. If the context passed to the control event handler is an index into a global array of Box<FnMut>, then I suppose in a shared-process service we have to account for situations where the service goes down and then comes back up again while the process remains the same (i.e. not restarted). In that case I assume that ServiceMain will be called twice. I was thinking of something along these lines, which is similar to what you described:

Ah, the point I was trying to make is that I am reasonably sure that:

  1. StartServiceCtrlDispatcherW is the thing that calls the control handler, and
  2. StartServiceCtrlDispatcherW is also the thing that spawns threads to run ServiceMain.

Therefore these two actions are interlocked with each other. So I think there is no chance that the old instance’s control handler could be running after the new instance’s ServiceMain gets spawned. So that means you should only need one slot per service, not per service instance. So, when service number 0 is started the second time, it can reuse the slot that service number 0 used for its first startup. That should be safe on the control handler side because the first launch’s control handler won’t run any more, and it should be safe on the ServiceMain side because ServiceMain never accesses the shared state via index numbers; it would hold an Arc<Mutex<SharedState>> or something shared with the control handler closure instead.
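
A sketch of that slot-per-service idea, again with hypothetical names and u32 control codes, where the index is the service’s position in the dispatch table:

use std::sync::Mutex;

type Handler = Box<dyn FnMut(u32) -> u32 + Send>;

// Hypothetical per-service slots: one entry per service in the table handed to
// StartServiceCtrlDispatcherW, not one per service instance.
static SLOTS: Mutex<Vec<Option<Handler>>> = Mutex::new(Vec::new());

fn register_for_service(service_index: usize, handler: Handler) {
    let mut slots = SLOTS.lock().unwrap();
    if slots.len() <= service_index {
        slots.resize_with(service_index + 1, || None);
    }
    // A second start of the same service overwrites (and drops) the previous
    // run's closure. Taking the lock here also guarantees the old control
    // handler is not mid-call, since dispatch runs under the same lock.
    slots[service_index] = Some(handler);
}

fn dispatch(service_index: usize, control: u32) -> Option<u32> {
    // The raw control handler receives service_index through the context
    // pointer (an integer cast to a pointer), so no heap pointer crosses the
    // FFI boundary and nothing ever needs to be freed from the handler side.
    let mut slots = SLOTS.lock().unwrap();
    slots.get_mut(service_index)?.as_mut().map(|handler| handler(control))
}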

Does that make sense?

@pronebird
Contributor

@Hawk777 makes sense. 👍
