Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shutdown more gracefully #633

Open
3 tasks done
sc68cal opened this issue Dec 20, 2023 · 4 comments
Open
3 tasks done

Shutdown more gracefully #633

sc68cal opened this issue Dec 20, 2023 · 4 comments

Comments

@sc68cal
Copy link
Contributor

sc68cal commented Dec 20, 2023

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that ansible-rulebook is open source software provided for free and that I might not receive a timely response.

Feature type

Enhancement to Existing Feature

Feature Summary

Currently, there doesn't seem to be a way to get ansible-rulebook to stop processing new events from event sources and allow all existing work to complete. Even using the shutdown action and a webhook, other rules will continue to pull work from the even sources and start kicking off playbook runs. This means that there's a high likelihood that ansible-rulebook will exit while a playbook is still executing.

There needs to be a way to have ansible-rulebook stop all the event sources, allow all work to complete, and then exit.

This would also be useful when handling SIGTERM signals. Allow the work to complete, then exit, instead of just exiting hard with work still in progress.

Steps to reproduce

Create a rulebook that contains an event source that is a queue, like the Kafka event source or Amazon SQS, and also has a webhook source that has a rule that matches the webhook and does a shutdown action.

Put an event into the Kafa or SQS queue, watch the playbook start running. Send an event to the webhook

Current results

Work continues up until the shutdown happens, with no real "graceful" behavior

Sugested feature result

Ability to stop ansible-rulebook cleanly, where all work that has been pulled from an event source has completed, while stopping pulling from an event source and allow work to queue up in event sources where this construct exists (SQS queue, etc).

Additional information

No response

@sc68cal
Copy link
Contributor Author

sc68cal commented Dec 20, 2023

Part of this is based on some of my work developing an event source for RabbitMQ, where we have work that is placed in a queue, where each message is a piece of work that needs to be done. Ideally, if we do an upgrade or make a change to the system, we need a way to have ansible-rulebook stop pulling messages from the queue, complete the work that it has already pulled from the queue, and then let more work just pile up in the queue while we do an upgrade or change, and then re-run ansible-rulebook.

Obviously there's a chance that I have just not written the event source plugin for RabbitMQ well enough to handle this, but I coded it while following the Kafka and Amazon SQS examples of event sources and it appears that those event sources would suffer from the same issue.

@sc68cal
Copy link
Contributor Author

sc68cal commented Dec 20, 2023

Here's an example log:

TASK [Create ipv6 subnets on newly created neutron network] ********************
task path: /private/var/folders/kl/56c62hm51dd742lfbstbchgc0000gp/T/eda_y7oy6_5/project/create_tsf_tenant_network.yml:197
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: sc68cal
<localhost> EXEC /bin/sh -c 'echo ~sc68cal && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /Users/sc68cal/.ansible/tmp `"&& mkdir "` echo /Users/sc68cal/.ansible/tmp/ansible-tmp-1703103690.415442-10800-137437083787705 `" && echo ansible-tmp-1703103690.415442-10800-137437083787705="` echo /Users/sc68cal/.ansible/tmp/ansible-tmp-1703103690.415442-10800-137437083787705 `" ) && sleep 0'
Using module file /Users/sc68cal/.ansible/collections/ansible_collections/openstack/cloud/plugins/modules/subnet.py
<localhost> PUT /Users/sc68cal/.ansible/tmp/ansible-local-10626jge_lkf4/tmpx4hkxnwp TO /Users/sc68cal/.ansible/tmp/ansible-tmp-1703103690.415442-10800-137437083787705/AnsiballZ_subnet.py
<localhost> EXEC /bin/sh -c 'chmod u+x /Users/sc68cal/.ansible/tmp/ansible-tmp-1703103690.415442-10800-137437083787705/ /Users/sc68cal/.ansible/tmp/ansible-tmp-1703103690.415442-10800-137437083787705/AnsiballZ_subnet.py && sleep 0'
<localhost> EXEC /bin/sh -c '/Users/sc68cal/src/work/DCN/openstack-tsf-network-controller/.venv/bin/python3.11 /Users/sc68cal/.ansible/tmp/ansible-tmp-1703103690.415442-10800-137437083787705/AnsiballZ_subnet.py && sleep 0'
2023-12-20 15:21:31,118 - ansible_rulebook.action.runner - DEBUG - Ansible Runner Thread Pool executor task cancelled
2023-12-20 15:21:31,118 - ansible_rulebook.action.runner - DEBUG - Cancel Queue reading task
2023-12-20 15:21:31,118 - ansible_rulebook.action.runner - INFO - Ansible runner Queue task cancelled
2023-12-20 15:21:31,472 - ansible_rulebook.rule_set_runner - DEBUG - Action task caught Cancelled error
2023-12-20 15:21:31,472 - ansible_rulebook.rule_set_runner - INFO - Task action::run_playbook::Rulebook for webhook::tsf network create rabbitmq event finished, active actions 1
2023-12-20 15:21:31,474 - ansible_rulebook.rule_set_runner - INFO - Task Task-1168 finished, active actions 0
2023-12-20 15:21:31,474 - ansible_rulebook.rule_set_runner - DEBUG - Action Plan Task Cancelled for ruleset Rulebook for webhook
2023-12-20 15:21:31,474 - ansible_rulebook.rule_set_runner - INFO - Cleaning up ruleset Rulebook for webhook
2023-12-20 15:21:31 519 [main] INFO org.drools.ansible.rulebook.integration.api.RulesExecutor - Disposing session with id: 1; SessionStats{start='2023-12-20T20:19:51.695686Z', end='2023-12-20T20:21:31.491585Z', lastClockTime='2023-12-20T20:21:31.396Z', clockAdvanceCount=997, numberOfRules=2, numberOfDisabledRules=0, rulesTriggered=12, eventsProcessed=12, eventsMatched=12, eventsSuppressed=0, permanentStorageCount=0, permanentStorageSize=0, asyncResponses=0, bytesSentOnAsync=0, sessionId=1, ruleSetName='Rulebook for webhook', lastRuleFired='tsf network create rabbitmq event', lastRuleFiredAt='2023-12-20T20:21:15.096Z'}
2023-12-20 15:21:31,527 - ansible_rulebook.rule_set_runner - INFO - {'asyncResponses': 0,
 'bytesSentOnAsync': 0,
 'clockAdvanceCount': 997,
 'end': '2023-12-20T20:21:31.491585Z',
 'eventsMatched': 12,
 'eventsProcessed': 12,
 'eventsSuppressed': 0,
 'lastClockTime': '2023-12-20T20:21:31.396Z',
 'lastRuleFired': 'tsf network create rabbitmq event',
 'lastRuleFiredAt': '2023-12-20T20:21:15.096Z',
 'numberOfDisabledRules': 0,
 'numberOfRules': 2,
 'permanentStorageCount': 0,
 'permanentStorageSize': 0,
 'ruleSetName': 'Rulebook for webhook',
 'rulesTriggered': 12,
 'sessionId': 1,
 'start': '2023-12-20T20:19:51.695686Z'}
2023-12-20 15:21:31,527 - ansible_rulebook.engine - INFO - Cancelling all ruleset tasks
2023-12-20 15:21:31,527 - ansible_rulebook.engine - INFO - Waiting on gather
2023-12-20 15:21:31,527 - ansible_rulebook.engine - INFO - Returning from run_rulesets
2023-12-20 15:21:31,528 - ansible_rulebook.app - INFO - Cancelling event source tasks
2023-12-20 15:21:31,528 - drools.dispatch - DEBUG - Shutting down async channel
2023-12-20 15:21:31,530 - aiorabbit.client - DEBUG - Writing frame: <Basic.Cancel object at 0x1071e9d60>
2023-12-20 15:21:31,530 - aiorabbit.client - DEBUG - Set state to 0x73: Cancelling Consumer while state is 0x71: Sent message acknowledgement - {83: {}, 34: {}, 81: {}, 119: {}} [None]
2023-12-20 15:21:31,530 - aiorabbit.client - DEBUG - Transition to 0x73: Cancelling Consumer from 0x71: Sent message acknowledgement after 16.3337 seconds
2023-12-20 15:21:31,530 - aiorabbit.client - DEBUG - Waiter 2366020287954541 waiting on (116: Consumer cancelled || 34: Channel Close Received) while in 0x73: Cancelling Consumer
2023-12-20 15:21:31,530 - <run_path> - INFO - Webhook Plugin Task Cancelled
2023-12-20 15:21:31,530 - ansible_rulebook.engine - INFO - Broadcast shutdown to all source plugins
2023-12-20 15:21:31,531 - ansible_rulebook.engine - INFO - Broadcast to queues: [<Queue at 0x1072f6dd0 maxsize=1 _queue=[Shutdown(message='Shutting down', delay=15, kind='graceful', source_plugin='')] tasks=14>, <Queue at 0x1072f6dd0 maxsize=1 _queue=[Shutdown(message='Shutting down', delay=15, kind='graceful', source_plugin='')] tasks=14>]
2023-12-20 15:21:31,531 - ansible_rulebook.engine - INFO - Broadcasting shutdown: Shutdown(message='Source ansible.eda.webhook initiated shutdown at 2023-12-20 15:21:31.530972', delay=300.0, kind='graceful', source_plugin='ansible.eda.webhook')
2023-12-20 15:21:31,532 - aiorabbit.client - DEBUG - Set state to 0x74: Consumer cancelled while state is 0x73: Cancelling Consumer - {83: {}, 34: {2366020287954541: <asyncio.locks.Event object at 0x1072b3dd0 [unset]>}, 81: {}, 119: {}, 116: {2366020287954541: <asyncio.locks.Event object at 0x1072b3510 [unset]>}} [None]
2023-12-20 15:21:31,532 - aiorabbit.client - DEBUG - Transition to 0x74: Consumer cancelled from 0x73: Cancelling Consumer after 0.0021 seconds
2023-12-20 15:21:31,532 - aiorabbit.client - DEBUG - Waiter 2366020287954541 wait on 0x74: Consumer cancelled has finished [None]
2023-12-20 15:21:31,532 - aiorabbit.client - DEBUG - Writing frame: <Channel.Close object at 0x107b3dfd0>
2023-12-20 15:21:31,533 - aiorabbit.client - DEBUG - Set state to 0x23: Channel Close Sent while state is 0x74: Consumer cancelled - {83: {}, 34: {}, 81: {}, 119: {}, 116: {}} [None]
2023-12-20 15:21:31,533 - aiorabbit.client - DEBUG - Transition to 0x23: Channel Close Sent from 0x74: Consumer cancelled after 0.0003 seconds
2023-12-20 15:21:31,533 - aiorabbit.client - DEBUG - Waiter 2366020290472125 waiting on (36: Channel CloseOk Received || 34: Channel Close Received) while in 0x23: Channel Close Sent
2023-12-20 15:21:31,533 - aiorabbit.client - DEBUG - Set state to 0x24: Channel CloseOk Received while state is 0x23: Channel Close Sent - {83: {}, 34: {2366020290472125: <asyncio.locks.Event object at 0x107399f10 [unset]>}, 81: {}, 119: {}, 116: {}, 36: {2366020290472125: <asyncio.locks.Event object at 0x107b37710 [unset]>}} [None]
2023-12-20 15:21:31,533 - aiorabbit.client - DEBUG - Transition to 0x24: Channel CloseOk Received from 0x23: Channel Close Sent after 0.0005 seconds
2023-12-20 15:21:31,534 - aiorabbit.client - DEBUG - Waiter 2366020290472125 wait on 0x24: Channel CloseOk Received has finished [None]
2023-12-20 15:21:31,534 - aiorabbit.client - DEBUG - Set state to 0x102: Closing while state is 0x24: Channel CloseOk Received - {83: {}, 34: {}, 81: {}, 119: {}, 116: {}, 36: {}} [None]
2023-12-20 15:21:31,534 - aiorabbit.client - DEBUG - Transition to 0x102: Closing from 0x24: Channel CloseOk Received after 0.0007 seconds
2023-12-20 15:21:31,534 - aiorabbit.channel0 - DEBUG - Set state to 0x20: Connection Close Sent while state is 0x18: Heartbeat Sent - {22: {}, 34: {}} [None]
2023-12-20 15:21:31,534 - aiorabbit.channel0 - DEBUG - Transition to 0x20: Connection Close Sent from 0x18: Heartbeat Sent after 69.0487 seconds
2023-12-20 15:21:31,534 - aiorabbit.channel0 - DEBUG - Waiter 2366020291746916 waiting on (33: Connection CloseOk Received) while in 0x20: Connection Close Sent
2023-12-20 15:21:31,534 - aiorabbit.channel0 - DEBUG - Set state to 0x21: Connection CloseOk Received while state is 0x20: Connection Close Sent - {22: {}, 34: {}, 33: {2366020291746916: <asyncio.locks.Event object at 0x107b37710 [unset]>}} [None]
2023-12-20 15:21:31,534 - aiorabbit.channel0 - DEBUG - Transition to 0x21: Connection CloseOk Received from 0x20: Connection Close Sent after 0.0004 seconds
2023-12-20 15:21:31,535 - aiorabbit.channel0 - DEBUG - Waiter 2366020291746916 wait on 0x21: Connection CloseOk Received has finished [None]
2023-12-20 15:21:31,535 - aiorabbit.client - DEBUG - Set state to 0x103: Closed while state is 0x102: Closing - {83: {}, 34: {}, 81: {}, 119: {}, 116: {}, 36: {}} [None]
2023-12-20 15:21:31,535 - aiorabbit.client - DEBUG - Transition to 0x103: Closed from 0x102: Closing after 0.0012 seconds
2023-12-20 15:21:31,535 - aiorabbit.client - DEBUG - Resetting internal state
2023-12-20 15:21:31,535 - ansible_rulebook.engine - INFO - Task cancelled Source ansible.eda.rabbitmq task cancelled, initiated shutdown at 2023-12-20 15:21:31.535763
2023-12-20 15:21:31,535 - ansible_rulebook.engine - INFO - Broadcast shutdown to all source plugins
2023-12-20 15:21:31,535 - aiorabbit.client - DEBUG - Disconnected: None
2023-12-20 15:21:31,535 - ansible_rulebook.engine - INFO - Broadcast to queues: [<Queue at 0x1072f6dd0 maxsize=1 _queue=[Shutdown(message='Shutting down', delay=15, kind='graceful', source_plugin='')] _putters[1] tasks=14>, <Queue at 0x1072f6dd0 maxsize=1 _queue=[Shutdown(message='Shutting down', delay=15, kind='graceful', source_plugin='')] _putters[1] tasks=14>]
2023-12-20 15:21:31,535 - ansible_rulebook.engine - INFO - Broadcasting shutdown: Shutdown(message='Source ansible.eda.rabbitmq task cancelled, initiated shutdown at 2023-12-20 15:21:31.535763', delay=300.0, kind='graceful', source_plugin='ansible.eda.rabbitmq')
2023-12-20 15:21:31,535 - ansible_rulebook.app - INFO - Main complete

@sc68cal
Copy link
Contributor Author

sc68cal commented Dec 20, 2023

Here's a log of the webhook rule matching to run the shutdown action, and then another event being received from RabbitMQ that matches another rule in the ruleset and starts a playbook run after the shutdown action has been called.

2023-12-20 15:21:15,981 - ansible_rulebook.action.runner - DEBUG - Cancel Queue reading task
2023-12-20 15:21:15,981 - ansible_rulebook.action.runner - INFO - Ansible runner Queue task cancelled
2023-12-20 15:21:15,985 - ansible_rulebook.action.run_playbook - INFO - Ansible runner rc: 0, status: successful
2023-12-20 15:21:16,113 - ansible_rulebook.rule_set_runner - INFO - Task action::run_playbook::Rulebook for webhook::tsf network create rabbitmq event finished, active actions 1
2023-12-20 15:21:16,115 - ansible_rulebook.rule_set_runner - INFO - Task Task-779 finished, active actions 0
2023-12-20 15:21:16,115 - ansible_rulebook.rule_set_runner - DEBUG - Creating action task action::shutdown::Rulebook for webhook::shutdown via webhook
2023-12-20 15:21:16,116 - ansible_rulebook.rule_set_runner - INFO - call_action shutdown
2023-12-20 15:21:16,116 - ansible_rulebook.rule_set_runner - INFO - substitute_variables [{'delay': 15, 'message': 'Shutting down'}] [{'event': {'payload': {'action': 'shutdown'}, 'meta': {'endpoint': 'endpoint', 'headers': {'Host': '127.0.0.1:5000', 'User-Agent': 'curl/7.87.0', 'Accept': '*/*', 'Content-Type': 'application/json', 'Content-Length': '23'}, 'source': {'name': 'ansible.eda.webhook', 'type': 'ansible.eda.webhook'}, 'received_at': '2023-12-20T20:20:13.686771Z', 'uuid': '81ac1776-176c-42aa-8309-e1963197ca6a'}}}]
2023-12-20 15:21:16,116 - ansible_rulebook.rule_set_runner - INFO - action args: {'delay': 15, 'message': 'Shutting down'}
Ruleset: Rulebook for webhook rule: shutdown via webhook has initiated shutdown of type: graceful. Delay: 15.000 seconds, Message: Shutting down
2023-12-20 15:21:16,117 - ansible_rulebook.engine - INFO - Broadcast to queues: [<Queue at 0x1072f6dd0 maxsize=1 _getters[1] tasks=12>, <Queue at 0x1072f6dd0 maxsize=1 _getters[1] tasks=12>]
2023-12-20 15:21:16,117 - ansible_rulebook.engine - INFO - Broadcasting shutdown: Shutdown(message='Shutting down', delay=15, kind='graceful', source_plugin='')
2023-12-20 15:21:16,117 - ansible_rulebook.rule_set_runner - DEBUG - Ruleset: Rulebook for webhook, received event: Shutdown(message='Shutting down', delay=15, kind='graceful', source_plugin='')
2023-12-20 15:21:16,117 - ansible_rulebook.rule_set_runner - INFO - Ruleset: Rulebook for webhook, received shutdown: Shutdown(message='Shutting down', delay=15, kind='graceful', source_plugin='')
2023-12-20 15:21:16,117 - ansible_rulebook.rule_set_runner - INFO - ruleset: Rulebook for webhook waiting 15.000000 for shutdown
2023-12-20 15:21:16,117 - ansible_rulebook.rule_set_runner - INFO - Task action::shutdown::Rulebook for webhook::shutdown via webhook finished, active actions 0
2023-12-20 15:21:16,117 - ansible_rulebook.rule_set_runner - DEBUG - Creating action task action::print_event::Rulebook for webhook::tsf network create rabbitmq event
2023-12-20 15:21:16,117 - ansible_rulebook.rule_set_runner - INFO - call_action print_event
2023-12-20 15:21:16,122 - ansible_rulebook.rule_set_runner - INFO - Task action::print_event::Rulebook for webhook::tsf network create rabbitmq event finished, active actions 1
2023-12-20 15:21:16,122 - ansible_rulebook.rule_set_runner - DEBUG - Creating action task action::run_playbook::Rulebook for webhook::tsf network create rabbitmq event
2023-12-20 15:21:16,122 - ansible_rulebook.rule_set_runner - INFO - call_action run_playbook

@sc68cal
Copy link
Contributor Author

sc68cal commented Dec 20, 2023

Rulebook:

---
- name: Rulebook for webhook

  hosts: all

  sources:
    - ansible.eda.rabbitmq:
        host: localhost
        port: 5672
        queue: openstack-tsf-network-controller
        user: guest
        password: guest
        routing_key: 'platform.openstack.*.tsf.rendering.network.*'
        exchange: "platform_requests"

    - ansible.eda.webhook:
        port: 5000

  rules:
    - name: "tsf network create rabbitmq event"
      condition: event.body.event == 'createOpenStackNetwork'
      actions:
        - print_event:
            pretty: true
        - run_playbook:
            name: controller/create_tsf_tenant_network.yml
            verbosity: 3

    - name: "shutdown via webhook"
      condition: event.payload.action == "shutdown"
      action:
        shutdown:
          delay: 15
          message: "Shutting down"

sc68cal added a commit to sc68cal/ansible-rulebook that referenced this issue Dec 28, 2023
This is an attempt to address ansible#633 but it's very ham-fisted.
sc68cal added a commit to sc68cal/ansible-rulebook that referenced this issue Dec 28, 2023
This is an attempt to address ansible#633 but it's very ham-fisted.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant