Network operators who deploy Kytos-ng in production and use of_core need to be able to identify (and hook into external healthcheck mechanisms) when OpenFlow connections aren't stabilizing, either because of packet/handshake issues or generalized crashes. Our Python runtime shouldn't struggle to handle connections as long as there is a reasonable number of them; if it does, then of_core should expose that this is happening (maybe through an endpoint) so that it can be used externally to spin up and switch over to a different kytosd instance. This can help with recoverable errors.
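To make the "expose that this is happening" idea more concrete, here is a minimal sketch of a sliding-window counter that of_core could feed whenever a connection is closed or crashes, and whose `status()` dict an endpoint could return. Everything here is hypothetical: `ConnectionFlapTracker`, `record_close`, the window size, and the threshold are illustrative names and values, not existing of_core or Kytos-ng APIs.

```python
import time
from collections import deque


class ConnectionFlapTracker:
    """Hypothetical helper: counts connection closes in a sliding time window."""

    def __init__(self, window_seconds=60.0, max_closes=10, clock=time.monotonic):
        # window_seconds and max_closes are assumed values, to be tuned per deployment.
        self.window_seconds = window_seconds
        self.max_closes = max_closes
        self._clock = clock
        self._closes = deque()  # timestamps of recent close/crash events

    def record_close(self):
        """Call whenever an OpenFlow connection is closed or crashes."""
        self._closes.append(self._clock())

    def _prune(self):
        """Drop events that fell out of the window."""
        cutoff = self._clock() - self.window_seconds
        while self._closes and self._closes[0] < cutoff:
            self._closes.popleft()

    def status(self):
        """Return a JSON-serializable dict that a healthcheck endpoint could expose."""
        self._prune()
        recent = len(self._closes)
        return {
            "recent_closes": recent,
            "window_seconds": self.window_seconds,
            "healthy": recent <= self.max_closes,
        }
```

An external healthcheck would then only need to poll the endpoint and trigger the switchover when `healthy` stays `False` for longer than it is willing to tolerate.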
Other than that, outside of code-related implementation, network operators should also have alerts for how many errors or tracebacks have happened over time. We can have this readily available on ES with Kibana; alerts are a premium ES feature, but the data is there, so a script could also poll or query it.
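As a rough sketch of such a polling script, the snippet below builds a query for the Elasticsearch `_count` API and compares the result to a threshold. The field names (`levelname`, `@timestamp`), the assumption that `levelname` is mapped as a keyword, the threshold, and the `base_url`/`index` arguments are all assumptions about how the logs are indexed, so they would need to be adapted to the actual deployment.

```python
import json
import urllib.request


def build_error_count_query(minutes=5, level_field="levelname", time_field="@timestamp"):
    """Build a _count query for ERROR log entries in the last `minutes` minutes.

    The field names are assumptions about the log mapping, not known values.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {level_field: "ERROR"}},
                    {"range": {time_field: {"gte": f"now-{minutes}m"}}},
                ]
            }
        }
    }


def fetch_error_count(base_url, index, query, timeout=5.0):
    """POST the query to <base_url>/<index>/_count and return the count."""
    request = urllib.request.Request(
        f"{base_url}/{index}/_count",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["count"]


def should_alert(count, threshold):
    """Alert only when the error count exceeds the (assumed) threshold."""
    return count > threshold
```

A cron job or a small loop could call `fetch_error_count` periodically and notify operators whenever `should_alert` returns `True`.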
viniarck changed the title from "feat: Identify when connections are being closed or crashing constantly" to "feat: identify when connections are being closed or crashing constantly" on Feb 15, 2023
viniarck changed the title from "feat: identify when connections are being closed or crashing constantly" to "feat: identify and expose when connections are being closed or crashing constantly" on Feb 15, 2023
I agree, @viniarck. This feature could be part of a watchdog NApp or something similar, which consolidates all validations (not only of_core) and translates them into an operational status (which could indicate success, failure, or partial failure, including failure in non-critical components, and so on).
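One possible way to picture that consolidation is the sketch below, which folds individual check results into a single operational status. `Check`, `Status`, and `aggregate` are hypothetical names for illustration only, and the rule used (any critical failure means failure, any other failure means partial failure) is just one assumed policy.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    PARTIAL_FAILURE = "partial_failure"
    FAILURE = "failure"


@dataclass
class Check:
    """Result of one validation, e.g. of_core connection stability."""
    name: str
    healthy: bool
    critical: bool = True  # non-critical components only degrade the status


def aggregate(checks):
    """Translate individual check results into one operational status."""
    failed = [check for check in checks if not check.healthy]
    if not failed:
        return Status.SUCCESS
    if any(check.critical for check in failed):
        return Status.FAILURE
    return Status.PARTIAL_FAILURE
```

For example, `aggregate([Check("of_core", True), Check("ui", False, critical=False)])` would report a partial failure under this policy, while a failing of_core check would report a failure.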
cc'ing @italovalcy for his info
This issue still needs further discussion, but overall that's the problem we need to solve.