Let redundancy groups not just fail #10190

nilmerg · 2024-10-16T11:38:36Z

This is not a bug, not yet, at least. Once #10014 is fixed, this should be considered.

tl/dr

Redundancy groups should not just fail, they should in addition be not reachable or not determinable, just like hosts and services are.

What's this about?

graph LR;
    ChildHost-->pa["ParentHostA (Group 1)"];
    ChildHost-->pb["ParentHostB (Group 1)"];
    pa-->ga["GrandParentA"];
    pb-->gb["GrandParentB"];

Right now, the child host will be unreachable in two cases:

Both parents are down (expected)
One of them is unreachable (The linked bug)

Once the second case is fixed, it will likely behave like this instead:

Both parents are down (still, expected)
Both parents are unreachable (new, expected)

But what happens in case only one of the parents is down and the other unreachable?

One parent is down and the other unreachable (new, ??)

I suppose so, since already right now, the availability of the child host is influenced by its parent's reachable state. It will just be extended to consider redundancy groups in such a case.

But then there is `Dependency::IsAvailable`

Which is what is used to determine a redundancy group's state at the moment. Though, it doesn't consider the parent's reachable state at all. The parent might be UP, but unreachable, and it returns true.

So, in order to really ensure that the child host is unreachable in the third case, this might need to change. (The potential bug)

Of course, this has larger side-effects, as it changes how dependencies work in general.

Which is what this issue is really about! As I believe this is a good thing. Why shouldn't a dependency fail, in case another one higher up in the hierarchy fails? This is already the case, I think, anyway. This discrepancy will only pop up in case checks are disabled on the parent in question. (As if checks run, the parent will eventually go down/unknown)

Additional note

What led me to this, was the wish to understand how a redundancy group's state will be determined as part of Icinga/icingadb#347.

As of today I thought it simply represents whether all related dependencies failed or not. But now a failed dependency might be failing, because a parent might just not be reachable. And so will a redundancy group.

This all sounds fine, unless you want to identify the actual root problem in a failing dependency chain. (A root problem is a dependency node, which is reachable and has a problem)

In a dependency chain, the current plan is, to represent redundancy groups as actual nodes with an actual state. While hosts and services also have state, they can in addition be not reachable. Redundancy groups can only fail.

But a redundancy group, with one host member that is UP but not reachable, is technically also not reachable. So it's not just a failing dependency, but one that is … not determinable?

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Let redundancy groups not just fail #10190

Let redundancy groups not just fail #10190

nilmerg commented Oct 16, 2024

Let redundancy groups not just fail #10190

Let redundancy groups not just fail #10190

Comments

nilmerg commented Oct 16, 2024

tl/dr

What's this about?

But then there is Dependency::IsAvailable

Additional note

But then there is `Dependency::IsAvailable`