Sensible log behavior when redis is unavailable #15466
started YOLO 10231 |
From running a deployment, it looks like there's another loop not yet solved.
I don't really want this either, so it'll probably take another commit. Let's see... it could be due to starting the dispatcher without redis available.
Wait, never mind about my last error. That's receptor, not redis. Not in scope here.
YOLO 10245 |
SUMMARY
This picks up on the "redis" part of a prior PR #12698 and goes further.
That prior PR was too unfocused, trying to solve both the redis and receptor problems. Backing up, why am I looking at this problem in the first place? Because it gets in my way (personally) when I try to diagnose other problems. When I look at logs, those logs are swamped for 2 main reasons:
This PR is only concerned with the 2nd bullet point.
So when you get the log file you want (finally, after wading through the rest of the SOS report), you find that 95% of that log is stuff you don't want. Even worse, the noise is all "Traceback:" entries... which isn't great when what you're looking for is a stack trace of that format.
With that segue, here's a demo of the log behavior after taking redis down:
This is still noisy on one point, but that point is actually an interesting one.
The dispatcher "statistics" are used for the `--status` command. So if we can't get the statistics to stash into redis... what do we do? Before, we would drop that data on the floor and then log a giant stack trace. But since redis connection errors are a very well-known quantity, it's better to show the details of the error and then print the data that we're dropping. Right? There's a non-zero chance that we have a pool management bug while at the same time hitting this, confounding debugging even more.
ISSUE TYPE
COMPONENT NAME
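For illustration only, here is a minimal sketch of the statistics behavior described in the summary: on a connection error, log one short line with the error details and the data being dropped, instead of a full traceback. This is not the code in this PR. The function name `stash_statistics`, the logger name, and the `DownClient` stand-in are all hypothetical, and the built-in `ConnectionError` stands in for the client library's own connection exception (with redis-py you would pass `redis.exceptions.ConnectionError`, which is a separate class).

```python
import json
import logging

logger = logging.getLogger("dispatcher.sketch")  # hypothetical logger name


def stash_statistics(client, key, stats, conn_errors=(ConnectionError,)):
    """Try to store stats; on a connection failure, log one line and drop them.

    `client` is any object with a redis-like `set(key, value)` method.
    `conn_errors` is an assumption: with redis-py it would be
    (redis.exceptions.ConnectionError,).
    Returns True if the data was stored, False if it was dropped.
    """
    payload = json.dumps(stats, sort_keys=True)
    try:
        client.set(key, payload)
        return True
    except conn_errors as exc:
        # Well-known failure mode: use warning() rather than exception(),
        # so no traceback lands in the log, but the dropped data does.
        logger.warning(
            "Could not save statistics to redis (%s: %s), dropping data: %s",
            type(exc).__name__, exc, payload,
        )
        return False


class DownClient:
    """Hypothetical stand-in for a client whose server is unreachable."""

    def set(self, key, value):
        raise ConnectionError("connection refused")
```

Calling `stash_statistics(DownClient(), "some_key", {"workers": 2})` returns `False` and emits a single warning line containing the error and the JSON payload. Any exception not listed in `conn_errors` still propagates, so an unexpected bug keeps its stack trace.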