Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

omit_data_loss config variable #4001

Merged
merged 1 commit into from
Oct 18, 2024

Conversation

richtja
Copy link
Contributor

@richtja richtja commented Sep 13, 2024

In tp_libvrt testing, We've seen many occurrences where "Bad file descriptor" OSError was raised during clean up phase when an external command is executed via avocado.utils.process utility. It happens when SubProcess class wants to flush the stdout and stderr of the external command after finishes. These kinds of errors might lead to stdout and stderr data loss, but during the tp_libvrt testing it leads to false positive failures, which makes test evaluation harder.

This commit introduces a new config variable omit_data_loss which, when is enabled, will omit these errors, and they won't affect the overall test result.

Reference: avocado-framework/avocado#6019

@richtja
Copy link
Contributor Author

richtja commented Sep 13, 2024

Hi @smitterl, can you please test this in your workflow and let me know if this solution works for you? Thanks.

@cliping
Copy link
Contributor

cliping commented Sep 24, 2024

Hi @richtja, I used your PR in migration job. In the test results, there are still the following errors:

Screenshot 2024-09-24 at 09 43 11

Could you help to check this issue? Thank you.

@richtja
Copy link
Contributor Author

richtja commented Sep 24, 2024

Hi @cliping, IMO this should be ok. The idea is to keep the info about the error in the logs, but just omit it from the test result. So if the overall result is PASS than it is working.

Of course, we can discuss how to remove them from logs as well.

@cliping
Copy link
Contributor

cliping commented Sep 24, 2024

Hi @richtja, the final test result is FAIL. I don't know why this error is not omitted.

Screenshot 2024-09-24 at 19 03 36

@richtja
Copy link
Contributor Author

richtja commented Sep 24, 2024

@cliping Ok, so that is something which needs to be fixed. Thanks for letting me know, I'm going to look into that.

@Yingshun
Copy link
Contributor

@richtja any updates? There are a lot of failures(~20%) in recent runs due this OSError, it will be appreciated if you can fix the problem, thx!

@smitterl
Copy link
Contributor

Agree with the others that the test must pass and if configured we don't want to see warnings or errors in the logs. As called out by @Yingshun we have a high rate of failures due to this.

This also makes me assume we will configure this and likely never find the time to look further into the data loss.

The instances that I've seen in our test runs don't deliver a compelling argument that these errors can't be skipped. But I guess that's avocado-vt specific and avocado will cover other situations.

IIUC, you guys are worried about the data loss. Is there a way to avoid this data loss and write the data out into a file in the test log dir?

@richtja
Copy link
Contributor Author

richtja commented Oct 16, 2024

Hi @smitterl, @Yingshun. I don't have any progress on this right now, but I will look into this in this week. So IIUIC your CI is scanyng for ERRORs and WARNINGs to find failures. How this scan work? Would be be possible to just change the message type from ERROR to INFO like this:

from 
2024-09-12 11:28:45,948 stacktrace       L0045 ERROR| [Errno 9] Bad file descriptor
to
2024-09-12 11:28:45,948 stacktrace       L0045 INFO| [Errno 9] Bad file descriptor

or do we need to remove the message completly from debug.log file?

"avocado.utils.process utility during clean up. This can be used "
"when this error would cause false positive failures of a test."
)
add_option(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if introduce new option in avocado run (here --vt-omit-data-loss), it will bring additional efforts to update our CI to enable that. I wonder whether we can introduce one gating_variable e.g omit_data_loss in avocado_vt/conf.d/vt.conf, if omit_data_loss is set true, it enable it implicitly .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this PR, you can enable it with config option as well by setting vt.omit_data_loss to true. see https://github.com/avocado-framework/avocado-vt/pull/4001/files#diff-64c70a40c0700575186086930678d178f26a5b3fbe71dfe9d1ebd20d4d5e7752R135

But I am not sure if we should enable this implicitly, because it can hide real failures by skipping error messages. AFAIK this issue is related only to your CI and I think we shouldn't affect others by this.

@@ -150,6 +150,11 @@ def setUp(self):
except exceptions.TestSkipError:
self.__exc_info = sys.exc_info()
raise # This one has to be raised in setUp
except OSError as e:
if not self._config.get("vt.omit_data_loss"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we need this solution urgently, if in short term, we can not have complete solution. Maybe we can just output the error message, but never raise exception in any situation. But surely we can open one issue to track limitation, and further work out complete solution

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have just sent an update via force-push which keeps the messages, but it should stop raising the exceptions.

In tp_libvrt testing, We've seen many occurrences where "Bad file
descriptor" OSError was raised during clean up phase when an external
command is executed via `avocado.utils.process` utility. It happens when
`SubProcess` class wants to flush the stdout and stderr of the external
command after finishes. These kinds of errors might lead to stdout and
stderr data loss, but during the tp_libvrt testing it leads to false
positive failures, which makes test evaluation harder.

This commit introduces a new config variable `omit_data_loss` which,
when is enabled, will omit these errors, and they won't affect the
overall test result.

Reference: avocado-framework/avocado#6019
Signed-off-by: Jan Richter <jarichte@redhat.com>
@richtja
Copy link
Contributor Author

richtja commented Oct 17, 2024

Hi @cliping, @Yingshun, @smitterl, @chunfuwen I have just sent new version via force-push. It should avoid exception raises, and the test result should be correct now. Can anyone test it please. Thank you.

About removing the messages from logs. Unfortunately, it looks like that it will be harder than I thought, because the logging happens at different places than the exception raise.

Copy link
Contributor

@chunfuwen chunfuwen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

liping chen help validated that false warning failure has been fixed although still there is noisy information "Errno 22] Invalid argument".
I approved it for urgent fix

@Yingshun
Copy link
Contributor

Thank you all for your efforts on this issue! As chunfu mentioned, the results are good for now and we are waiting for further testing, so merge it.

@Yingshun Yingshun merged commit 3a65355 into avocado-framework:master Oct 18, 2024
50 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done 109
Development

Successfully merging this pull request may close these issues.

5 participants