Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(channel_reconfig): don't reconfigure the non-IO nexus channel #37

Draft
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

dsharma-dc
Copy link
Contributor

The sync qpair connect can race with a async connect during an unplug dynamic reconfiguration event. This can happen on the nexus channel that gets created as part of rebuild on spdk thread init_thread. Since this nexus channel is not used for normal IO, we can ignore this from reconfiguration. The rebuild IOs can still be served on this and the locked byte ranges will still be safe via other nexus channels, but any other child device add/remove will not disturb this channel via dynamic reconfigure path.

Signed-off-by: Diwakar Sharma <diwakar.sharma@datacore.com>
@dsharma-dc
Copy link
Contributor Author

With this change, I ran the testcase written under openebs/mayastor#1506 and it passes OK.

@@ -170,7 +170,10 @@ extern "C" fn inner_traverse_channel<ChannelData, Ctx>(
let ctx = TraverseCtx::<ChannelData, Ctx>::from_iter(i);
let mut chan = IoChannel::<ChannelData>::from_iter(i);

let rc = (ctx.channel_cb)(chan.channel_data_mut(), &mut ctx.ctx);
let mut rc = ChannelTraverseStatus::Ok;
if chan.thread_name() != "init_thread" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might work for now but seems quite brittle.. we probably better do this properly by using channel specific data. Example: Adding bool io_channel to the NexusChannel perhaps

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought that, but perhaps that'd require some hacking because NexusChannel won't be directly accessible here in ffi spdk_rs. Here we only have a generic channel type view.

Copy link
Contributor Author

@dsharma-dc dsharma-dc Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, I can let this closure call happen irrespective of spdk thread and read the new boolean in the nexus code just before callling chan.reconnect_all(). However, to set that bool to true or false during channel construction we'll again have to rely on spdk thread name I guess?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or better, can get the id of current thread by spdk_thread_get_id() using spdk_rs::Thread abstraction of spdk_thread, and if id is 1 it's init_thread.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferring the decision to the "rust native" code seems good.

@dsharma-dc dsharma-dc marked this pull request as draft October 13, 2023 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants