-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
safeguard against reuse of active caching device #115
Comments
@onlyjob Sharing a caching device with two dm-writeboost'd devices is invalid. I once considered this by embedding some unique identifier of the backing device into the super block of the caching device but concluded there is no perfect solution for this. The first reason is such perfect identifier doesn't exist (neither device name nor uuid are valid). And the second reason is we should let the userland tools (e.g. your writeboost) to manage invalid operations. There are lot of invalid operations that is theoretically possible but managing them isn't a role of kernel module but the userland tools. However, looping forever and you resort to rebooting bit sounds to be fixed. So I want to ask you for more details. I couldn't see what you actually did
|
I wrote a test to reproduce the error message you saw in dmesg class REPRO_115 extends DMTestSuite {
test("endless loop when try to use active caching device") {
Memory(Sector.M(8)) { caching => // 1
Memory(Sector.M(128)) { backing1 => // 2
Writeboost.sweepCaches(caching) // 3
Writeboost.Table(backing1, caching).create { s => // 4
Shell(s"dd if=/dev/urandom of=${s.bdev.path} bs=1M count=128") // write // 5
}
} // 6 (when moving out of block, device is removed)
Memory(Sector.M(64)) { backing2 => // 7
Writeboost.Table(backing2, caching).create { s => // 8
Shell(s"dd if=${s.bdev.path} of=/dev/null bs=1M count=128") // read // 9
}
}
}
}
}
the test code says:
the number is corresponding to the comments in the test code so you can understand the code. (not difficult isn't it?) the dmesg comes from writeback thread because the dirty cache's destination address is far beyond 64MB (it's around 120MB) |
I believe it would be irrelevant. Too many disks, etc.
"Active" means in use. I created one dm-writeboosted device (HDD1+SSD1)
Probably same size. IMHO irrelevant as you can't reliably use size to detect whether device is used.
Actively used dm-writeboost device was not stopped so yes, second device was created before stopping first one. I'm not concerned about embedding unique ID to caching devices. Here what's important is to check whether caching device is already allocated to another device before constructing new dm-writeboosted device. Only run-time check to prevent deadlock and potential data loss. Thanks. |
I wrote a test for the case but there is no error. test("making another device while in use") {
Memory(Sector.M(8)) { caching =>
Memory(Sector.M(64)) { backing1 =>
Memory(Sector.M(64)) { backing2 =>
Writeboost.sweepCaches(caching)
Writeboost.Table(backing1, caching).create { s =>
Writeboost.Table(backing2, caching).create { s =>
}
}
}
}
}
}
dm_dev has reference count internally but target code can't access it. I don't like to have my own data structure to manage that. |
Well, I don't understand how your test work but I wrecked my system once when I improperly constructed writeboosted device by accidentally reusing caching disk of another writeboosted device that was already in use. It was too easy to make such mistake. Safeguard is important to implement for safety. |
If we maintain a list of pair (backing, caching) where both values are simply the name passed that's only effective between two reboots (or hot-swapping is another issue this can't avoid the problem described here) I will look deeper the dm code if we can use the internally managed reference count. |
Let's try adding FMODE_EXCL flag and see what changes. There is no target adding this flag though. static int consume_essential_argv(struct wb_device *wb, struct dm_arg_set *as)
{
int err = 0;
struct dm_target *ti = wb->ti;
err = dm_get_device(ti, dm_shift_arg(as), dm_table_get_mode(ti->table),
&wb->backing_dev);
if (err) {
DMERR("Failed to get backing_dev");
return err;
}
err = dm_get_device(ti, dm_shift_arg(as), dm_table_get_mode(ti->table),
&wb->cache_dev); |
By default the static inline fmode_t get_mode(struct dm_ioctl *param)
{
fmode_t mode = FMODE_READ | FMODE_WRITE;
if (param->flags & DM_READONLY_FLAG)
mode = FMODE_READ;
return mode;
} it's either |
The idea of adding FMODE_EXCL alone doesn't work |
I've managed to wreck dm-writeboost module by accidentally re-using active caching device by constructing another dm-writeboosted device using the same caching device.
dm-writeboost went into endless loop constantly reading from caching device:
I did not find the way how to interrupt it so I had to resort to emergency reboot.
Please consider introducing safeguard to prevent accidental re-use of caching device.
The text was updated successfully, but these errors were encountered: