Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

not able to dmsetup remove wb device before all write data are successfully flushed #127

Open
akiradeveloper opened this issue Jul 30, 2016 · 5 comments
Assignees

Comments

@akiradeveloper
Copy link
Owner

/*
 * .postsuspend is called before .dtr.
 * We flush out all the transient data and make them persistent.
 */
static void writeboost_postsuspend(struct dm_target *ti)
{
    struct wb_device *wb = ti->private;
    flush_current_buffer(wb);
    blkdev_issue_flush(wb->cache_dev->bdev, GFP_NOIO, NULL);
}

When wb'd device is terminated, postsuspend hook is called. The hook calls flush_current_buffer. Why I am doing this is just because other dm targets also make transient data persistent in this hook.

But this is harmful if the caching device is broken and wb can't flush rambuf anymore. In that case, the caller of dmsetup remove blocks forever.

I am considering if I have a chance to remove postsuspend and provide drop_transient message interface so user can drop the transient data before terminating the wb'd device. (theoretically it's ok and it was the primary implementation of this hook)

But of course, the change isn't backward-compatible. For example, Dmitry's writeboost needs change because dmsetup suspend then dmsetup resume will not guarantee anything about persistency.

@akiradeveloper akiradeveloper self-assigned this Jul 30, 2016
@akiradeveloper
Copy link
Owner Author

If we move the might_queue_current_buffer right after mutex_lock (I think it's logically ok but shortcoming of the change is overwriting to the last block in rambuf becomes impossible because wb flushes the rambuf once it reaches the last block of the rambuf regardless of it's dirtiness), I think the implementation becomes easy.

static int do_process_write(struct wb_device *wb, struct bio *bio)
{
    int retval = 0;

    struct metablock *write_pos = NULL;
    struct lookup_result res;

    struct write_io wio;
    wio.data = mempool_alloc(wb->buf_8_pool, GFP_NOIO);
    if (!wio.data)
        return -ENOMEM;
    initialize_write_io(&wio, bio);

    mutex_lock(&wb->io_lock);

    cache_lookup(wb, bio, &res);

    if (res.found) {
        if (unlikely(res.on_buffer)) {
            write_pos = res.found_mb;
            goto do_write;
        } else {
            retval = prepare_overwrite(wb, res.found_seg, res.found_mb, &wio, wio.data_bits);
            dec_inflight_ios(wb, res.found_seg);
            if (retval)
                goto out;
        }
    } else
        might_cancel_read_cache_cell(wb, bio);

    might_queue_current_buffer(wb);

But not sure if I can time out so easily.

@akiradeveloper
Copy link
Owner Author

If we can remove wb'd device irrelevant of flushing the dirty data, we can emulate power failure that is discussed in #68.

@akiradeveloper
Copy link
Owner Author

suspend should complete all requests submitted before dmsetup suspend.
so we need to flush current rambuf because deferred flush request may be chained. (if no flush request is chained there is no need to flush the current buffer but it's too unlikely - who kill the device without flush?)

@akiradeveloper
Copy link
Owner Author

So I think the current postsuspend implemention is fine.

@akiradeveloper
Copy link
Owner Author

An other idea is ack the deferred flush requests with error (EIO or DM_ENDIO_REQUEUE).

But still the inflight ios may needs a rambuf to be flushed to acquire new rambuf which obviously isn't accomplishable if SSD can't succeed on write.

Then error all requests? (how disastrous)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant