Can not interacting with folder mounted on dm-writeboost? Version 2.2.10 #206
Comments
I am afraid that after you ran fsck on the filesystem and changed its state, we can no longer track down the root cause. I wish you had first used this tool to check whether dm-writeboost is broken or not: https://github.com/akiradeveloper/dm-writeboost-tools
You should of course also have checked dmesg for any message from dm-writeboost. At the very least, the configuration must be shared.
As for the flawlessness of the software: in my opinion, dm-writeboost does not break in normal operation, because there are users who run it under heavier load and for much longer. The typical situation in which dm-writeboost may go wrong is when you shut down the system in an unusual manner. I designed dm-writeboost to survive even that situation, but I can't prove that there isn't anything wrong in the code.
When dm-writeboost is broken, what is typically seen is a broken checksum. If you had torn down the dm-writeboost device after your system went wrong, rebuilt it, and then looked at dmesg, there would have been a chance to see whether the checksums were intact, because the kernel code checks all the segments. The following code shows that dm-writeboost discards segments after a failure (e.g. partial writes):
/*
* Compare the checksum
* if they don't match we discard the subsequent logs.
*/
actual = calc_checksum(rambuf, header->length);
expected = le32_to_cpu(header->checksum);
if (actual != expected) {
DMWARN("Checksum incorrect id:%llu checksum: %u != %u",
(long long unsigned int) le64_to_cpu(header->id),
actual, expected);
break;
}
However, since you say that most of the data is gone, I don't believe that only the last few writes failed; rather, the data as a whole is broken. I don't know whether that is due to dm-writeboost, ext4, or your operation. |
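To make the "discard the subsequent logs" behaviour in the snippet above concrete, here is a minimal user-space sketch. It is not the dm-writeboost code: the `seg_header` and `segment` structs, `replay_segments`, and the in-memory layout are hypothetical, and the bitwise CRC-32C is only a stand-in for whatever `calc_checksum` does in the kernel module.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified header; NOT the on-disk layout of dm-writeboost. */
struct seg_header {
    uint64_t id;        /* segment id, increasing in write order */
    uint32_t checksum;  /* checksum stored when the segment was written */
    uint32_t length;    /* number of payload bytes covered by the checksum */
};

struct segment {
    struct seg_header header;
    const uint8_t *payload;
};

/* Bitwise CRC-32C (Castagnoli), used here only as a stand-in checksum. */
uint32_t crc32c(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/*
 * Replay segments in order and stop at the first checksum mismatch.
 * Returns how many leading segments were accepted; everything from the
 * first mismatching segment onwards is treated as discarded.
 */
size_t replay_segments(const struct segment *segs, size_t n)
{
    size_t accepted = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t actual = crc32c(segs[i].payload, segs[i].header.length);

        if (actual != segs[i].header.checksum)
            break;
        accepted++;
    }
    return accepted;
}
```

With three segments where the second one carries a wrong checksum, `replay_segments` accepts only the first. That matches the point made above: a torn final write should cost only the tail of the log, not most of the data.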
Please check the hardware (memory, disk). If I remember correctly, there was a case where the cause was broken memory. It is possible for a system to run half correctly while writing garbage data to the disk and then suddenly go wrong after a few days. In this case, both pieces of storage software would be blamed. |
I think it was my fault when I tried this: |
Such operations in userland will not cause data corruption at the kernel level. The error message tells us that the segment has a stored checksum of 0 for its data, but the recomputed value is 143703573. This means the segment was written badly because of some serious system failure (CPU, memory, disk, or power failure).
You should use the wbcheck tool to check all the segments on the cache device and count how many of them are corrupted. In particular, it is important to know whether the neighbours (id 1463972723 and 1463972725) are OK. |
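The counting step suggested above can be sketched as follows. This does not call wbcheck and makes no claim about its output format; it assumes you have already collected, per segment, the stored and the recomputed checksum into a hypothetical `seg_check` array. The neighbour checksums in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One checked segment: the checksum stored on disk and the recomputed one. */
struct seg_check {
    uint64_t id;
    uint32_t stored;
    uint32_t recomputed;
};

bool seg_is_corrupted(const struct seg_check *s)
{
    return s->stored != s->recomputed;
}

/* Count how many segments in the scanned range fail the comparison. */
size_t count_corrupted(const struct seg_check *segs, size_t n)
{
    size_t bad = 0;

    for (size_t i = 0; i < n; i++)
        if (seg_is_corrupted(&segs[i]))
            bad++;
    return bad;
}

/* True if segs[i] is corrupted while its existing neighbours are intact. */
bool is_isolated_corruption(const struct seg_check *segs, size_t n, size_t i)
{
    assert(i < n);
    if (!seg_is_corrupted(&segs[i]))
        return false;
    if (i > 0 && seg_is_corrupted(&segs[i - 1]))
        return false;
    if (i + 1 < n && seg_is_corrupted(&segs[i + 1]))
        return false;
    return true;
}
```

For example, feeding it the segment between ids 1463972723 and 1463972725 with stored checksum 0 and recomputed checksum 143703573, plus two intact neighbours, gives one corrupted segment that `is_isolated_corruption` reports as isolated. A single isolated bad segment and a long run of bad segments point to rather different failure modes, which is why the neighbours matter.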
Thank you for the response. I will close this topic for now. |
Hi akira, after 12 days this happened again. lsblk still shows the mountpoint:
I have used:
wbcheck /dev/mapper/ssddisk-ssd 447583694
wbcheck /dev/mapper/ssddisk-ssd 3381612806
wbcheck /dev/mapper/ssddisk-ssd 3381612806
wbmeta /dev/mapper/ssddisk-ssd 0 |
Please give the segment ID as the parameter, not the inode number.
This means you haven't formatted the caching device. |
When I use 'ls' it shows nothing, but I can still access the directories and files inside it. Where can I find these segment IDs? I can only see blocks and inodes.
# of cache blocks = 697748922
# of segments = 5494086
current id = 4369548
# of dirty cache blocks = 0
# of partial flushes = 674322
write? hit? on_buffer? fullsize? |
You have to understand how dm-writeboost segments the caching device before using it. My guess is that, as I mentioned earlier, your system is broken; I don't know whether it is the disk, memory, or CPU. dm-writeboost formats the caching device whenever it finds it unformatted (which is why erasing all the cache blocks is done by zeroing the first sector of the caching device). So a superblock that shows up as unformatted is simply an error. |
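As a small worked example of the segmentation mentioned above, the counters reported earlier in this thread already tell us how many cache blocks make up one segment: 697748922 / 5494086 = 127 exactly. The sketch below reproduces that arithmetic. The id-to-slot mapping `(id - 1) % nr_segments` and the idea that only ids up to "current id" have been issued are assumptions about the design, not statements taken from the dm-writeboost source, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Counters reported in this thread. */
#define NR_CACHE_BLOCKS 697748922ULL
#define NR_SEGMENTS     5494086ULL
#define CURRENT_ID      4369548ULL

/* Cache blocks per segment, derived purely from the reported counters. */
uint64_t blocks_per_segment(uint64_t nr_blocks, uint64_t nr_segments)
{
    assert(nr_segments > 0);
    return nr_blocks / nr_segments;
}

/*
 * ASSUMPTION: segment ids start at 1 and are laid out round-robin over the
 * segments of the caching device, so an id maps to slot (id - 1) % nr_segments.
 */
uint64_t segment_slot(uint64_t id, uint64_t nr_segments)
{
    assert(id > 0 && nr_segments > 0);
    return (id - 1) % nr_segments;
}

/* ASSUMPTION: only ids in [1, current_id] have been issued so far. */
bool id_was_issued(uint64_t id, uint64_t current_id)
{
    return id >= 1 && id <= current_id;
}
```

Under these assumptions, `id_was_issued(447583694, CURRENT_ID)` is false while `id_was_issued(4369547, CURRENT_ID)` is true. That illustrates why a number taken from an ext4 error message (an inode number) is not a useful argument where a segment ID is expected.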
Really? I didn't tell you to zero the first sector, but okay if it works. Actually, most of the dirty blocks have already been written back, so it would probably be OK to discard the caches.
In addition, you had better run a memory check and badblocks if you can. It is very, very strange to see the superblock broken. That is so unusual. |
wbcheck /dev/mapper/ssddisk-ssd 4369547 also returns nothing, so I assume it had been written successfully as well. |
If you can. Yes. I don't remember exactly which issue number it was, but I remember a user report that was finally resolved after the memory was found to be broken. |
Dear akira,
After 13 days of running dm-writeboost on my disk, something went wrong:
I cannot interact with the folder /data/ anymore using ls, mkdir, adding files, etc.
The dmesg log shows these faults:
[3341551.692063] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm ls: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3341552.701259] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm ls: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3341553.339575] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm ls: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3341554.576510] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm ls: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3341653.508665] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm bash: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3341653.731893] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm bash: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3341663.437200] EXT4-fs error (device dm-5): ext4_find_dest_de:1653: inode #2: block 14654: comm mkdir: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3341665.889757] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm ls: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3341666.823193] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm ls: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3341671.900039] EXT4-fs error (device dm-5): ext4_find_dest_de:1653: inode #2: block 14654: comm mkdir: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3341742.492629] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm ls: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3341953.555294] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm find: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3341977.426021] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm find: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3341981.320515] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm find: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3341983.834547] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm find: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3342035.975328] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm lsof: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3342038.377535] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm lsof: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3342045.639684] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm lsof: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3342046.804830] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm lsof: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3342047.532793] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm lsof: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3342194.645129] EXT4-fs (dm-5): warning: mounting fs with errors, running e2fsck is recommended
[3342195.055847] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[3342199.564925] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm ls: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3342200.426116] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm ls: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3342203.563006] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm ls: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3342221.592676] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm find: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3342226.812250] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm find: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3342229.270561] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm find: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3342242.942348] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm find: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
[3342254.850072] EXT4-fs error (device dm-5): htree_dirblock_to_tree:914: inode #2: block 14654: comm find: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=677963375, rec_len=28910, name_len=4
but the dm-writeboost block device still shows up as running in lsblk:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 2.6T 0 disk
└─sda1 8:1 0 2.6T 0 part
└─ssddisk-ssd 253:4 0 2.6T 0 lvm
└─wbdev 253:5 0 98.2T 0 dm /data
sdb 8:16 0 98.2T 0 disk
└─sdb1 8:17 0 98.2T 0 part
└─hdddisk-hdd 253:3 0 98.2T 0 lvm
└─wbdev 253:5 0 98.2T 0 dm /data
sdd 8:48 0 447.1G 0 disk
├─sdd1 8:49 0 200M 0 part /boot/efi
├─sdd2 8:50 0 1G 0 part /boot
└─sdd3 8:51 0 445.9G 0 part
├─centos-root 253:0 0 217.9G 0 lvm /
├─centos-swap 253:1 0 128G 0 lvm [SWAP]
└─centos-home 253:2 0 100G 0 lvm /home
I'm afraid that when the writeback threshold is met, there is some glitch or bug between my software and dm-writeboost. Is that possible?
I tried unmounting and mounting the data folder again, but nothing changed; I still cannot interact with the folder.
Then I used fsck to try to fix the filesystem. A lost+found folder was created, but all the other folders and data were removed.
I know you are not responsible for this fault, but can you tell me how I can debug it down to the root cause?
Thank you akira
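For readers trying to interpret the dmesg lines above: the message "rec_len % 4 != 0" comes from ext4's sanity check on a directory entry. The sketch below is loosely modelled on that check and is not the kernel code. The enum names and `check_dir_entry` are hypothetical, the 8-byte entry header and 4-byte rounding follow the ext4 on-disk directory entry format, and the 4096-byte block size in the usage note is an assumption about this filesystem.

```c
#include <stdint.h>

enum de_status {
    DE_OK,
    DE_REC_LEN_TOO_SMALL,   /* shorter than the smallest possible entry */
    DE_REC_LEN_UNALIGNED,   /* reported in dmesg as "rec_len % 4 != 0" */
    DE_REC_LEN_LT_NAME,     /* too small to hold name_len bytes of name */
    DE_OVERRUN              /* entry extends past the end of the block */
};

/* Bytes needed for an entry: 8-byte header plus name, rounded up to 4. */
static uint32_t dir_rec_len(uint32_t name_len)
{
    return (name_len + 8u + 3u) & ~3u;
}

/* Simplified validation of one directory entry inside a directory block. */
enum de_status check_dir_entry(uint32_t rec_len, uint32_t name_len,
                               uint32_t offset, uint32_t block_size)
{
    if (rec_len < dir_rec_len(1))
        return DE_REC_LEN_TOO_SMALL;
    if (rec_len % 4u != 0)
        return DE_REC_LEN_UNALIGNED;
    if (rec_len < dir_rec_len(name_len))
        return DE_REC_LEN_LT_NAME;
    if (offset + rec_len > block_size)
        return DE_OVERRUN;
    return DE_OK;
}
```

Calling `check_dir_entry(28910, 4, 0, 4096)` with the values from the log returns `DE_REC_LEN_UNALIGNED`, because 28910 leaves a remainder of 2 when divided by 4. A record length of 28910 is also far larger than a 4096-byte block, which suggests the first entry of that directory block holds garbage rather than a slightly damaged entry.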