Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add birthtime, blocksize and blks to stat output for NTFS #848

Merged
merged 2 commits into from
Oct 10, 2024

Conversation

Miauwkeru
Copy link
Contributor

@Miauwkeru Miauwkeru commented Sep 13, 2024

st_block uses the realsize if defined, oterwise it uses the size
the real size on disk is only defined if the file is not a resident file.

st_birthtime and st_birthtime_ns now use the creation_time from a MFT Record

Other timestamps where incorrectly used, and have been changed:
st_mtime: changed from using last_change_time to last_modification_time

@Miauwkeru Miauwkeru linked an issue Sep 13, 2024 that may be closed by this pull request
Copy link

codecov bot commented Sep 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.50%. Comparing base (2389376) to head (fe92d40).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #848      +/-   ##
==========================================
+ Coverage   76.48%   76.50%   +0.02%     
==========================================
  Files         314      314              
  Lines       26933    26945      +12     
==========================================
+ Hits        20600    20615      +15     
+ Misses       6333     6330       -3     
Flag Coverage Δ
unittests 76.50% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Schamper
Copy link
Member

image

@Miauwkeru
Copy link
Contributor Author

Miauwkeru commented Sep 13, 2024

image

"_st_ctime": "integer time of last change",

So what do we want to use, the windows style or unix style? Cause I'd prefer them to be consistent across dissect. Regardless what the python docs say. As we change it anyways

@Schamper
Copy link
Member

The entire filesystem layer is designed to be consistent with Python since the beginning of Dissect.

@Miauwkeru
Copy link
Contributor Author

The entire filesystem layer is designed to be consistent with Python since the beginning of Dissect.

In that case, have we changed that for any other filesystem then? Where the c_time gets used for the birth timestamp on the machine? Or should we adapt the btrfs stuff too when using windows?

Cause in this current case it seems to only hold for ntfs doesn't it?

@Schamper
Copy link
Member

Schamper commented Sep 13, 2024

https://github.com/python/cpython/blob/3.12/Modules/posixmodule.c#L2142-L2145
https://docs.python.org/3/library/os.html#os.DirEntry.stat (deprecation note)

The only thing missing currently is setting st_birthtime and st_birthtime_ns to their proper values on Windows filesystems (read, NTFS).

Btrfs and other filesystems don't have this, as they actually properly expose all 4 timestamps.

@Miauwkeru
Copy link
Contributor Author

https://github.com/python/cpython/blob/3.12/Modules/posixmodule.c#L2142-L2145 https://docs.python.org/3/library/os.html#os.DirEntry.stat (deprecation note)

The only thing missing currently is setting st_birthtime and st_birthtime_ns to their proper values on Windows filesystems (read, NTFS).

Btrfs and other filesystems don't have this, as they actually properly expose all 4 timestamps.

st_birthtime gets set here right?: https://github.com/python/cpython/blob/aee219f4558dda619bd86e4b0e028ce47a5e4b77/Python/fileutils.c#L1122

where ctime gets changed afterwards at the posixmodule sets it to the value in st_birthtime, which is only used for backwards compatibility.

So sure, I'll set the ctime to birthtime for now. And make a note about it for backwards compatibility

@Miauwkeru Miauwkeru force-pushed the 820-add-metadata-to-stat-output-for-ntfs branch from d5a6364 to 8fd94c0 Compare September 20, 2024 09:09
@Schamper
Copy link
Member

@Miauwkeru
Copy link
Contributor Author

st_birthtime gets set here right?: https://github.com/python/cpython/blob/aee219f4558dda619bd86e4b0e028ce47a5e4b77/Python/fileutils.c#L1122

In our code.

which I've done?

@Horofic Horofic self-assigned this Sep 26, 2024
@pyrco
Copy link
Contributor

pyrco commented Sep 27, 2024

Yes, it seems to be correct now.

Both st_birthtime and st_ctime are now set to the CreationTime attribute, as is done by Python up to and including Py3.12.

The change for st_mtime to the LastModificationTime attribute (which is called LastWriteTime in the Windows fileapi.h, and is the name Python uses) is also in line with how python handles this timestamp.

@Miauwkeru Miauwkeru force-pushed the 820-add-metadata-to-stat-output-for-ntfs branch from 8fd94c0 to 88d9121 Compare October 4, 2024 12:07
@Horofic Horofic self-requested a review October 7, 2024 10:28
@Schamper Schamper changed the title Add birthtime, blocksize and blks to stat_output for NTFSFilesystemEntry Add birthtime, blocksize and blks to stat output for NTFS Oct 7, 2024
@Horofic
Copy link
Contributor

Horofic commented Oct 7, 2024

Ack on the timestamps. Is @Poeloe still on for a code-review?


st_info.st_blocks = 0
if not self.is_dir():
st_info.st_blocks = math.ceil(st_info.st_blksize / 512) * math.ceil(st_info.st_size / st_info.st_blksize)
Copy link
Contributor

@pyrco pyrco Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this double ceiling is correct, it could lead to a really inflated number of blocks, e.g.:

st_size: 256
st_blksize: 128
->
math.ceil(128/512) * math.ceil(256/128) = math.ceil(1/4) * math.ceil(2) = 1 * 2 = 2

Inflates the apparent size by a factor of 4

st_size: 1024
st_blksize: 768
->
math.ceil(768/512) * math.ceil(1024/768) = math.ceil(3/2) * math.ceil(4/3) = 2 * 2 = 4

Inflates the apparent size by a factor of 2

If I read and interpret the various sources of what st_blocks should be it seems to me it represents the size on disk in blocks (which are 512 bytes in our implementation).
The POSIX/opengroup for example says: "Number of blocks allocated for this object."

The on disk is important in the sense that if files that are e.g. sparse their size on disk is smaller than their actual size, the same holds for example for files that are compressed on the filesystem layer.

So I think we can just do a math.ceil(st_info.st_size / 512) here. It also means that for symbolic links st_blocks should be 0, unless the symbolic link data is not (only) contained in the inode structure.

@Miauwkeru Miauwkeru requested a review from pyrco October 8, 2024 13:01
st_block uses the realsize if defined, oterwise it uses the size
the real size on disk is only defined if the file is not a resident file.

st_birthtime and st_birthtime_ns now use the creation_time from a MFT Record

Other timestamps where incorrectly used, and have been changed:
st_mtime: changed from using last_change_time to last_modification_time
@Miauwkeru Miauwkeru force-pushed the 820-add-metadata-to-stat-output-for-ntfs branch from 5baac61 to fe92d40 Compare October 10, 2024 12:15
@Miauwkeru Miauwkeru merged commit ddbc901 into main Oct 10, 2024
18 checks passed
@Miauwkeru Miauwkeru deleted the 820-add-metadata-to-stat-output-for-ntfs branch October 10, 2024 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add metadata to stat output for NTFS
4 participants