RollingFileAppender rotates files incorrectly on ext4 due to filesystem timestamp caching #3068
Comments
Hi @kelunik, Thank you for the very detailed analysis. Unfortunately We probably would need to modify the
This is IMHO a bug, which is due to the usage of
This is another problem by itself, but the fact that a rollover happens, is itself a bug. Using the last-modified time would likely work (and is what If there's just a single log event logged at This condition is, however, a lot less likely than the current edge case. My original proposed fix was rounding up to the nearest second, but then I learned that rotation based on milliseconds is supported, where this would break.
The last-modified time can be set on virtually any OS and filesystem, so we can override the value assigned by the OS. There are some Windows file systems with a terrible timestamp resolution (see Windows-compatible filesystems' file time resolutions), but hopefully these are at least capable of precisely representing full minutes, hours and days. Ideally we could set the last-modified time to the theoretical time of the rollover, which should be close to the time the file was actually modified.
We need to set this on the current log file, not on rollover on the rolled away file, so setting this on every write doesn't seem like a performant approach? Any write to the file will make the OS change the value again.
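The point about writes undoing an overridden timestamp can be checked with a small sketch. It is not Log4j code: the class and helper names are hypothetical, the temp file stands in for app.log, and it assumes the system clock is later than the forced 2024-10-07 timestamp.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.nio.file.attribute.FileTime;
import java.time.Instant;

public class MtimeOverrideDemo {
    // Forces the last-modified time, appends one line, and returns the
    // last-modified time observed after that write.
    static FileTime mtimeAfterWrite(Path file, Instant forced) throws IOException {
        Files.setLastModifiedTime(file, FileTime.from(forced));
        Files.write(file, "event\n".getBytes(StandardCharsets.UTF_8), StandardOpenOption.APPEND);
        return Files.getLastModifiedTime(file);
    }

    // Returns true if the write moved the last-modified time past the forced value.
    static boolean writeResetsMtime() {
        try {
            Path file = Files.createTempFile("app", ".log");
            try {
                // Hypothetical "theoretical rollover time" taken from the thread's example date.
                Instant rollover = Instant.parse("2024-10-07T00:00:00Z");
                return mtimeAfterWrite(file, rollover).toInstant().isAfter(rollover);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("write reset the forced mtime: " + writeResetsMtime());
    }
}
```

If the override only survives until the next write, it would have to be reapplied after every log event, which is the performance concern raised above.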
Right, I was looking at this wrong. If the last modification time of the file is taken into account for a rotation, then we probably don't need to do anything, because the natural last modification time will be well after midnight. Only the creation time can be before midnight.
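To compare the two timestamps on a given file, both can be read through the standard java.nio attribute API. This is only a sketch with a hypothetical helper name, not what Log4j does internally. Note that on filesystems without a creation timestamp, creationTime() returns an implementation-specific default, typically the last-modified time.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;

public class FileTimesDemo {
    // Returns {creationTime, lastModifiedTime} for the given path.
    static FileTime[] creationAndModified(Path file) {
        try {
            BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
            return new FileTime[] { attrs.creationTime(), attrs.lastModifiedTime() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // A temp file stands in for app.log.
        Path file = Files.createTempFile("app", ".log");
        try {
            FileTime[] times = creationAndModified(file);
            System.out.println("created:  " + times[0]);
            System.out.println("modified: " + times[1]);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```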
@rgoers Do you remember why you changed from modified time to creation time in e392c79? Calling My fix proposal would be to revert this back to
That commit is a fix to LOG4J2-1906 and is probably related to #2297, where a regression occurred. Anyway I believe that those issues are independent of this one:
@ppkarwasz Yes, I've of course looked at that ticket, but that didn't make it clear to me why modification time got changed to creation time.
Yes, I know. That's why I want to change the initial time back to the modification time instead of the creation time. I don't want to cause a regression with that, though, so I want to understand the reasons for the original change there.
I have been thinking about this some more, and the modified time seems to be broken as well. Imagine:
Then the date is also wrong. However, it would be newer than expected and avoid a rotation instead of causing an additional rotation. This still leaves the edge case if events are only logged in the first 10 ms of the rotation period, but none after that.
Description
RollingFile rotates files incorrectly if an event is logged shortly (in our case up to 10 ms) after the rollover time and the application is restarted within the same rollover period on ext4 (Linux).
Configuration
Version: 2.24.1
Operating system: Debian 11 / Linux 5.10.0-32-amd64
JDK: 21.0.4 Corretto
Reproduction
We went down a deep rabbit hole analyzing these conditions, but here's what happens and how to reproduce:
1. [...] 2024-10-07 00:00:00.000000000 (you can also use minute based rotations for easier reproduction)
2. TimeBasedTriggeringPolicy#isTriggeringEvent will return true, which results in a rotation
3. [...] app.log file will rotate to app-2024-10-07.log
4. RollingFileManager#createFileAfterRollover will create a new app.log (in FileManager#createOutputStream)
5. [...] creationTime set, but that call is ignored on ext3/ext4 filesystems.
6. [...] current_fs_time() on ext4: current_fs_time() returns a cached timestamp that's only updated on kernel clock ticks (see getconf CLK_TCK)
7. current_fs_time() might lag behind up to 10 milliseconds
8. [...] 2024-10-06 23:59:59.997423036 (use stat app.log to confirm)
9. TimeBasedTriggeringPolicy#initialize reads RollingFileManager#getFileTime (initialized by RollingFileManager#initialFileTime), which returns the above borked timestamp.
10. [...] 2024-10-07 00:00:00.000000000, which in turn will trigger a new rollover right on application start.
11. [...] app.log to app-2024-10-06.log, therefore leading to:
    - [...] app-2024-10-06.log file contents
    - [...] 2024-10-07 00:00:00 until the restart being unexpectedly in app-2024-10-06.log instead of app.log (if still on the same day) or app-2024-10-07.log (later)
    - tail -f app.log to hang (our original symptom): tail will (by default) follow the file descriptor on the rename to app-2024-10-06.log and new events will be written to app.log instead.
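The effect of the stale timestamp on the startup check can be modeled in a few lines. This is a simplified sketch, not Log4j's actual implementation: nextDailyRollover and rollsOverOnStartup are hypothetical helpers, UTC days stand in for the configured daily period, and the 09:30 restart time is made up for illustration.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

public class StaleTimestampDemo {
    // Hypothetical helper: the first UTC midnight strictly after the file's timestamp.
    static Instant nextDailyRollover(Instant fileTime) {
        ZonedDateTime t = fileTime.atZone(ZoneOffset.UTC);
        return t.truncatedTo(ChronoUnit.DAYS).plusDays(1).toInstant();
    }

    // Simplified startup check: roll over if "now" has reached the rollover
    // time derived from the file's timestamp.
    static boolean rollsOverOnStartup(Instant fileTime, Instant now) {
        return !now.isBefore(nextDailyRollover(fileTime));
    }

    public static void main(String[] args) {
        // Timestamp from the report: the file was really created just after midnight,
        // but the cached filesystem clock stamped it just before.
        Instant stale = Instant.parse("2024-10-06T23:59:59.997423036Z");
        // What an accurate clock would have recorded a few milliseconds later.
        Instant accurate = Instant.parse("2024-10-07T00:00:00.002Z");
        Instant restart = Instant.parse("2024-10-07T09:30:00Z");

        System.out.println(rollsOverOnStartup(stale, restart));    // prints true
        System.out.println(rollsOverOnStartup(accurate, restart)); // prints false
    }
}
```

A lag of under 3 ms is enough to place the file in the previous day, so the restart sees a rollover that is already due.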