-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Don't close stream passed to PdfWriter.write() #2909
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2909 +/- ##
=======================================
Coverage 96.44% 96.44%
=======================================
Files 52 52
Lines 8728 8730 +2
Branches 1589 1589
=======================================
+ Hits 8418 8420 +2
Misses 182 182
Partials 128 128 ☔ View full report in Codecov by Sentry. |
@@ -249,7 +249,6 @@ def _get_clone_from( | |||
# to prevent overwriting | |||
self.temp_fileobj = fileobj | |||
self.fileobj = "" | |||
self.with_as_usage = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not found of removing with_as_usage attribute. it may be usefull to know that the the object has been created for a context manager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's never used. Keeping it around would mislead someone reading the code that it matters in some way. It's dead code but easy to revive if a need arises.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that having a dead property/attribute is not good, but it is a public interface which tends to need a proper deprecation process.
for i in range(4): | ||
writer.add_page(reader.pages[i]) | ||
writer.write(tmp) | ||
assert not tmp.file.closed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might be great to also have a test for where the automatic write at the closure of the context will be done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I managed to add a test for this. It was a bit confusing because of the double-construct that happens in __enter__()
. I spent a fair amount of time trying to understand the clone_from
logic before I realized that everything from the first __init__()
is thrown away except for temp_fileobj
.
pypdf/_writer.py
Outdated
@@ -369,6 +367,9 @@ def __exit__( | |||
"""Write data to the fileobj.""" | |||
if self.fileobj: | |||
self.write(self.fileobj) | |||
close_attr = getattr(self.fileobj, "close", None) | |||
if callable(close_attr): | |||
self.fileobj.close() # type: ignore[attr-defined] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking @MasterOdin's post
the stream closure should be done by the caller, not here, no ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO it should be closed on exit to avoid leaking resources - unless I misunderstood the existing discussions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that if a stream is opened by the PdfWriter(eg. if a path is provided)the stream should be closed but if it is the stream (opened before the context of the PdfWriter) it should be closed out of the 'with' section. As written it is always closed at the closure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking this code isn't needed at all. The only way PdfWriter
stores a closable object in self.fileobj
is via a with-statement invoking __init__()
passing in the object. I note that sometimes self.fileobj
is readable and used for cloning and sometimes it's writable and used for output, which is confusing. Also, I think there's a bug in this line: if isinstance(fileobj, (IO, BytesIO)):
in that IO
is from the typing module and no real objects will inherit from it. I suspect it was meant to be more like IOBase
, but I'm not sure. Finally, the fileobj.seek(-1, 2)
confuses me; why go one byte back from the end? If the file's empty it'll raise OSError.
In any case, the only file I see being created is in PdfWriter.write()
where it's a local variable and it gets closed.
I've created PR #2913 that also deals with Context manager. Consistency should be checked |
@stefan6419846 let me know how you'd like to proceed |
t = fileobj.tell() | ||
fileobj.seek(-1, 2) | ||
fileobj.seek(0, 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this change?
While I doubt that many users really rely on it, the deprecation process mentioned in https://github.com/py-pdf/pypdf/pull/2909/files#r1808253593 is still necessary as we are replacing a public attribute. |
Closes #2905