GH-41110: [C#] Handle empty stream in ArrowStreamReaderImplementation #43939

voidstar69 · 2024-09-03T21:23:45Z

Rationale for this change

I am implementing this on the assumption that fixing #41110 is worthwhile. Please let me know if anyone disagrees.

What changes are included in this PR?

Handle empty stream in ArrowStreamReaderImplementation. I have not made similar changes to ArrowMemoryReaderImplementation or ArrowFileReaderImplementation.

Are these changes tested?

I have created two basic unit tests covering this new behaviour. This might not be sufficient to cover all cases where an empty stream should be handled without an exception occurring.

GitHub Issue: [C#] ArrowStreamReader does not handle empty stream well #41110

github-actions · 2024-09-03T21:24:11Z

⚠️ GitHub issue #41110 has been automatically assigned in GitHub to PR creator.

voidstar69 · 2024-09-03T21:25:25Z

csharp/src/Apache.Arrow/Ipc/ArrowStreamReaderImplementation.cs

@@ -55,6 +55,9 @@ public override RecordBatch ReadNextRecordBatch()

        protected async ValueTask<RecordBatch> ReadRecordBatchAsync(CancellationToken cancellationToken = default)
        {
+            if (BaseStream.Length == 0)
+                return null;


Is there any downside to checking the length of the stream? This might require all data to be read from e.g. a network socket, when all we really want to know here is whether there will be any data at all. Is there a better way to determine this?

I don't think it's even possible to check the length of a network stream.
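For context, .NET's `Stream.Length` throws `NotSupportedException` whenever `CanSeek` is false, which is the case for `NetworkStream`. Below is a minimal sketch of a guard that only consults `Length` when the stream can report it and otherwise returns "unknown". It assumes a hypothetical helper named `StreamProbe`, which is not part of Apache.Arrow.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; not an Apache.Arrow API.
public static class StreamProbe
{
    // Returns true or false when emptiness can be decided without reading,
    // or null when the stream cannot report its length (for example a
    // NetworkStream, whose Length getter throws NotSupportedException).
    public static bool? IsKnownEmpty(Stream stream)
    {
        if (stream is null) throw new ArgumentNullException(nameof(stream));
        if (!stream.CanSeek)
            return null; // Length/Position unavailable; the caller must read to find out
        // Compare against Position rather than testing Length == 0, so a
        // seekable stream that has already been partly consumed is handled too.
        return stream.Length - stream.Position == 0;
    }
}
```

A `null` result means this kind of check cannot help, so non-seekable sources such as sockets still need a read-based approach.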

Yes, the tests show the same thing.
I could instead try to read a single byte from the stream to see whether it produces any data. But the schema processing would then need that byte, and I cannot "push" it back onto the stream unless I wrap the stream in some other type of stream.
Maybe this feature is not worth the cost?
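The "push back" idea can be expressed as a small wrapper. Below is a minimal sketch that probes for end-of-stream by reading one byte and then returns that byte first on the next `Read`. It assumes a hypothetical `PushbackStream` class, which is not part of Apache.Arrow or .NET.

```csharp
using System;
using System.IO;

// Hypothetical wrapper for illustration; not part of Apache.Arrow or .NET.
// It lets a caller probe for end-of-stream by reading one byte and then
// "pushing" that byte back so the next Read returns it first.
public sealed class PushbackStream : Stream
{
    private readonly Stream _inner;
    private int _pushedBack = -1; // -1 means no byte is currently buffered

    public PushbackStream(Stream inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    // Returns true if the underlying stream has no more data.
    // Otherwise the probed byte is kept and replayed by the next Read.
    public bool IsAtEnd()
    {
        if (_pushedBack >= 0) return false;
        int b = _inner.ReadByte();
        if (b < 0) return true;
        _pushedBack = b;
        return false;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (buffer is null) throw new ArgumentNullException(nameof(buffer));
        if (count == 0) return 0;
        if (_pushedBack >= 0)
        {
            // Hand back the buffered byte first; returning fewer bytes than
            // requested is allowed by the Stream.Read contract.
            buffer[offset] = (byte)_pushedBack;
            _pushedBack = -1;
            return 1;
        }
        return _inner.Read(buffer, offset, count);
    }

    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}
```

Usage might involve wrapping `BaseStream` once, calling `IsAtEnd()` before reading the schema, and passing the wrapper to the existing parsing code. The trade-off is an extra layer of indirection on every read, which may be part of the cost being weighed here.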

I think this would have to work by remembering whether or not we're at the beginning of a possible message and then swallowing the end-of-stream exception if and only if that was the position.
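Below is a minimal sketch of that suggestion. It reads the first bytes of a message in a loop, treats a zero-byte read before anything has arrived as a clean end of stream, and still throws if the stream ends part-way through. `MessageBoundaryReader.TryReadExactlyAtBoundary` is a hypothetical helper, not the method Apache.Arrow actually uses.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not an Apache.Arrow API.
public static class MessageBoundaryReader
{
    // Fills 'buffer' completely from 'stream'. Returns false only if the
    // stream was already at its end before any byte was read, i.e. we were
    // positioned at a message boundary. A stream that ends part-way through
    // the buffer still throws, because that indicates truncated data.
    public static bool TryReadExactlyAtBoundary(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = stream.Read(buffer, total, buffer.Length - total);
            if (n == 0)
            {
                if (total == 0)
                    return false; // clean end: nothing at all was available
                throw new EndOfStreamException(
                    $"Stream ended after {total} of {buffer.Length} bytes.");
            }
            total += n;
        }
        return true;
    }
}
```

In an Arrow IPC stream, a reader could apply this to the leading bytes of each message (the continuation marker and length prefix) and return `null` on a clean end, rather than inspecting `Length`.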

This code now avoids throwing an exception if zero bytes are read when reading the schema; instead, it simply returns without reading the message.
Is this sufficient? Will a schema always be read before each message is read? Does something similar need to be implemented for reading from files and from memory buffers?


voidstar69 added 2 commits September 3, 2024 22:11

Handle empty stream in ArrowStreamReader · 9c09dad

Delete commented out code. Minor code simplification. · 947d227

voidstar69 requested a review from CurtHagenlocher as a code owner September 3, 2024 21:23

github-actions bot added the Component: C# and awaiting review labels Sep 3, 2024

voidstar69 commented Sep 3, 2024


github-actions bot added the awaiting committer review label and removed the awaiting review label Sep 3, 2024

voidstar69 marked this pull request as draft September 3, 2024 21:37

Switch implementation to avoid throwing at start of stream if no data can be read · d5215ec

voidstar69 marked this pull request as ready for review September 19, 2024 21:37


voidstar69 commented Sep 3, 2024 (edited by github-actions bot)

Conversation timeline: github-actions bot (Sep 3, 2024) · voidstar69 (Sep 3, 2024) · CurtHagenlocher (Sep 3, 2024) · voidstar69 (Sep 4, 2024) · CurtHagenlocher (Sep 4, 2024) · voidstar69 (Sep 19, 2024)
