SNOW-1151484 Use insertRows for schema evolution #866
base: master
Conversation
if (response.hasErrors()) {
  handleInsertRowsFailures(
      response.getInsertErrors(), streamingBufferToInsert.getSinkRecords());
  insertRecords(
do we need to reset the offset if we put the schema evolution records here as well?
@sfc-gh-tzhang
I don't think so, why would we need to do that?
we are setting the needToResetOffset flag to true in case of a schema evolution error. I'm not sure what exactly it does, and whether we still need it now that we reinsert the rows by rebuilding the buffer with rebuildBufferWithoutErrorRows
if you look at how needToResetOffset is used: when it's true, we reopen the channel and restart from the latest committed offset token
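For context, the reset contract described above can be sketched with plain Java. This is a toy model, not the connector's actual classes; the field and method names here are illustrative only:

```java
// Illustrative sketch of the needToResetOffset contract: on a schema-evolution
// error the channel is reopened and consumption restarts from the latest offset
// token committed server-side, so Kafka redelivers the whole batch.
class OffsetTracker {
    long lastCommittedOffset; // latest offset token persisted server-side
    long processedOffset;     // how far this task had read before the error

    OffsetTracker(long committed, long processed) {
        this.lastCommittedOffset = committed;
        this.processedOffset = processed;
    }

    /** Called when needToResetOffset is true: rewind to the committed token. */
    long resetOffset() {
        // "Reopen the channel" and restart from the committed token.
        processedOffset = lastCommittedOffset;
        return processedOffset + 1; // first offset Kafka should resend
    }
}
```

Because the restart point is the committed token, any rows inserted after it (including ones dropped from a failed batch) are redelivered by Kafka rather than lost.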
needToResetOffset = true;
} else {
  // Simply added to the final response if it's not schema related errors
  finalResponse.addError(insertError);
this filters out schema evolution errors from the list of errors. This is the only difference compared with your old PR @sfc-gh-tzhang
In my old PR, I don't have finalResponse anymore. I simply return response if there are no schema evolution related errors, which should include all the errors already. Do you see any issue with that?
Without the finalResponse, the response will contain the schema evolution errors as well. This means that in handleInsertRowsFailures we will put the schema evolution errors in the DLQ, which is a behaviour change.
Additionally, rebuildBufferWithoutErrorRows will filter out the rows with schema evolution errors, so such rows won't ever be inserted, right? We don't want that behaviour.
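The split being discussed here can be sketched in plain Java. The types below are simplified stand-ins, not the connector's or the ingest SDK's real classes; only the partitioning logic matters:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for an insert error (illustrative only).
class InsertError {
    final long rowIndex;
    final boolean schemaEvolutionError; // e.g. missing/extra column names

    InsertError(long rowIndex, boolean schemaEvolutionError) {
        this.rowIndex = rowIndex;
        this.schemaEvolutionError = schemaEvolutionError;
    }
}

class ErrorSplitter {
    /**
     * Keep only non-schema errors in the final response: schema-evolution
     * errors trigger a table alter plus an offset reset instead, so they must
     * never reach the DLQ path in handleInsertRowsFailures.
     */
    static List<InsertError> nonSchemaErrors(List<InsertError> errors) {
        List<InsertError> finalResponse = new ArrayList<>();
        for (InsertError e : errors) {
            if (!e.schemaEvolutionError) {
                finalResponse.add(e); // only these go to the DLQ
            }
        }
        return finalResponse;
    }
}
```

Returning the raw response instead of this filtered list would send schema-evolution errors to the DLQ, which is the behaviour change flagged above.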
> This means that in handleInsertRowsFailures we will put the schema evolution errors in the DLQ, which is a behaviour change.

In my old PR, we return the response before calling handleInsertRowsFailures, right?

> Additionally, rebuildBufferWithoutErrorRows will filter out the rows with schema evolution errors, so such rows won't ever be inserted, right? We don't want that behaviour.

Resetting the offset token will take care of that: Kafka will send us the same batch again.
@@ -540,6 +546,22 @@ public InsertRowsResponse insertRecords(StreamingBuffer streamingBufferToInsert)
  return response;
}

/** Building a new buffer which contains only the good rows from the original buffer */
private StreamingBuffer rebuildBufferWithoutErrorRows(
let's add a unit test for this function
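A unit test for that rebuild logic could look roughly like this. The buffer type below is a simplified stand-in for StreamingBuffer, and the method is a sketch of the rebuild behaviour, not the connector's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified stand-in for StreamingBuffer: an ordered list of records.
class Buffer {
    final List<String> records = new ArrayList<>();
    void insert(String record) { records.add(record); }
}

class BufferRebuilder {
    /** Build a new buffer containing only rows whose index is not an error row. */
    static Buffer rebuildWithoutErrorRows(Buffer original, Set<Integer> errorRowIndices) {
        Buffer rebuilt = new Buffer();
        for (int i = 0; i < original.records.size(); i++) {
            if (!errorRowIndices.contains(i)) {
                rebuilt.insert(original.records.get(i)); // keep only the good rows
            }
        }
        return rebuilt;
    }
}
```

A test would then insert a few records, mark one index as an error row, and assert the rebuilt buffer preserves the order of the remaining rows.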
.equals("true")) {
    .get(SnowflakeSinkConnectorConfig.ENABLE_SCHEMATIZATION_CONFIG)
    .equals("true")
    && !useDoubleBuffer) {
I'm not sure what needs to be updated without the double buffer now
@sfc-gh-tzhang another question: do we need this change? With the removal of the extra buffer we will get rid of this code anyway, so do we want to invest time here?
@sfc-gh-tzhang friendly reminder about the question above ☝️
How would you do schema evolution with the removal of the extra buffer? I don't see a design doc, so I don't know the answer.
Overview
SNOW-1161484
Mostly a copy of old PR #796
Create the channel with SKIP_BATCH instead of CONTINUE, to avoid the case where KC crashes right after adding the good rows to the table, leaving the bad rows missing from the DLQ
Update the schematization code to use insertRows instead of insertRow
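The SKIP_BATCH vs CONTINUE distinction motivating this PR can be illustrated with a toy model. The enum and channel class below are simplified stand-ins, not the ingest SDK's real OnErrorOption or channel API: with CONTINUE the good rows land in the table immediately (so a crash before the DLQ write loses the bad rows), while with SKIP_BATCH nothing is committed until the whole batch is handled:

```java
import java.util.ArrayList;
import java.util.List;

enum OnError { CONTINUE, SKIP_BATCH }

class ToyChannel {
    final List<Integer> table = new ArrayList<>(); // rows "committed" to the table
    final OnError onError;

    ToyChannel(OnError onError) { this.onError = onError; }

    /** Insert a batch; negative values stand in for rows that fail validation. */
    List<Integer> insertRows(List<Integer> batch) {
        List<Integer> errors = new ArrayList<>();
        for (int row : batch) {
            if (row < 0) errors.add(row);
        }
        if (errors.isEmpty()) {
            table.addAll(batch); // clean batch: committed under either policy
        } else if (onError == OnError.CONTINUE) {
            for (int row : batch) {
                if (row >= 0) table.add(row); // good rows committed despite failures
            }
        }
        // SKIP_BATCH: commit nothing; caller rebuilds the buffer and retries
        return errors;
    }
}
```

Under SKIP_BATCH, the connector can rebuild the buffer without the error rows, send the failures to the DLQ, and reinsert, without any window where the good rows are in the table but the bad rows are not yet in the DLQ.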
Pre-review checklist
- Is this change param protected by snowflake.ingestion.method?
- Yes - Added end to end and Unit Tests.
- No - Suggest why it is not param protected