SNOW-1151484 Use insertRows for schema evolution #866
base: master
Conversation
if (response.hasErrors()) {
  handleInsertRowsFailures(
      response.getInsertErrors(), streamingBufferToInsert.getSinkRecords());
  insertRecords(
do we need to reset the offset if we put the schema evolution records here as well?
@sfc-gh-tzhang
I don't think so, why would we need to do that?
we are setting the needToResetOffset flag to true in case of a schema evolution error. I'm not sure what exactly it does, and whether we still need it now that we reinsert the rows by rebuilding the buffer with rebuildBufferWithoutErrorRows
if you look at how needToResetOffset is used: when it's true, we reopen the channel and restart from the latest committed offset token
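For context, the reset contract described above can be sketched with plain Java. This is a toy model, not the connector's actual classes; the field and method names here are illustrative only:

```java
// Illustrative sketch of the needToResetOffset contract: on a schema-evolution
// error the channel is reopened and consumption restarts from the latest offset
// token committed server-side, so Kafka redelivers the whole batch.
class OffsetTracker {
    long lastCommittedOffset; // latest offset token persisted server-side
    long processedOffset;     // how far this task had read before the error

    OffsetTracker(long committed, long processed) {
        this.lastCommittedOffset = committed;
        this.processedOffset = processed;
    }

    /** Called when needToResetOffset is true: rewind to the committed token. */
    long resetOffset() {
        // "Reopen the channel" and restart from the committed token.
        processedOffset = lastCommittedOffset;
        return processedOffset + 1; // first offset Kafka should resend
    }
}
```

Because the restart point is the committed token, any rows inserted after it (including ones dropped from a failed batch) are redelivered by Kafka rather than lost.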
needToResetOffset = true;
} else {
  // Simply added to the final response if it's not schema related errors
  finalResponse.addError(insertError);
this filters out schema evolution errors from the list of errors. This is the only difference compared with your old PR @sfc-gh-tzhang
In my old PR, I don't have finalResponse anymore. I simply return response if there are no schema evolution related errors, which should include all the errors already. Do you see any issue with that?
Without the finalResponse, the response will contain the schema evolution errors as well. This means that in handleInsertRowsFailures we will put the schema evolution errors in the DLQ, which is a behaviour change.
Additionally, rebuildBufferWithoutErrorRows will filter out the rows with schema evolution errors, so such rows won't ever be inserted, right? We don't want that behaviour.
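The split being discussed here can be sketched in plain Java. The types below are simplified stand-ins, not the connector's or the ingest SDK's real classes; only the partitioning logic matters:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for an insert error (illustrative only).
class InsertError {
    final long rowIndex;
    final boolean schemaEvolutionError; // e.g. missing/extra column names

    InsertError(long rowIndex, boolean schemaEvolutionError) {
        this.rowIndex = rowIndex;
        this.schemaEvolutionError = schemaEvolutionError;
    }
}

class ErrorSplitter {
    /**
     * Keep only non-schema errors in the final response: schema-evolution
     * errors trigger a table alter plus an offset reset instead, so they must
     * never reach the DLQ path in handleInsertRowsFailures.
     */
    static List<InsertError> nonSchemaErrors(List<InsertError> errors) {
        List<InsertError> finalResponse = new ArrayList<>();
        for (InsertError e : errors) {
            if (!e.schemaEvolutionError) {
                finalResponse.add(e); // only these go to the DLQ
            }
        }
        return finalResponse;
    }
}
```

Returning the raw response instead of this filtered list would send schema-evolution errors to the DLQ, which is the behaviour change flagged above.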
> This means that in handleInsertRowsFailures we will put the schema evolution errors in the DLQ, which is a behaviour change.

In my old PR, we return the response before calling handleInsertRowsFailures, right?

> Additionally, rebuildBufferWithoutErrorRows will filter out the rows with schema evolution errors, so such rows won't ever be inserted, right? We don't want that behaviour.

Resetting the offset token will take care of that: Kafka will send us the same batch again.
@@ -540,6 +546,22 @@ public InsertRowsResponse insertRecords(StreamingBuffer streamingBufferToInsert)
  return response;
}

/** Building a new buffer which contains only the good rows from the original buffer */
private StreamingBuffer rebuildBufferWithoutErrorRows(
let's add a unit test for this function
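A unit test for that rebuild logic could look roughly like this. The buffer type below is a simplified stand-in for StreamingBuffer, and the method is a sketch of the rebuild behaviour, not the connector's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified stand-in for StreamingBuffer: an ordered list of records.
class Buffer {
    final List<String> records = new ArrayList<>();
    void insert(String record) { records.add(record); }
}

class BufferRebuilder {
    /** Build a new buffer containing only rows whose index is not an error row. */
    static Buffer rebuildWithoutErrorRows(Buffer original, Set<Integer> errorRowIndices) {
        Buffer rebuilt = new Buffer();
        for (int i = 0; i < original.records.size(); i++) {
            if (!errorRowIndices.contains(i)) {
                rebuilt.insert(original.records.get(i)); // keep only the good rows
            }
        }
        return rebuilt;
    }
}
```

A test would then insert a few records, mark one index as an error row, and assert the rebuilt buffer preserves the order of the remaining rows.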
.equals("true")) {
    .get(SnowflakeSinkConnectorConfig.ENABLE_SCHEMATIZATION_CONFIG)
    .equals("true")
    && !useDoubleBuffer) {
I'm not sure what needs to be updated without the double buffer now
@sfc-gh-tzhang another question: do we need this change? With the removal of the extra buffer we will get rid of this code anyway, so do we want to invest time here?
@sfc-gh-tzhang friendly reminder about the question above ☝️
How would you do schema evolution with the removal of the extra buffer? I don't see a design doc, so I don't know the answer.
Overview
SNOW-1161484
Mostly a copy of old PR #796
Create the channel with SKIP_BATCH instead of CONTINUE, to avoid the case where KC crashes right after adding the good rows to the table, leaving the bad rows missing from the DLQ
Update the schematization code to use insertRows instead of insertRow
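The SKIP_BATCH vs CONTINUE distinction motivating this PR can be illustrated with a toy model. The enum and channel class below are simplified stand-ins, not the ingest SDK's real OnErrorOption or channel API: with CONTINUE the good rows land in the table immediately (so a crash before the DLQ write loses the bad rows), while with SKIP_BATCH nothing is committed until the whole batch is handled:

```java
import java.util.ArrayList;
import java.util.List;

enum OnError { CONTINUE, SKIP_BATCH }

class ToyChannel {
    final List<Integer> table = new ArrayList<>(); // rows "committed" to the table
    final OnError onError;

    ToyChannel(OnError onError) { this.onError = onError; }

    /** Insert a batch; negative values stand in for rows that fail validation. */
    List<Integer> insertRows(List<Integer> batch) {
        List<Integer> errors = new ArrayList<>();
        for (int row : batch) {
            if (row < 0) errors.add(row);
        }
        if (errors.isEmpty()) {
            table.addAll(batch); // clean batch: committed under either policy
        } else if (onError == OnError.CONTINUE) {
            for (int row : batch) {
                if (row >= 0) table.add(row); // good rows committed despite failures
            }
        }
        // SKIP_BATCH: commit nothing; caller rebuilds the buffer and retries
        return errors;
    }
}
```

Under SKIP_BATCH, the connector can rebuild the buffer without the error rows, send the failures to the DLQ, and reinsert, without any window where the good rows are in the table but the bad rows are not yet in the DLQ.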
Pre-review checklist
- Is this change param protected by snowflake.ingestion.method?
- Yes - Added end to end and Unit Tests.
- No - Suggest why it is not param protected