Safe logic for concurrent executions #1132
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
resolves #
docs
Problem
when running an incremental table with the on_schema_change policy set to append_new_columns in a dbt project. If two jobs concurrently perform the column check operation, they both generate the same ALTER TABLE statement. Because of this simultaneous execution, a race condition occurs where one job's ALTER TABLE statement succeeds, while the slower executor encounters a SQL compilation error stating that the column already exists. This issue stems from the inability to ensure that the column schema remains unchanged between the column check operation and the execution of the ALTER TABLE statement, leading to potential failures in concurrent environments.
Solution
The solution to this problem is to incorporate the IF NOT EXISTS and IF EXISTS conditions in the ALTER TABLE statements. By using these conditions, the ALTER TABLE statement will only attempt to add a column if it does not already exist, and drop a column only if it exists.
However, this introduces some drawbacks. These include increased complexity in SQL logic, potential masking of underlying schema synchronization issues, minor performance impacts, and the risk of partial schema updates. Additionally, this solution is specific to Snowflake or databases that support these conditions, making it less portable to other database systems.
Checklist