Set unique temp table suffix to allow parallel incremental executions #811

Open
wants to merge 1 commit into
base: main


@huangxingyi-git huangxingyi-git commented Sep 26, 2024

Resolves #

Description

Set a unique temp table suffix to allow parallel incremental executions.
In some cases (e.g. backfilling a very large amount of data), we need to run multiple dbt run invocations of the same incremental (replace_where) model in parallel, passing the date (or country) as a var argument.
For example, we have a model that Airflow runs every day, and we pass it a date relative to the Airflow schedule.
For reference, see the dbt-athena PR:
https://github.com/dbt-labs/dbt-athena/pull/650/files
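
To make the backfill scenario concrete, here is a minimal sketch of how an orchestrator might build one dbt run invocation per day. The helper name `backfill_commands`, the model name, and the var name `date` are hypothetical and not taken from this PR; only the `--select` and `--vars` flags are standard dbt CLI options.

```python
import json
from datetime import date, timedelta


def backfill_commands(model: str, start: date, days: int) -> list[list[str]]:
    """Build one `dbt run` argument list per day in the backfill window."""
    commands = []
    for offset in range(days):
        run_date = start + timedelta(days=offset)
        # "date" is a hypothetical var name; use whatever the model reads via var().
        dbt_vars = json.dumps({"date": run_date.isoformat()})
        commands.append(["dbt", "run", "--select", model, "--vars", dbt_vars])
    return commands


if __name__ == "__main__":
    # Each command could be handed to a separate, concurrently running Airflow task.
    for cmd in backfill_commands("my_incremental_model", date(2024, 9, 1), 3):
        print(" ".join(cmd))
```

Each resulting command targets the same model with a different date, which is the situation where the runs start competing for the same temp table.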

If we want to process a batch of N days in parallel using Airflow concurrency, the tmp table created by each dbt run must be unique. Otherwise, N inserts attempt to run against the same __dbt_tmp name, which creates conflicts and ultimately causes failures.
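
The idea can be sketched in a few lines of Python. This is an illustration of the naming scheme only, not the code in this PR: the function `unique_tmp_name` is hypothetical, and using a UUID as the unique part is an assumption.

```python
import uuid


def unique_tmp_name(table: str, base_suffix: str = "__dbt_tmp") -> str:
    """Return a temp table name that differs on every call.

    Assumption: a random UUID is used as the unique part. Its hex form
    contains no hyphens, so it is safe inside an unquoted identifier.
    """
    return f"{table}{base_suffix}_{uuid.uuid4().hex}"


if __name__ == "__main__":
    # Two concurrent runs of the same model get different temp tables,
    # so their inserts no longer collide on a shared __dbt_tmp name.
    print(unique_tmp_name("events"))
    print(unique_tmp_name("events"))
```

One consequence of this approach is that a failed run can leave behind a uniquely named temp table, so some cleanup step may be needed.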

issue

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

Signed-off-by: huang xingyi <hxy911122@gmail.com>