GitHub - greenplum-db/dbt-greenplum: An adaptation of the PostgreSQL DBT adapter for Tanzu Greenplum

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

dbt is the T in ELT. Organize, cleanse, denormalize, filter, rename, and pre-aggregate the raw data in your warehouse so that it's ready for analysis.

dbt-greenplum

The dbt-greenplum package contains the code enabling dbt to work with Greenplum. For more information on using dbt with Greenplum, consult the docs.

This adapter is based on dbt-postgres v1.8.2.

Compared to dbt-postgres, dbt-greenplum supports some Greenplum-Specific features.

For those Greenplum-Specific features, we reference markporoshin/dbt-greenplum which is no longer maintained.

Greenplum-Specific Features

Distribution Policy

Greenplum will choose the first column as the distribution column by default while not specifying distribution policy.

You can specify distribution policy explicitly with the following settings.

distributed randomly by setting distributed_randomly=true in the model config.
distributed replicated by setting distributed_replicated=true in the model config.
distributed by (<column_name> [, ... ]) by setting distributed_by='<column_name> [, ... ]' in the model config.

Table Storage Model

Heap Storage(Default)

You can create heap table with the setting appendoptimized=false in the model config.

Here is an example.
```
{{
  config(
    ...
    materialized='table',
    appendoptimized=false,
    ...
  )
}}

select 1 as "id"
```
Append-Optimized Storage

You can create Append-Optimized(AO) table with the setting appendoptimized=true in the model config.
Row or Column-Oriented Storage
- Row-Oriented Storage
  
  Row-oriented storage is used by default.
  
  Also you can create table with row-oriented storage by setting orientation=row in the model config explicitly.
- Column-Oriented Storage (Append-Optimized Tables Only)
  
  You can create table with column-oriented storage by setting orientation=column in the model config.
  
  NOTE: Tables that use column-oriented storage must be append-optimized tables.
Using Compression (Append-Optimized Tables Only)

You can specify compresstype, level and blocksize settings in the model config.

Default values are compresstype=ZLIB, compresslevel=1 and blocksize=32768.

NOTE: Only append-optimized tables can use compression.

Partition Tables

Greenplum can NOT create partition tables with create table as statement.

So to build model with partitions, you need to two steps.

create table
insert data

To build model with partitions, we need to configure

fields_string - definition of columns name, type and constraints

This setting is must.

one of following way to configure partitions

Way 1: using the setting raw_partition

Here is an example.

{% set fields_string %}
   id int4 null,
   incomingdate timestamp NULL
{% endset %}

{% set raw_partition %}
  PARTITION BY RANGE (incomingdate)
  (
    START ('2021-01-01'::timestamp) INCLUSIVE
    END ('2023-01-01'::timestamp) EXCLUSIVE
    EVERY (INTERVAL '1 day'),
    DEFAULT PARTITION extra
  )
{% endset %}

{{
  config(
    materialized='table',
    fields_string=fields_string,
    raw_partition=raw_partition,
  )
}}

with source_data as (

    select
        1 as id,
        '2022-02-22'::timestamp as incomingdate
    union all
    select
        null as id,
        '2022-02-25'::timestamp as incomingdate
)
select *
from source_data

The generated SQL by dbt is as follows. (We mock the model name using <model_identifier>.)

create table if not exists <model_identifier> (
   id int4 null,
   incomingdate timestamp NULL
)
PARTITION BY RANGE (incomingdate)
(
   START ('2021-01-01'::timestamp) INCLUSIVE
    END ('2023-01-01'::timestamp) EXCLUSIVE
    EVERY (INTERVAL '1 day'),
    DEFAULT PARTITION extra
);

insert into <model_identifier> (
with source_data as (
    select
        1 as id,
        '2022-02-22'::timestamp as incomingdate
    union all
    select
        null as id,
        '2022-02-25'::timestamp as incomingdate
)
select *
from source_data
);

Way 2: using the settings partition_type, partition_column, partition_spec

Here is an example.

{% set fields_string %}
    id int4 null,
    incomingdate timestamp NULL
{% endset %}

{%- set partition_type = 'RANGE' -%}
{%- set partition_column = 'incomingdate' -%}
{% set partition_spec %}
    START ('2021-01-01'::timestamp) INCLUSIVE
    END ('2023-01-01'::timestamp) EXCLUSIVE
    EVERY (INTERVAL '1 day'),
    DEFAULT PARTITION extra
{% endset %}

{{
    config(
        materialized='table',
        fields_string=fields_string,
        partition_type=partition_type,
        partition_column=partition_column,
        partition_spec=partition_spec,
    )
}}

with source_data as (

    select
        1 as id,
        '2022-02-22'::timestamp as incomingdate
    union all
    select
        null as id,
        '2022-02-25'::timestamp as incomingdate
)
select *
from source_data

Generated SQL by dbt is same as Way 1.

Way 3: using the settings partition_type, partition_column, partition_start, partition_end, partition_every

Here is an example.

{% set fields_string %}
    id int4 null,
    incomingdate timestamp NULL
{% endset %}

{%- set partition_type = 'RANGE' -%} 
{%- set partition_column = 'incomingdate' -%}
{%- set partition_start = "'2021-01-01'::timestamp" -%}
{%- set partition_end = "'2023-01-01'::timestamp" -%}
{%- set partition_every = '1 day' -%}

{{
    config(
        materialized='table',
        fields_string=fields_string,
        partition_type=partition_type,
        partition_column=partition_column,
        partition_start=partition_start,
        partition_end=partition_end,
        partition_every=partition_every,
        default_partition_name='extra'
     )
}}

with source_data as (

    select
        1 as id,
        '2022-02-22'::timestamp as incomingdate
    union all
    select
        null as id,
        '2022-02-25'::timestamp as incomingdate
)
select *
from source_data

Generated SQL by dbt is same as Way 1.

Way 4: using the settings partition_type, partition_column, partition_values

Here is an example.

{% set fields_string %}
    id int4 null,
    incomingdate timestamp NULL
{% endset %}

{%- set partition_type = 'LIST' -%}
{%- set partition_column = 'code' -%}
{% set partition_values %}
    PARTITION sales VALUES ('S'),
    PARTITION returns VALUES ('R')
{% endset %}

{{
    config(
        materialized='table',
        fields_string=fields_string,
        partition_type=partition_type,
        partition_column=partition_column,
        partition_values=partition_values,
        default_partition_name='extra'
    )
}}

with source_data as (

    select
        1 as id,
        'S' as code
    union all
    select
        null as id,
        'N' as code
)
select *
from source_data

Generated SQL by dbt is as follows.

create table if not exists <model_identifier> (
    id int4 null,
    code char(1)
)
PARTITION BY LIST (code)
    (
        PARTITION sales VALUES ('S'),
        PARTITION returns VALUES ('R'),
        DEFAULT PARTITION extra
    )
;

insert into <model_identifier> (
with source_data as (
    select
        1 as id,
        'S' as code
    union all
    select
        null as id,
        'N' as code
)
select *
from source_data
);

default_partition_name - name of default partition, default value is other

Incremental Strategy

Valid incremental strategies are append, delete+insert and truncate+insert.

Getting started

Install dbt
Read the introduction and viewpoint

Join the dbt Community

Be part of the conversation in the dbt Community Slack
Read more on the dbt Community Discourse

Name		Name	Last commit message	Last commit date
Latest commit History 75 Commits
dbt		dbt
scripts		scripts
tests		tests
.changie.yaml		.changie.yaml
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
test.env.example		test.env.example

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

dbt-greenplum

Greenplum-Specific Features

Distribution Policy

Table Storage Model

Partition Tables

Incremental Strategy

Getting started

Join the dbt Community

About

Releases

Packages

Contributors 8

Languages

License

greenplum-db/dbt-greenplum

Folders and files

Latest commit

History

Repository files navigation

dbt-greenplum

Greenplum-Specific Features

Distribution Policy

Table Storage Model

Partition Tables

Incremental Strategy

Getting started

Join the dbt Community

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 8

Languages

Packages