使用dbt时如何手动创建表或使用postgres分区?
我想使用 dbt 将数据插入到分区表中,但发现 postgres 不支持 dbt 分区。
通过另一种方式,我在 pre_hook 中创建表和分区,但在 dbt 运行时收到错误“关系 'download_counts_p' 已存在”
有什么建议吗?这是我的 SQL 和 pre_hook 配置
{{ config(
materialized = 'table',
indexes = [ ],
pre_hook=[
'CREATE TABLE IF NOT EXISTS "download_counts_p" (
"channel_id" int8 NOT NULL,
"product_id" int8 NOT NULL,
"country_code" text NOT NULL,
"year" int2 NULL,
"month" int2 NOT NULL,
"count" int8 NOT NULL,
"count" int8 NOT NULL,
"months" int8 NOT NULL
) partition by list(country_code)',
"DO $$
Declare unique_country_code varchar;
BEGIN
FOR unique_country_code IN
SELECT country_code as unique_country_code FROM download_counts group by country_code
LOOP
EXECUTE format('create table IF NOT EXISTS download_counts_p_%s partition of download_counts_p for values in (''%s'')', upper(unique_country_code), unique_country_code);
END LOOP;
END; $$;"]
)}}
select 1
I want to insert data into a partitioned table using dbt, but found no support for dbt postgres partition.
By another way, i create table and partition in pre_hook, but got the error "relation 'download_counts_p' already exists" when dbt run
Are there any suggestions ? Here is my SQL and pre_hook config
{{ config(
materialized = 'table',
indexes = [ ],
pre_hook=[
'CREATE TABLE IF NOT EXISTS "download_counts_p" (
"channel_id" int8 NOT NULL,
"product_id" int8 NOT NULL,
"country_code" text NOT NULL,
"year" int2 NULL,
"month" int2 NOT NULL,
"count" int8 NOT NULL,
"count" int8 NOT NULL,
"months" int8 NOT NULL
) partition by list(country_code)',
"DO $
Declare unique_country_code varchar;
BEGIN
FOR unique_country_code IN
SELECT country_code as unique_country_code FROM download_counts group by country_code
LOOP
EXECUTE format('create table IF NOT EXISTS download_counts_p_%s partition of download_counts_p for values in (''%s'')', upper(unique_country_code), unique_country_code);
END LOOP;
END; $;"]
)}}
select 1
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
这里发生了一些不同的事情。
FOR ... LOOP ... END LOOP
挂钩实际上应该编写为它的自己的模型,使用 jinja 循环。您可以使用run_query
将数据返回到 jinja 上下文 宏。该页面底部有一个示例,它将查询结果提取到 jinja 上下文中,然后使用{% for payment_method in results_list %}
循环它们There are a few different things going on here.
FOR ... LOOP ... END LOOP
hook should really be written as its own model, using a jinja loop. You can return data to the jinja context using therun_query
macro. There is an example toward the bottom of that page that fetches the query results into the jinja context and then loops over them with{% for payment_method in results_list %}
在 DBT DDL 中,表的创建是自动完成的,运行 pre-hook 或 post-hook 不会有任何帮助,因为 PostgreSQL 不允许更改表以通过子句进行分区。
而不是 pre_hook 块 - 这也需要为每个模块完成 -。可以修改 DBT postgres 中的创建表语句来处理表分区创建。该解决方案支持范围和列表类型。
作为 PostgreSQL DDL 语句的示例,
将此 DDL 修改为具有 PARTITION BY 子句后,可以创建并附加分区
应用适配
1- 在项目中找到 postgres dbt 适配器,通常路径如下:
2- 修改 < 的内容code>postgres__create_table_as 宏如下:
3- 在要创建的模块中选择分区列表,如下所示:
参考和更多详细信息 https://medium.com/@fmohammad_91999/postgresql-table-partitioning- in-dbt-6d6ed82e90ca
In DBT DDL table creation is automatically done and running pre-hook or post-hook will not be help since PostgreSQL do not allow alter the table to have partition by clause.
Instead of pre_hook blocks -which also need to be done for every module-. The create table statement from DBT postgres can be modified to handle table partition creation. Both range and list types are supported by this solution.
as an example of PostgreSQL DDL statement
after this DDL is modified to have the PARTITION BY clause then partitions can be created and attached
Apply the adaptation
1- find the postgres dbt adapter in your project, normally path like:
2- modify the content of
postgres__create_table_as
macro to be as below:3- Choose the partitioning list in the module to be created like below for example:
Reference and More details https://medium.com/@fmohammad_91999/postgresql-table-partitioning-in-dbt-6d6ed82e90ca