How can I use CETAS (Synapse Serverless Pool) in dbt?
In a Synapse Serverless Pool, I can use CETAS to create an external table and export the results to Azure Data Lake Storage:
CREATE EXTERNAL TABLE external_table
WITH (
    LOCATION = 'location/',
    DATA_SOURCE = staging_zone,
    FILE_FORMAT = SynapseParquetFormat
)
AS
SELECT * FROM [table];
This creates an external table named external_table in Synapse and writes Parquet files to the staging zone in Azure Data Lake.
How can I do this in dbt?
Comments (2)
I was trying to do something very similar and run my dbt project with Synapse Serverless Pool, but ran into several issues. Ultimately I was misled by CETAS. When you create the external table, it creates a folder hierarchy in which it places the parquet file. If you were to run the same script as the one in your example, it would fail, because you cannot overwrite with CETAS. So dbt would be able to run it like any other model, but it wouldn't be easy to overwrite. Maybe you could dynamically create a new parquet file every time the script runs and delete the old one, but that seems like putting a small bandage on the hemorrhaging wound that is the Synapse and serverless pool interaction. I had to change my architecture for this reason.
I was trying to export as parquet to maintain the column datatypes and descriptions, so I didn't have to re-schematize. Also so I could create tables based on incremental points in my pipeline. I ended up finding a way to pull from a database that already had the datatype schemas, using the dbt-synapse adapter. Then, if I needed an incremental table, I could materialize it as a table via dbt and dbt-synapse and access it that way.
What is your goal with the exported parquet file?
Maybe we can find another solution?
Here's the dbt-synapse-serverless adapter GitHub repo, which lists the caveats for serverless pools: https://github.com/dbt-msft/dbt-synapse-serverless
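To make the workaround above concrete: a minimal sketch of what materializing as a table through the dbt-synapse adapter could look like. The model name, source names, and columns here are hypothetical placeholders, not from the original thread:

```sql
-- models/my_table.sql
-- Hedged sketch: assumes a source named ('staging', 'some_table') is
-- defined in your sources.yml; adjust to your own project.
{{ config(materialized = 'table') }}

SELECT *
FROM {{ source('staging', 'some_table') }}
```

With `materialized = 'table'`, dbt drops and recreates the table on each run, which sidesteps the CETAS cannot-overwrite problem described above.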
I wrote a materialization for CETAS (Synapse Serverless Pool) here: https://github.com/intheroom/dbt-synapse-serverless
It's forked from dbt-synapse-serverless here: https://github.com/dbt-msft/dbt-synapse-serverless
You can also use hooks in dbt to run CETAS.
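A rough sketch of the hook approach, reusing the CETAS statement from the question. The model name and the view it selects from are assumptions for illustration; note that re-runs will still fail unless the previous external table and its files are dropped first, as discussed above:

```sql
-- models/export_model.sql
-- Hedged sketch: runs CETAS as a post-hook after the model builds.
-- staging_zone and SynapseParquetFormat come from the question;
-- some_source is a hypothetical upstream relation.
{{ config(
    materialized = 'view',
    post_hook = "
        CREATE EXTERNAL TABLE external_table
        WITH (
            LOCATION = 'location/',
            DATA_SOURCE = staging_zone,
            FILE_FORMAT = SynapseParquetFormat
        )
        AS SELECT * FROM {{ this }}
    "
) }}

SELECT * FROM some_source
```

Here `{{ this }}` resolves to the relation the model just built, so the export always reflects the current run.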