DBT可以直接连接到Google表格吗?

发布于 2025-01-21 12:02:31 字数 671 浏览 5 评论 0 原文

目前,我们有一个DBT实例,该实例位于我们的Google BigQuery数据仓库上。现在,我们最近被要求将Google表中的一些数据纳入我们的建模中。

这样,DBT是否有可能直接与Google表面联系? IE将Google表配置为.yml文件中的直接外部数据源,或者DBT可能会运行某种BigQuery Federated SQL语句?

有一个DBT软件包,称为dbt-external-tables(),但这似乎只能与Google Cloud clough Storage Buckets中的BigQuery +文件一起使用。

但是,我在论坛和文档中看到的常见和最直接的选项是在Google Sheep顶部在BigQuery上创建一个外部表。然后将DBT连接到外部BigQuery表。

只是想检查上述集成DBT X Google Sheets X BigQuery的常见选项实际上是唯一的选择,或者实际上是否有一种方法可以让DBT直接连接到Google Sheets,然后才击中BigQuery?

谢谢

we currently have a DBT instance that sits over our Google BigQuery data warehouse. Now we've recently been asked to incorporate some data from Google Sheets into our modelling.

With that, is it possible for DBT to connect directly with Google Sheets? i.e. configure Google Sheets as a direct external datasource in the .yml file, or have DBT possibly run some sort of BigQuery federated SQL statement?

There's a DBT package called dbt-external-tables (https://hub.getdbt.com/dbt-labs/dbt_external_tables/latest/), but that only seems to work with BigQuery + files in Google Cloud Storage buckets.

But the common and most straightforward option I'm seeing in forums and documentation is to create an external table on BigQuery on top of the Google Sheet. And then have DBT connect to the external BigQuery table.

Just wanted to check if the above common option for integrating DBT x Google Sheets x BigQuery is in fact the only option, or if there's actually a way to have DBT connect directly to Google Sheets before hitting BigQuery?

Thanks

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

浅唱ヾ落雨殇 2025-01-28 12:02:31

从我在 dbt-external-table上看到的内容 side,bigquery适配器折叠到 ddl语句对于 create_external_table 宏。

不幸的是,我只是看不到Google表“外部”定义的类似DDL语句。如果我不得不猜测,UI似乎可以通过 bq CLI客户端执行某些内容,以通过Web门户创建。

如果将部分添加到该指南包括DDL定义对于基于Google Drive 的外部资源,这可能会成为前面提到的外部桌子的DBT宏的相对容易的构建。在此之前,您必须通过UI, bq 客户端或REST API来定义此问题。

From what I see over on the dbt-external-tables side, the bigquery adapter folds to a DDL statement for the create_external_table macro.

Unfortunately, I just don't see a similar DDL statement available for the Google Sheets "external" definition. It looks like the UI probably executes something through the bq cli client to create through from the web portal, if I had to guess.

If a section is ever added to this guide which includes a DDL definition for Google Drive based external sources, this would probably become a relatively easy build into the previously mentioned dbt macro for external tables. Until then, you will have to define this through the UI, the bq client yourself, or the REST api.

百变从容 2025-01-28 12:02:31

您可以使用 dbt_external_tables 设置Google Sheet作为源,如果使用BigQuery (编辑2023-08-10,基于Databricks用户的反馈)。上周刚刚找到了这个。这是指定模式的示例。如果您遗漏了列,应该可以自动检索,但是在我所做的工作表上对我无效。

2023-04-21更新更好的配置示例和命令。

这是一个显示配置的架构和自动检索的配置,它们对我有用。

version: 2
sources:
  - name: google_sheets
    description: >
      This source contains data stored in Google Sheets that are manually maintained.
    project: my-project
    dataset: stage_google_sheet
    loader: external_table
    tags:
      - source
      - google_sheet
    tables:
      - name: marketing_taxonomy
        description: >
          Marketing source channels, categories, etc.
        external:
          options:
            format: GOOGLE_SHEETS
            uris: ['https://docs.google.com/spreadsheets/d/[SHEET-ID]']
            sheet_range: data
            skip_leading_rows: 1
        columns:
          - name: source_raw
            data_type: STRING
          - name: source
            data_type: STRING
          - name: channel
            data_type: STRING
          - name: source_type
            data_type: STRING
          - name: source_category
            data_type: STRING
          - name: new_channel
            data_type: STRING
      - name: kpi_metrics_targets
        description: >
          KPI's maintained by FP&A
        external:
          options:
            format: GOOGLE_SHEETS
            uris: ['https://docs.google.com/spreadsheets/d/[SHEET-ID]']
            sheet_range: 'Monthly Targets'
            skip_leading_rows: 1

对于新手的 dbt_external_tables 的人,您必须运行以下命令,然后才能引用源:

dbt run-operation stage_external_sources-args -args-args“ select:google_sheets”

使用您的更新源名称。就我而言,我必须添加 -vars“ ext_full_refresh:true” 选项。在所有情况下,这都是不需要的。

编辑:这是在BigQuery中起作用的,因为BigQuery支持Google表的创建外部表语句。请参阅:

You can set up a Google Sheet as a source with dbt_external_tables if you use BigQuery (edit 2023-08-10, based on feedback from a Databricks user). Just found this out last week. Here's an example with schema specified. If you leave out the columns it's supposed to be able to autodetect, but it didn't work for me on the sheet I did.

2023-04-21 updating with better config example and commands.

Here is a config showing both a configured schema and autodetect, both of which worked for me.

version: 2
sources:
  - name: google_sheets
    description: >
      This source contains data stored in Google Sheets that are manually maintained.
    project: my-project
    dataset: stage_google_sheet
    loader: external_table
    tags:
      - source
      - google_sheet
    tables:
      - name: marketing_taxonomy
        description: >
          Marketing source channels, categories, etc.
        external:
          options:
            format: GOOGLE_SHEETS
            uris: ['https://docs.google.com/spreadsheets/d/[SHEET-ID]']
            sheet_range: data
            skip_leading_rows: 1
        columns:
          - name: source_raw
            data_type: STRING
          - name: source
            data_type: STRING
          - name: channel
            data_type: STRING
          - name: source_type
            data_type: STRING
          - name: source_category
            data_type: STRING
          - name: new_channel
            data_type: STRING
      - name: kpi_metrics_targets
        description: >
          KPI's maintained by FP&A
        external:
          options:
            format: GOOGLE_SHEETS
            uris: ['https://docs.google.com/spreadsheets/d/[SHEET-ID]']
            sheet_range: 'Monthly Targets'
            skip_leading_rows: 1

For people new to dbt_external_tables, you have to run the following command before you can reference the source:

dbt run-operation stage_external_sources --args "select: google_sheets"

Update with your source name. In my case I had to add the --vars "ext_full_refresh: true" option as well. That may not be necessary in all cases.

Edit: this works in BigQuery because BigQuery supports Google Sheets for its create external table statement. See: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#create_external_table_statement

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文