Reading the BigQuery streaming (real-time) API

Posted on 2025-01-11 02:18:16

I have a BigQuery data warehouse that gets its data from Google Analytics.
The data is streamed in real time.
Now I want to get this data as it arrives (not afterwards), using the BigQuery API.

I have seen the API that lets you query the data after it has been saved into BigQuery,
for example:

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

query = """
    SELECT name, SUM(number) as total_people
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name, state
    ORDER BY total_people DESC
    LIMIT 20
"""
query_job = client.query(query)  # Make an API request.

print("The query data:")
for row in query_job:
    # Row values can be accessed by field name or index.
    print("name={}, count={}".format(row[0], row["total_people"]))

Is there any way to "listen" to the data and store some of it in the cloud,
rather than letting it be saved and then querying it from BigQuery?

Thanks

Comments (1)

滴情不沾 2025-01-18 02:18:16

There is not currently a streaming read mechanism for accessing managed data in BigQuery; existing mechanisms leverage some form of snapshot-like consistency at a given point in time (tabledata.list, Storage API reads, etc.).

Given that your data is already automatically delivered into BigQuery, the next best thing is likely some kind of delta strategy where you read periodically with some kind of filter (recent data filtered by a timestamp, etc).
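As a rough illustration of the delta strategy described above, here is a minimal sketch that polls for rows newer than a stored watermark timestamp. The table name `my-project.analytics.events` and the column `event_ts` are hypothetical placeholders, not names from the question; the sketch also assumes the table has a reliable, monotonically increasing timestamp column, so rows that arrive late with an older timestamp would be missed.

```python
from datetime import datetime
from typing import Callable, Iterable

# Hypothetical names: replace with your own table and timestamp column.
TABLE = "my-project.analytics.events"
TS_COLUMN = "event_ts"

FetchFn = Callable[[datetime], Iterable[dict]]


def make_bigquery_fetcher(table: str = TABLE, ts_column: str = TS_COLUMN) -> FetchFn:
    """Return a function that queries rows newer than a given timestamp."""
    # Imported lazily so poll_once() can be exercised without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    sql = f"SELECT * FROM `{table}` WHERE {ts_column} > @since ORDER BY {ts_column}"

    def fetch(since: datetime):
        # Pass the watermark as a query parameter rather than formatting it into SQL.
        job_config = bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("since", "TIMESTAMP", since)
            ]
        )
        rows = client.query(sql, job_config=job_config).result()
        return [dict(row.items()) for row in rows]

    return fetch


def poll_once(fetch: FetchFn, watermark: datetime, ts_column: str = TS_COLUMN):
    """Fetch rows newer than the watermark; return (rows, advanced watermark)."""
    rows = list(fetch(watermark))
    if rows:
        # Advance the watermark to the newest timestamp seen in this batch.
        watermark = max(row[ts_column] for row in rows)
    return rows, watermark
```

One way to use this would be to call `poll_once(make_bigquery_fetcher(), watermark)` in a loop with a sleep between iterations, persisting the returned watermark so a restart resumes where the last poll stopped. Each poll is an ordinary query against a point-in-time snapshot, so this is periodic reading, not a true streaming subscription.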
