在大查询中以动态的方式将json分为列

发布于 2025-02-06 19:12:32 字数 732 浏览 4 评论 0原文

我有以下JSON:

{
    "rewards": {
        "reward_1": {
            "type": "type 1",
            "amount": "amount 1"
        },
        "reward_2": {
            "type": "type 2",
            "amount": "amount 2"
        },
        "reward_3": {
            "type": "type 3",
            "amount": "amount 3"
        },
        "reward_4": {
            "type": "type 4",
            "amount": "amount 4"
        }
    }
}

这个JSON是动态的,我不一定知道它会得到多少奖励,这里是4个,但可以是2或8等。

我想在大查询中编写一个查询,以解析这些值动态地不知道其中有多少,然后将它们分成列,例如:

”在此处输入图像描述” 谢谢你!

I have the following JSON:

{
    "rewards": {
        "reward_1": {
            "type": "type 1",
            "amount": "amount 1"
        },
        "reward_2": {
            "type": "type 2",
            "amount": "amount 2"
        },
        "reward_3": {
            "type": "type 3",
            "amount": "amount 3"
        },
        "reward_4": {
            "type": "type 4",
            "amount": "amount 4"
        }
    }
}

This JSON is dynamic and I don't necessarily know how many rewards it will get, here it's 4 but it can be 2 or 8 etc.

I want to write a query in Big Query that will parse those values dynamically without knowing how many of them exist, and then split them into column, like this:

enter image description here
Thank you!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

巴黎夜雨 2025-02-13 19:12:32

希望这些有帮助。

  1. 由于JSON数据是动态的,因此第一步是找到最大奖励序列。 (我使用了正则表达式和max_reward udf。)
  2. 然后,以迭代方式从json 奖励字段中提取每个奖励。
  3. 最后,使用pivot查询使结果成为宽形式。

如果需要一个更通用的解决方案,则需要使用BigQuery Dynamic SQL生成枢轴列。我已经在查询中对它们进行了硬编码。

('ready_1','reward_2','reward_3','reward_4')

查询:

CREATE TEMP TABLE sample AS 
SELECT 1 AS id, '{"rewards": { "reward_1": { ... ' AS json -- put your json here
 UNION ALL
SELECT 2 AS id, '{"rewards": { "reward_1": { ... ' AS json -- put your another json here
;

CREATE TEMP FUNCTION extract_reward(json STRING, seq INT64)
RETURNS STRUCT<type STRING, amount STRING>
LANGUAGE js AS """
  return JSON.parse(json)['reward_' + seq];
""";

CREATE TEMP FUNCTION max_reward(arr ARRAY<STRING>) AS ((
  SELECT MAX(CAST(v AS INT64)) FROM UNNEST(arr) v
));

SELECT * FROM (
  SELECT id,
         'reward_' || seq AS reward, 
         extract_reward(FORMAT('%t', JSON_QUERY(json, '$.rewards')), seq) AS value
    FROM sample, UNNEST(GENERATE_ARRAY(1, max_reward(REGEXP_EXTRACT_ALL(json, r'"reward_([0-9]+)"')))) seq
)  PIVOT (ANY_VALUE(value) FOR reward IN ('reward_1', 'reward_2', 'reward_3', 'reward_4'));

output:

”单独的列

SELECT * FROM (
  SELECT id,
         'reward_' || seq || '_' || IF (offset = 0, 'type', 'amount') AS reward,
         value
    FROM sample,
         UNNEST(GENERATE_ARRAY(1, max_reward(REGEXP_EXTRACT_ALL(json, r'"reward_([0-9]+)"')))) seq, 
         UNNEST([extract_reward(FORMAT('%t', JSON_QUERY(json, '$.rewards')), seq)]) pair,
         UNNEST([pair.type, pair.amount]) value WITH OFFSET
)  PIVOT (ANY_VALUE(value) FOR reward IN ('reward_1_type', 'reward_2_type', 'reward_3_type', 'reward_4_type', 'reward_1_amount', 'reward_2_amount', 'reward_3_amount', 'reward_4_amount'));

输出:

“在此处输入图像描述”

Hope these are helpful.

  1. since a JSON data is dynamic, first step is to find a max reward sequence. (I've used a regular expression and max_reward UDF.)
  2. and then, extract each reward from a json rewards field in an iterative way.
  3. lastly, make the result to be a wide form using PIVOT query.

If you want a more generic solution, you need to use BigQuery dynamic SQL to generate PIVOT columns. I've hard-coded them in the query.

('reward_1', 'reward_2', 'reward_3', 'reward_4')

query:

CREATE TEMP TABLE sample AS 
SELECT 1 AS id, '{"rewards": { "reward_1": { ... ' AS json -- put your json here
 UNION ALL
SELECT 2 AS id, '{"rewards": { "reward_1": { ... ' AS json -- put your another json here
;

CREATE TEMP FUNCTION extract_reward(json STRING, seq INT64)
RETURNS STRUCT<type STRING, amount STRING>
LANGUAGE js AS """
  return JSON.parse(json)['reward_' + seq];
""";

CREATE TEMP FUNCTION max_reward(arr ARRAY<STRING>) AS ((
  SELECT MAX(CAST(v AS INT64)) FROM UNNEST(arr) v
));

SELECT * FROM (
  SELECT id,
         'reward_' || seq AS reward, 
         extract_reward(FORMAT('%t', JSON_QUERY(json, '$.rewards')), seq) AS value
    FROM sample, UNNEST(GENERATE_ARRAY(1, max_reward(REGEXP_EXTRACT_ALL(json, r'"reward_([0-9]+)"')))) seq
)  PIVOT (ANY_VALUE(value) FOR reward IN ('reward_1', 'reward_2', 'reward_3', 'reward_4'));

output:

enter image description here

▶ Split a reward STRUCT column into separate columns

SELECT * FROM (
  SELECT id,
         'reward_' || seq || '_' || IF (offset = 0, 'type', 'amount') AS reward,
         value
    FROM sample,
         UNNEST(GENERATE_ARRAY(1, max_reward(REGEXP_EXTRACT_ALL(json, r'"reward_([0-9]+)"')))) seq, 
         UNNEST([extract_reward(FORMAT('%t', JSON_QUERY(json, '$.rewards')), seq)]) pair,
         UNNEST([pair.type, pair.amount]) value WITH OFFSET
)  PIVOT (ANY_VALUE(value) FOR reward IN ('reward_1_type', 'reward_2_type', 'reward_3_type', 'reward_4_type', 'reward_1_amount', 'reward_2_amount', 'reward_3_amount', 'reward_4_amount'));

output:

enter image description here

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文