Standard SQL: rewriting an explicit CROSS JOIN clause

Posted 2025-01-25 08:17:48

Consider a table transactions with two repeated (array-of-struct) fields, outputs and inputs.
How can this query be rewritten using a WITH clause?

-- Note: This query will process 111.85 MB when run.
SELECT
    transactions.hash AS CREATED_TX_HASH,
    transactions.block_number AS CREATED_BLOCK_ID,
    transactions.block_timestamp AS CREATED_BLOCK_TIME,
    outputs.index AS CREATED_INDEX,
    outputs.value / 1e8 AS OUTPUT_VALUE_BTC,
    transactions.hash AS SPENT_CREATED_TX_HASH,
    transactions.block_number AS SPENDING_BLOCK_ID,
    transactions.block_timestamp AS SPENDING_BLOCK_TIME,
    inputs.index AS SPENT_CREATED_INDEX,
    inputs.spent_transaction_hash as SPENDING_TX_HASH,
    inputs.spent_output_index AS SPENDING_INDEX,
    inputs.value / 1e8 AS INPUT_VALUE_BTC
FROM `bigquery-public-data.crypto_bitcoin.transactions` as transactions
CROSS JOIN
    transactions.outputs as outputs
CROSS JOIN
    transactions.inputs as inputs
-- FROM `bigquery-public-data.crypto_bitcoin.transactions` as transactions,
--     transactions.outputs as outputs,
--     transactions.inputs as inputs   
WHERE transactions.block_timestamp_month < '2009-02-01' 
ORDER BY 3

What I need is to create CTEs that hold the temporary result sets, as below:

WITH outputs AS (
  SELECT
      transactions.hash AS CREATED_TX_HASH,
      transactions.block_number AS CREATED_BLOCK_ID,
      transactions.block_timestamp AS CREATED_BLOCK_TIME,
      outputs.index AS CREATED_INDEX,
      outputs.value / 1e8 AS OUTPUT_VALUE_BTC
  FROM `bigquery-public-data.crypto_bitcoin.transactions` as transactions,
      transactions.outputs as outputs
  WHERE transactions.block_timestamp_month < '2009-02-01'  

), inputs AS (

  SELECT
    transactions.hash AS SPENT_CREATED_TX_HASH,
    transactions.block_number AS SPENDING_BLOCK_ID,
    transactions.block_timestamp AS SPENDING_BLOCK_TIME,
    inputs.index AS SPENT_CREATED_INDEX,
    inputs.spent_transaction_hash as SPENDING_TX_HASH,
    inputs.spent_output_index AS SPENDING_INDEX,
    inputs.value / 1e8 AS INPUT_VALUE_BTC
  FROM `bigquery-public-data.crypto_bitcoin.transactions` as transactions,
      transactions.inputs as inputs
  WHERE transactions.block_timestamp_month < '2009-02-01'
)

But I do not know which SELECT statement on these two CTEs produces the same result as the original query above.

Comments (1)

私野 2025-02-01 08:17:48

You'll need to join them on CREATED_BLOCK_ID and SPENDING_BLOCK_ID, then keep only the rows where CREATED_TX_HASH equals SPENT_CREATED_TX_HASH. Additionally, I used ROW_NUMBER to avoid duplicate rows.

The query below should work for you:

WITH outputs AS (
  SELECT
    transactions.hash AS CREATED_TX_HASH,
    transactions.block_number AS CREATED_BLOCK_ID,
    transactions.block_timestamp AS CREATED_BLOCK_TIME,
    outputs.index AS CREATED_INDEX,
    outputs.value / 1e8 AS OUTPUT_VALUE_BTC
  FROM `bigquery-public-data.crypto_bitcoin.transactions` AS transactions,
    transactions.outputs AS outputs
  WHERE transactions.block_timestamp_month < '2009-02-01'
), inputs AS (
  SELECT
    transactions.hash AS SPENT_CREATED_TX_HASH,
    transactions.block_number AS SPENDING_BLOCK_ID,
    transactions.block_timestamp AS SPENDING_BLOCK_TIME,
    inputs.index AS SPENT_CREATED_INDEX,
    inputs.spent_transaction_hash AS SPENDING_TX_HASH,
    inputs.spent_output_index AS SPENDING_INDEX,
    inputs.value / 1e8 AS INPUT_VALUE_BTC
  FROM `bigquery-public-data.crypto_bitcoin.transactions` AS transactions,
    transactions.inputs AS inputs
  WHERE transactions.block_timestamp_month < '2009-02-01'
)
SELECT *
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY CREATED_BLOCK_ID, CREATED_INDEX, SPENDING_BLOCK_ID,
                   SPENT_CREATED_INDEX, CREATED_TX_HASH, SPENT_CREATED_TX_HASH
      ORDER BY CREATED_BLOCK_TIME DESC
    ) AS last
  FROM outputs o
  JOIN inputs i
    ON o.CREATED_BLOCK_ID = SPENDING_BLOCK_ID
  ORDER BY o.CREATED_BLOCK_ID, o.CREATED_BLOCK_TIME, o.CREATED_INDEX, o.CREATED_TX_HASH
)
WHERE last = 1 AND CREATED_TX_HASH = SPENT_CREATED_TX_HASH

Finally, I would recommend keeping the CROSS JOIN query, since it performs better than the version built from WITH-clause subqueries, which reads the table in two separate subqueries and then has to join the results back together.
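
As a possible simplification, here is a minimal sketch that joins the two CTEs directly on the transaction hash instead of on the block number. It assumes that hash uniquely identifies a transaction within the filtered range. Under that assumption each output of a transaction is paired with each input of the same transaction, which is what the correlated CROSS JOIN does, so no ROW_NUMBER step should be needed. The inline transactions CTE, its sample values (tx_a, tx_b), and the reduced column list are hypothetical stand-ins so the sketch runs without scanning the public table.

```sql
WITH transactions AS (
  -- Hypothetical sample rows standing in for
  -- `bigquery-public-data.crypto_bitcoin.transactions`.
  SELECT
    'tx_a' AS `hash`,
    [STRUCT(0 AS index, 5000000000 AS value),
     STRUCT(1 AS index, 100000000 AS value)] AS outputs,
    [STRUCT(0 AS index, 5100000000 AS value)] AS inputs
  UNION ALL
  SELECT
    'tx_b',
    [STRUCT(0 AS index, 5000000000 AS value)],
    -- Empty inputs array, as for a coinbase transaction.
    ARRAY<STRUCT<index INT64, value INT64>>[]
), outputs AS (
  -- One row per (transaction, output).
  SELECT
    t.hash AS CREATED_TX_HASH,
    o.index AS CREATED_INDEX,
    o.value / 1e8 AS OUTPUT_VALUE_BTC
  FROM transactions AS t, t.outputs AS o
), inputs AS (
  -- One row per (transaction, input).
  SELECT
    t.hash AS SPENT_CREATED_TX_HASH,
    i.index AS SPENT_CREATED_INDEX,
    i.value / 1e8 AS INPUT_VALUE_BTC
  FROM transactions AS t, t.inputs AS i
)
-- Assumption: the hash is unique per transaction, so this join only
-- pairs outputs and inputs that belong to the same transaction.
SELECT *
FROM outputs AS o
JOIN inputs AS i
  ON o.CREATED_TX_HASH = i.SPENT_CREATED_TX_HASH
ORDER BY CREATED_TX_HASH, CREATED_INDEX, SPENT_CREATED_INDEX
```

With the sample rows, tx_a yields two rows (two outputs times one input) and tx_b yields none, because a transaction with an empty inputs array also drops out of the original CROSS JOIN. To try it on the real data, replace the sample CTE with the table reference and the WHERE filter from the query above, and verify the row counts against the CROSS JOIN version.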
