在应用时,其中'对于时间戳,直接应用'提取'显示出不同的结果(附加的代码)

发布于 2025-01-20 23:24:06 字数 2263 浏览 0 评论 0 原文

根据任务 5 的 Kaggle 练习(编写查询): https://www.kaggle.com/code/setthawutkulsrisuwan/exercise-as-与 我回答了2种方式:

  1. 使用 WHERE EXTRACT() 查询以获取年份和月份,答案为 不正确
           WITH RelevantRides AS
           (
               SELECT EXTRACT(HOUR from trip_start_timestamp) as hour_of_day, trip_seconds, trip_miles
               FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
               WHERE EXTRACT(YEAR from trip_start_timestamp) = 2017 AND
                     EXTRACT(MONTH from trip_start_timestamp) BETWEEN 1 and 6 AND
                     trip_seconds > 0 AND
                     trip_miles > 0
           )
           SELECT hour_of_day,
                  COUNT(1) as num_trips,
                  3600 * SUM(trip_miles) / SUM(trip_seconds) as avg_mph
           FROM RelevantRides
           GROUP BY hour_of_day
           ORDER BY hour_of_day
  • 使用直接列名查询以获取年份和月份,答案是正确。:
  •                WITH RelevantRides AS
               (
                   SELECT EXTRACT(HOUR from trip_start_timestamp) AS hour_of_day, trip_seconds, trip_miles
                   FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
                   WHERE trip_start_timestamp > '2017-01-01' AND
                         trip_start_timestamp < '2017-07-01' AND
                         trip_seconds > 0 AND
                         trip_miles > 0
               )
               SELECT hour_of_day,
                      COUNT(1) as num_trips,
                      3600 * SUM(trip_miles) / SUM(trip_seconds) as avg_mph
               FROM RelevantRides
               GROUP BY hour_of_day
               ORDER BY hour_of_day
    

    主要区别是第一个是

    WHERE EXTRACT(YEAR from trip_start_timestamp) = 2017 
    AND EXTRACT(MONTH from trip_start_timestamp) BETWEEN 1 and 6
    

    ,第二个是

    WHERE trip_start_timestamp > '2017-01-01' AND
    trip_start_timestamp < '2017-07-01' 
    

    在我看来,它们的结果应该与使用 EXTRACT() 查询相同,显示 2017 年和 1 到 6 月份,与使用直接列名称查询相同;然而,结果并不相同。

    请解释这些背后的原因。 谢谢。

    According to this Kaggle exercise on Task 5 (Write the query):
    https://www.kaggle.com/code/setthawutkulsrisuwan/exercise-as-with
    I answered 2 ways:

    1. Query with WHERE EXTRACT() to get year and month and the answer is
      INCORRECT.:
               WITH RelevantRides AS
               (
                   SELECT EXTRACT(HOUR from trip_start_timestamp) as hour_of_day, trip_seconds, trip_miles
                   FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
                   WHERE EXTRACT(YEAR from trip_start_timestamp) = 2017 AND
                         EXTRACT(MONTH from trip_start_timestamp) BETWEEN 1 and 6 AND
                         trip_seconds > 0 AND
                         trip_miles > 0
               )
               SELECT hour_of_day,
                      COUNT(1) as num_trips,
                      3600 * SUM(trip_miles) / SUM(trip_seconds) as avg_mph
               FROM RelevantRides
               GROUP BY hour_of_day
               ORDER BY hour_of_day
    
    1. Query with the direct column name to get year and month and the answer is CORRECT.:
                   WITH RelevantRides AS
               (
                   SELECT EXTRACT(HOUR from trip_start_timestamp) AS hour_of_day, trip_seconds, trip_miles
                   FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
                   WHERE trip_start_timestamp > '2017-01-01' AND
                         trip_start_timestamp < '2017-07-01' AND
                         trip_seconds > 0 AND
                         trip_miles > 0
               )
               SELECT hour_of_day,
                      COUNT(1) as num_trips,
                      3600 * SUM(trip_miles) / SUM(trip_seconds) as avg_mph
               FROM RelevantRides
               GROUP BY hour_of_day
               ORDER BY hour_of_day
    

    The key differences are that first one is

    WHERE EXTRACT(YEAR from trip_start_timestamp) = 2017 
    AND EXTRACT(MONTH from trip_start_timestamp) BETWEEN 1 and 6
    

    , and the second is

    WHERE trip_start_timestamp > '2017-01-01' AND
    trip_start_timestamp < '2017-07-01' 
    

    .

    In my opinion, they should result the same as querying with EXTRACT() shows year of 2017 and month of 1 to 6 as same as querying with the direct column name; however, results aren't the same.

    Please explain the reasons behind those.
    Thank you.

    如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

    扫码二维码加入Web技术交流群

    发布评论

    需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

    评论(1

    扛刀软妹 2025-01-27 23:24:06

    您正在将恒定日期与时间戳进行比较。恒定日期实际上是类似于 2022-04-07 00:00:00 的时间戳。

    因此,当您想要获取一月到六月日期范围内的所有记录时,您需要:

    WHERE trip_start_timestamp >= '2017-01-01' 
      AND trip_start_timestamp <  '2017-07-01'  
    

    换句话说,您希望该范围第一天午夜或之后的所有记录以及最后一天后一天午夜(但不包括午夜)的所有记录。在数学符号中,您希望日期在 [2017-01-01, 2017-07-01) 范围内。范围的开头是封闭的,结尾是开放的。

    像这样的代码给出了正确的结果。

    WHERE EXTRACT(YEAR from trip_start_timestamp) = 2017 
    AND EXTRACT(MONTH from trip_start_timestamp) BETWEEN 1 and 6
    

    但它无法利用 trip_start_timestamp 列上的索引,因此在生产中效率不高。

    You're comparing constant dates to timestamps. Constant dates are actually timestamps looking like 2022-04-07 00:00:00.

    So when you want to get all records in a date range January to June you need:

    WHERE trip_start_timestamp >= '2017-01-01' 
      AND trip_start_timestamp <  '2017-07-01'  
    

    In other words you want everything on or after midnight on the first day of the range and everything up to but not including midnight on the day after the last day. In mathematical notation you want the dates in the range [2017-01-01, 2017-07-01). The beginning of the range is closed and the end is open.

    Your code like this gives a correct result.

    WHERE EXTRACT(YEAR from trip_start_timestamp) = 2017 
    AND EXTRACT(MONTH from trip_start_timestamp) BETWEEN 1 and 6
    

    But it can't exploit an index on your trip_start_timestamp column, so it won't be efficient in production.

    ~没有更多了~
    我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
    原文