Why is GROUP BY needed?

Posted 2025-02-04 20:02:05

I'm learning SQL, and "aggregate" was another keyword that came up. But why does the query need to be grouped, and what exactly gets grouped?

SELECT
  usertype,
  CONCAT(start_station_name, " to ", end_station_name) AS route,
  COUNT(*) AS num_trips, -- counting all trips (gives distinct value?)
  ROUND(AVG(CAST(tripduration AS INT64) / 60), 2) AS duration -- /60 converts seconds to minutes; 2 is the number of decimal places
FROM `bigquery-public-data.new_york_citibike.citibike_trips`
GROUP BY
  start_station_name, end_station_name, usertype
ORDER BY
  num_trips DESC
LIMIT 10

Answer by 能怎样, 2025-02-11 20:02:05

This is what the query does, conceptually (the actual implementation may vary):

  • Get all rows from the table.
  • Group them by start_station_name, end_station_name and usertype. In other words, every distinct combination of start station, end station and user type gets its own group. For example, with 3 locations (A, B and C), 2 user types (1 and 2), and no trips from a location to itself, there could be up to 12 groups (6 ordered station pairs × 2 user types). A to B and B to A are separate groups. Some of them would be:
    • LocationA, LocationB, UserType1
    • LocationB, LocationA, UserType1
    • LocationB, LocationC, UserType1
    • LocationC, LocationB, UserType1
    • LocationA, LocationB, UserType2
    • LocationB, LocationA, UserType2
    • LocationB, LocationC, UserType2
    • LocationC, LocationB, UserType2
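  • Collapse each group into a single output row with two computed values. The first is num_trips, which is simply the number of rows in the group (COUNT(*) counts rows, not distinct values). The second is duration, which is the average of tripduration/60 over all rows in the group, rounded to 2 decimal places.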
  • Sort the groups (not the rows within the groups, but the groups themselves) by num_trips, in descending order.
  • Output only the first 10 of those groups (after sorting).
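The five steps above can be sketched in Python as a minimal illustration. It assumes a handful of made-up trips; the names TRIPS, top_routes, load and top_routes_sql are hypothetical and not part of BigQuery or the dataset.

- top_routes follows the steps by hand with a dictionary of groups.
- top_routes_sql runs the same query shape against an in-memory SQLite table. It is adapted to SQLite's dialect: || for string concatenation, and 60.0 to avoid integer division. BigQuery does not need the latter because its / operator always returns FLOAT64.

The last query also shows why grouping matters. Without GROUP BY, an aggregate like COUNT(*) treats the whole table as one group and returns a single row.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample rows: (usertype, start_station_name, end_station_name, tripduration in seconds).
# Made up for illustration only; not taken from the real citibike_trips table.
TRIPS = [
    ("Subscriber", "A", "B", 600),
    ("Subscriber", "A", "B", 660),
    ("Subscriber", "B", "A", 300),
    ("Customer", "A", "B", 900),
    ("Customer", "A", "B", 960),
    ("Customer", "A", "B", 1110),
]


def top_routes(trips, limit=10):
    """Follow the conceptual steps of the query in plain Python."""
    # Steps 1-2: take every row and put it in the group for its
    # (start station, end station, user type) combination.
    groups = defaultdict(list)
    for usertype, start, end, seconds in trips:
        groups[(start, end, usertype)].append(seconds)

    # Step 3: each group collapses into ONE output row with two aggregates.
    result = []
    for (start, end, usertype), durations in groups.items():
        num_trips = len(durations)  # COUNT(*): number of rows in this group
        # ROUND(AVG(seconds / 60), 2): mean trip length in minutes
        duration = round(sum(s / 60 for s in durations) / num_trips, 2)
        result.append((usertype, f"{start} to {end}", num_trips, duration))

    # Step 4: sort the groups themselves by num_trips, largest first.
    result.sort(key=lambda row: row[2], reverse=True)
    # Step 5: keep only the first `limit` groups.
    return result[:limit]


def load(trips):
    """Load the sample rows into an in-memory SQLite table named `trips`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trips (usertype TEXT, start_station_name TEXT,"
        " end_station_name TEXT, tripduration INTEGER)"
    )
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?, ?)", trips)
    return conn


def top_routes_sql(conn, limit=10):
    """The same query in SQLite's dialect (|| for concatenation, 60.0 to avoid integer division)."""
    return conn.execute(
        """
        SELECT usertype,
               start_station_name || ' to ' || end_station_name AS route,
               COUNT(*) AS num_trips,
               ROUND(AVG(tripduration / 60.0), 2) AS duration
        FROM trips
        GROUP BY start_station_name, end_station_name, usertype
        ORDER BY num_trips DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


if __name__ == "__main__":
    conn = load(TRIPS)
    print(top_routes(TRIPS))
    # [('Customer', 'A to B', 3, 16.5), ('Subscriber', 'A to B', 2, 10.5), ('Subscriber', 'B to A', 1, 5.0)]
    print(top_routes_sql(conn))
    # same three rows as above
    # Without GROUP BY the whole table is a single group, so only one row comes back.
    print(conn.execute("SELECT COUNT(*) FROM trips").fetchall())
    # [(6,)]
```

The sample data deliberately gives every group a different num_trips. SQL's ORDER BY leaves ties in an unspecified order, while Python's sort keeps tied rows in insertion order, so the two versions could disagree on tied groups.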