什么时候并且不使用别名来加入BigQuery

发布于 2025-01-26 01:51:30 字数 743 浏览 5 评论 0原文

我使用了来自公共数据库的Citibike_station和Citibike_Trips,并复制了这些表。 所以我有: 数据集:citibike_stations1 桌子和旅行。

以下是我遇到错误的查询 - “未识别的名称:

选择 st.Station_id, St.Name, number_of_rides为number_of_rides_fromstation,
douatder1- number_of_rides不是列,因此SQL如何选择此
从 ( 选择 start_station_id, 计数(*)number_of_rides 从 Leafy-Racer-348015.citibike_stations1.trips tr start_station_id的组)为station_num_trips 内联 Leafy-Racer-348015.citibike_stations1.stations st on St.Station_id = tr.start_station_id 订购 number_of_rides desc

Doubt2-当我运行此查询时,我会在line st.Station_id = tr.start_station_id中获得未识别的错误。但是,如果我删除此别名,则可以正常工作。

我从Google数据分析课程WK3嵌套查询模块中引用了这个问题。在加入模块的早期,我了解到联接功能起作用是必要的。但是在这里 - 这是相反的。为什么?

I used citibike_stations and citibike_trips from public data base and copied those tables.
So I have :
Dataset: Citibike_stations1
Tables - Stations and Trips.

Below is the query where I get error - "Unrecognized name:

SELECT
st.station_id,
st.name,
number_of_rides AS number_of_rides_fromstation,
doubt1- number_of_rides is not a column so how will SQL select this
FROM (
SELECT
start_station_id,
COUNT(*)number_of_rides
FROM
leafy-racer-348015.citibike_stations1.trips tr
GROUP BY start_station_id ) AS station_num_trips
INNER JOIN
leafy-racer-348015.citibike_stations1.stationsst ON
st.station_id = tr.start_station_id
ORDER BY
number_of_rides DESC

Doubt2 -When I run this query, I get Unrecognized error in the line st.station_id=tr.start_station_id. But if I remove this alias, then it works fine.

I am referring this question from Google Data Analytics Course Wk3 Nested Queries Module. Earlier in the JOIN module, I understood that Aliases are necessary for the Join function to work. But here -it is opposite. Why?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

你如我软肋 2025-02-02 01:51:30

好的,我认为我已经设法解决了这一点。首先,让我们格式化查询,以使其清晰(但仍然有错误):

SELECT 
  st.station_id, 
  st.name, 
  number_of_rides AS number_of_rides_fromstation,
FROM 
  ( 
    SELECT start_station_id, COUNT(*) as number_of_rides 
    FROM leafy-racer-348015.citibike_stations1.trips tr 
    GROUP BY start_station_id 
  ) station_num_trips 

  INNER JOIN leafy-racer-348015.citibike_stations1.stations st 
  ON 
    st.station_id = tr.start_station_id 

ORDER BY number_of_rides DESC

number_of_rides不是列,因此SQL将如何选择此

确定。由station_num_tripsCount(*)生成的子查询。

SQL只是“数据中的矩形块块,矩形数据块” - 每个选择都会产生一个矩形数据块,就像表一样,可以将其送入另一个操作,例如连接,从何处或何处。如果它是单个值,它甚至可以将其馈入选择。您的子查询在此处:

  ( 
    SELECT start_station_id, COUNT(*) as number_of_rides 
    FROM leafy-racer-348015.citibike_stations1.trips tr 
    GROUP BY start_station_id 
  ) station_num_trips 

..在Trips中吸引所有数据,除了站ID之外,将所有列扔掉,计算了该站的旅行数量,并产生了一个新的数据块,该数据仅为<代码>站ID ,Count:它测量2列宽,n行高(n是唯一站ID的数量)。当您加入时,这就是加入的目标;您的2个宽度in N高数据块。

旅行可能有1000行同一站ID和10列;它们都消失了,一行倒入1行,2列和1000列。现在,这就像一个名为station_num_trips的新表的行为

-

在每个步骤列和行都被添加或删除,因此基本上您在SQL中所做的一切都在削减,并将其加入数据块,形成新的数据块。

表通常会形成起始块,但不必这样做:

SELECT * FROM (SELECT 1 as A UNION SELECT 2) x

这是完全有效的;您可以应用其中的子句以删除行,将连接到表格以添加更多列,然后选择这些新列或使用它们来产生更多计算的值,


在子查询中将表毫无偏见。只有一张桌子,所以很明显您所指的是:

  ( 
    SELECT start_station_id, COUNT(*) as number_of_rides 
    FROM leafy-racer-348015.citibike_stations1.trips       --removed alias
    GROUP BY start_station_id 
  ) station_num_trips 

将其辩论也使您犯了另一个错误:

运行此查询时,我会在行中获得未识别的错误st.Station_id = tr.start_station_id。但是,如果我删除此别名,则可以正常工作。

Tr是子查询中的一个别名。它在您使用的地方不存在。整个子查询被称为station_num_trips,因此它是station_num_trips.start_station_id您需要在联接条件中使用,而不是tr.start_station_idtr别名消失了;您无法从包装它的父询问中访问子查询的别名。

但是,您可以在子查询中访问父询问的别名(与您拥有的相比)

SELECT *              
FROM table1 x         <-- this is the parent query
WHERE 
    EXISTS(SELECT null FROM table2 y WHERE x.id = y.id)
                                          ^^^
                                  from the parent query

是一种与主查询数据协调数据协调数据的常见方法。在这里,此查询询问“只能让我x记录table2中有匹配记录的地方” - 它需要子查询与主之间的协调以便


使用凹痕以保持凹痕以保持直视您的脑海中,什么是父母是子查询。小子量可能适合所有线路,或一个紧凑的形式,其中每条线由块关键字引导。应该分散较大的查询:

SELECT
  smiths.columnX,
  counted_things.column3,
  SUM(x.yz)

FROM
  (SELECT * FROM People WHERE name = 'SMITH') smiths

  JOIN (
    SELECT name, age, column3, COUNT(*) as columnX
    FROM TableA JOIN tableB On x = y
    WHERE column4 = 'something'
    GROUP BY name, age, column3
  ) counted_things
  ON 
    counted_things.name = smiths.Name AND 
    counted_things.age = smiths.Age AND
    ..

  JOIN sometable x ON x.whatever = counted_things.whatever

  JOIN anothertable y
  ON
    x.id = y.id AND
    ...

WHERE
  x.thing = 123

GROUP BY
  smiths.columnX,
  counted_things.column3

查看我如何使用凹痕来描述所有内容在哪个级别上运行的级别 - 从组对齐的位置等。它们属于同一查询。数据阻止他的缩进并相互对齐;亚Queries的括号与其他括号和表名称保持一致。挑选一个子查询很容易。很容易看到它加入了什么。 史密斯,counted_things,有些和另一个表的相同,它们将它们连接在一起并在同一级别上运行。亚Queries在thair括号内被缩进,以澄清它们是亚克斯的所有这些,

所有这些都可以帮助您使用诸如“ subqueries可以访问父母的别名,而不是相反的方式”和“ subqueries产生的行和列,这些行和列的行为像新桌子一样””。

OK, I think I've managed to work out what this is asking. First, let's format your query so it's legible (but still has errors):

SELECT 
  st.station_id, 
  st.name, 
  number_of_rides AS number_of_rides_fromstation,
FROM 
  ( 
    SELECT start_station_id, COUNT(*) as number_of_rides 
    FROM leafy-racer-348015.citibike_stations1.trips tr 
    GROUP BY start_station_id 
  ) station_num_trips 

  INNER JOIN leafy-racer-348015.citibike_stations1.stations st 
  ON 
    st.station_id = tr.start_station_id 

ORDER BY number_of_rides DESC

number_of_rides is not a column so how will SQL select this

Sure it is. The subquery aliased by station_num_trips generates it from COUNT(*).

SQL is just "rectagular blocks of data in, rectangular blocks of data out" - every SELECT produces a rectangular block of data, just like a table, that can be fed into another operation, like a JOIN, FROM or WHERE. It can even be fed into a SELECT if it's a single value. Your subquery here:

  ( 
    SELECT start_station_id, COUNT(*) as number_of_rides 
    FROM leafy-racer-348015.citibike_stations1.trips tr 
    GROUP BY start_station_id 
  ) station_num_trips 

..took all the data in trips, threw all the columns away except the station id, counted the number of trips from that station, and produced a new block of data that was just station id, and the count: it measures 2 columns wide by N rows high (N is the number of unique station IDs). This is what gets joined in when you do the join; your 2 wide by N high block of data.

Trips might have had 1000 rows of the same station ID and 10 columns; they're all gone, collapsed into 1 row, 2 columns and a count of 1000 in the one row. This now behaves like a new table called station_num_trips

--

At every step columns and rows are added or removed, so basically all you're ever doing in SQL is cutting up, and joining together blocks of data, to form new blocks of data.

Tables often form the starting blocks, but they don't have to:

SELECT * FROM (SELECT 1 as A UNION SELECT 2) x

This is perfectly valid; you could apply a where clause to remove rows, a JOIN to a table to add more columns and then SELECT those new columns or use them to produce more calculated values


There's no point aliasing your table in the subquery; there is only one table so it's obvious what you're referring to:

  ( 
    SELECT start_station_id, COUNT(*) as number_of_rides 
    FROM leafy-racer-348015.citibike_stations1.trips       --removed alias
    GROUP BY start_station_id 
  ) station_num_trips 

Aliasing it also then led you to another mistake:

When I run this query, I get Unrecognized error in the line st.station_id=tr.start_station_id. But if I remove this alias, then it works fine.

tr is an alias inside the subquery. It doesn't exist where youre using it. The whole subquery is aliased as station_num_trips so it's station_num_trips.start_station_id you need to use in your join condition, not tr.start_station_id. The tr alias is gone; you cannot access an subquery's alias from the parent query that wraps it.

You can, however, access a parent query's alias in a subquery (the other way round from what you have)

SELECT *              
FROM table1 x         <-- this is the parent query
WHERE 
    EXISTS(SELECT null FROM table2 y WHERE x.id = y.id)
                                          ^^^
                                  from the parent query

Doing this is a common way to coordinate a subquery's data with the main query data. Here this query asks "get me only x records where there is a matching record in table2" - it needs that coordination between the subquery and the main in order to work


Make use of indentation to keep straight in your mind what is a parent and what is a subquery. Small subqueries might fit all on one line, or a compact form where each line is lead by the block keyword. Larger queries should be spread out:

SELECT
  smiths.columnX,
  counted_things.column3,
  SUM(x.yz)

FROM
  (SELECT * FROM People WHERE name = 'SMITH') smiths

  JOIN (
    SELECT name, age, column3, COUNT(*) as columnX
    FROM TableA JOIN tableB On x = y
    WHERE column4 = 'something'
    GROUP BY name, age, column3
  ) counted_things
  ON 
    counted_things.name = smiths.Name AND 
    counted_things.age = smiths.Age AND
    ..

  JOIN sometable x ON x.whatever = counted_things.whatever

  JOIN anothertable y
  ON
    x.id = y.id AND
    ...

WHERE
  x.thing = 123

GROUP BY
  smiths.columnX,
  counted_things.column3

See how I'm using indentation to describe what level everything is operating at - the SELECT FROM WHERE GROUP etc are aligned; they belong to the same query. The data blocks int he FROM are indented and aligned with each other; subqueries have parentheses that align with other parentheses and table names. It's easy to pick out a subquery. It's easy to see what it joins to; smiths, counted_things, sometable and anothertable are indented the same, they're joined together and operate at the same level. Subqueries are indented inside thair brackets to clarify that they're subqueries

All this helps you use rules like "subqueries can access parent aliases but not the other way round" and "subqueries produce rows and columns that behave like new tables"

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文