Hive加入表并创建2个不同的表格
我正在使用Hive .14进行学校项目(因此不能使用更好的版本),目前我分析了纽约Uber获取的数据。我使用2个表:YellowTaxi22,以获取有关Uber旅行的信息(在这种情况下,我将仅使用pickupid和Dropoff ID);和ZoneInfo,只有拾音器和Dropoff ID的名称。我创建了一个查询,以查找Uber所采用的最常见的旅行,然后从中创建了一个视图(这是常见的),现在的问题是,当我想将ID更改为其相应名称时,只有选择pickupid = dropid的旅行。示例
pupocationId | doLatiatiD | count |
---|---|---|
12 | 34 | 20000 |
43 | 12 | 30000 |
12 | 12 | 100 |
5 | 91 | 40000 |
34 | 34 | 70000 |
为
pulocationId | doLatoctiDId doLatoctiDId | Ind |
---|---|---|
jfk机场 | NV | 20000 |
Midtown | JFK机场 | JFK |
机场 | JFK | 100 |
机场 | JFK机场 | 应 |
30000 | 机场 | NV |
结果 是
putocationId | doLatiatiD | count |
---|---|---|
jfk机场 | 肯尼迪机场 | 100 |
NV | NV NV | 70000 |
这是我使用的查询,Zoneinfo是包含位置名称(称为区域)及其ID的表格(可以在拾取或可以使用) Dropoff列(或两者兼有),常见的旅行是我用来计算一次旅行次数的视图的名称,该计数在查询中被称为“旅行”:
select /*+ Mapjoin(commontrip)*/ zoneinfo.zone as pickup, zoneinfo.zone as dropoff, commontrip.trips
from zoneinfo join commontrip on
(zoneinfo.locationid=commontrip.pulocationid and zoneinfo.locationid=commontrip.dolocationid);
I am using hive .14 for a school project (so cant use a better version), and currently I analysing data taken from a uber in NY. I am using 2 tables: yellowtaxi22, for the information regarding the trip the uber did (in this case I will only use the pickupid and the dropoff id); and zoneinfo, that simply have the names of the pickup and dropoff id. I created a query to find what where the most common trips the uber took and then I created a view out of it(this being the commontrip), the problem now is that when I want to change the ids to their corresponding names, it only selects the trips where the pickupid=dropid. Example
pulocationid | dolocationid | count |
---|---|---|
12 | 34 | 20000 |
43 | 12 | 30000 |
12 | 12 | 100 |
5 | 91 | 40000 |
34 | 34 | 70000 |
the result should be
pulocationid | dolocationid | count |
---|---|---|
JFK airport | NV | 20000 |
Midtown | JFK airport | 30000 |
JFK airport | JFK airport | 100 |
Mountain Hill | LINCON SQR | 40000 |
NV | NV | 70000 |
But the result is
pulocationid | dolocationid | count |
---|---|---|
JFK airport | JFK airport | 100 |
NV | NV | 70000 |
this is the query that I am using, zoneinfo is the table that contains the location´s name (refered as zone) and its id (whcih can be used in either the pickup or the dropoff column (or both), common trips is the name of the view that I used to count the number of times a trip was repeated, this count is named as "trips" in the query:
select /*+ Mapjoin(commontrip)*/ zoneinfo.zone as pickup, zoneinfo.zone as dropoff, commontrip.trips
from zoneinfo join commontrip on
(zoneinfo.locationid=commontrip.pulocationid and zoneinfo.locationid=commontrip.dolocationid);
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
您正在将
Zoneinfo.lativeID
与commonTrip.pulocationId
和commontrip.dolocationID
进行比较。尝试使用“标签表”一次仅加入一个ID列
You're comparing
zoneinfo.locationid
to bothcommontrip.pulocationid
andcommontrip.dolocationid
, meaning those must be equal with each other as well.Try joining only one ID column at a time with your "label table"