Hive加入表并创建2个不同的表格

发布于 2025-02-09 02:32:55 字数 1706 浏览 2 评论 0原文

我正在使用Hive .14进行学校项目(因此不能使用更好的版本),目前我分析了纽约Uber获取的数据。我使用2个表:YellowTaxi22,以获取有关Uber旅行的信息(在这种情况下,我将仅使用pickupid和Dropoff ID);和ZoneInfo,只有拾音器和Dropoff ID的名称。我创建了一个查询,以查找Uber所采用的最常见的旅行,然后从中创建了一个视图(这是常见的),现在的问题是,当我想将ID更改为其相应名称时,只有选择pickupid = dropid的旅行。示例

pupocationIddoLatiatiDcount
123420000
431230000
1212100
59140000
343470000

pulocationIddoLatoctiDId doLatoctiDIdInd
jfk机场NV20000
MidtownJFK机场JFK
机场JFK100
机场JFK机场
30000机场NV

结果 是

putocationIddoLatiatiDcount
jfk机场肯尼迪机场100
NVNV NV70000

这是我使用的查询,Zoneinfo是包含位置名称(称为区域)及其ID的表格(可以在拾取或可以使用) Dropoff列(或两者兼有),常见的旅行是我用来计算一次旅行次数的视图的名称,该计数在查询中被称为“旅行”:

select /*+ Mapjoin(commontrip)*/ zoneinfo.zone as pickup, zoneinfo.zone as dropoff, commontrip.trips 
from zoneinfo join commontrip on
(zoneinfo.locationid=commontrip.pulocationid and zoneinfo.locationid=commontrip.dolocationid);

I am using hive .14 for a school project (so cant use a better version), and currently I analysing data taken from a uber in NY. I am using 2 tables: yellowtaxi22, for the information regarding the trip the uber did (in this case I will only use the pickupid and the dropoff id); and zoneinfo, that simply have the names of the pickup and dropoff id. I created a query to find what where the most common trips the uber took and then I created a view out of it(this being the commontrip), the problem now is that when I want to change the ids to their corresponding names, it only selects the trips where the pickupid=dropid. Example

pulocationiddolocationidcount
123420000
431230000
1212100
59140000
343470000

the result should be

pulocationiddolocationidcount
JFK airportNV20000
MidtownJFK airport30000
JFK airportJFK airport100
Mountain HillLINCON SQR40000
NVNV70000

But the result is

pulocationiddolocationidcount
JFK airportJFK airport100
NVNV70000

this is the query that I am using, zoneinfo is the table that contains the location´s name (refered as zone) and its id (whcih can be used in either the pickup or the dropoff column (or both), common trips is the name of the view that I used to count the number of times a trip was repeated, this count is named as "trips" in the query:

select /*+ Mapjoin(commontrip)*/ zoneinfo.zone as pickup, zoneinfo.zone as dropoff, commontrip.trips 
from zoneinfo join commontrip on
(zoneinfo.locationid=commontrip.pulocationid and zoneinfo.locationid=commontrip.dolocationid);

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

傲性难收 2025-02-16 02:32:55

您正在将Zoneinfo.lativeIDcommonTrip.pulocationIdcommontrip.dolocationID进行比较。

尝试使用“标签表”一次仅加入一个ID列

You're comparing zoneinfo.locationid to both commontrip.pulocationidand commontrip.dolocationid, meaning those must be equal with each other as well.

Try joining only one ID column at a time with your "label table"

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文