
group-by hive

Hive GROUP BY optimization based on cardinality

Posted on 2025-01-20 03:23:30

Logically, the cardinality of the columns should matter when performing a GROUP BY operation. When we write Hive queries involving GROUP BY, we are familiar with the data being queried, so we have an idea of the cardinality of each column involved in the GROUP BY. But Hive has no idea about this. So let's say the Hive query in question is:

SELECT Col1,Col2,Col3,Col4,Col5,COUNT(*) FROM MyTable GROUP BY Col1,Col2,Col3,Col4,Col5

I know the relative cardinality of all five columns here. But Hive doesn't know that, so Hive will probably end up with worst-case performance.

So let's say the cardinality information I have about these columns is as follows, from lowest to highest, with an example of the values each column contains:

Col5 = it contains country name
Col4 = it contains state name
Col3 = it contains city name
Col2 = it contains postal code
Col1 = it contains email address
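The ordering above is based on familiarity with the data. A minimal HiveQL sketch of how it could be confirmed by measurement, assuming the table and column names from the question (the `*_cnt` aliases are made up for illustration):

```sql
-- Count the distinct values in each grouping column.
-- The aliases are hypothetical; the table and columns come from the question.
-- Exact distinct counts scan the whole table and can be slow on large data.
SELECT
  COUNT(DISTINCT Col5) AS country_cnt,
  COUNT(DISTINCT Col4) AS state_cnt,
  COUNT(DISTINCT Col3) AS city_cnt,
  COUNT(DISTINCT Col2) AS postal_code_cnt,
  COUNT(DISTINCT Col1) AS email_cnt
FROM MyTable;
```

If the assumed ordering holds, the counts should increase from `country_cnt` to `email_cnt`.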

Hive will treat all of these columns the same. Wouldn't it be beneficial if Hive knew the underlying cardinality information, so that it could exploit it when computing the unique groups? In that case, if I explicitly arrange the columns in the GROUP BY clause in order of cardinality, as in the following example, will the query be more efficient?

SELECT Col1,Col2,Col3,Col4,Col5,COUNT(*) FROM MyTable GROUP BY Col5,Col4,Col3,Col2,Col1

Or will Hive ignore this order and treat all the columns equally, regardless of the order?
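One way to investigate this empirically is to give Hive column statistics and then compare the plans it produces for the two column orders. The sketch below is only a starting point: it assumes a Hive version with column statistics and the cost-based optimizer available, and it does not by itself show that either order is faster.

```sql
-- Collect table-level and column-level statistics (including distinct-value
-- estimates) so that the optimizer has cardinality information to work with.
ANALYZE TABLE MyTable COMPUTE STATISTICS;
ANALYZE TABLE MyTable COMPUTE STATISTICS FOR COLUMNS Col1, Col2, Col3, Col4, Col5;

-- Inspect what Hive recorded for a single column.
DESCRIBE FORMATTED MyTable Col1;

-- Let the planner use the collected statistics.
SET hive.cbo.enable=true;
SET hive.stats.fetch.column.stats=true;

-- Compare the plans generated for the two GROUP BY orders.
EXPLAIN
SELECT Col1,Col2,Col3,Col4,Col5,COUNT(*) FROM MyTable GROUP BY Col1,Col2,Col3,Col4,Col5;

EXPLAIN
SELECT Col1,Col2,Col3,Col4,Col5,COUNT(*) FROM MyTable GROUP BY Col5,Col4,Col3,Col2,Col1;
```

Both queries return the same set of groups, since the order of columns in GROUP BY does not change the result. Any difference would therefore have to show up in the plans or in measured run times on the actual data.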


About the author

离笑几人歌
