Hive GROUP BY optimization based on cardinality

Posted on 2025-01-20 03:23:30

Logically, the cardinality of the columns should matter when performing a GROUP BY operation. When we write Hive queries involving GROUP BY, we are familiar with the data being queried, so we have an idea of the cardinality of each column involved in the GROUP BY. But Hive has no idea about this. So let's say the Hive query in question is:

SELECT Col1,Col2,Col3,Col4,Col5,COUNT(*) FROM MyTable GROUP BY Col1,Col2,Col3,Col4,Col5

I know the cardinality of all five columns here. But Hive doesn't know it, so Hive will probably have to assume the worst case.

So let's say the cardinality information I have about these columns is as follows, listed from lowest to highest cardinality, with an example of the values each column contains:

  • Col5 = it contains country name
  • Col4 = it contains state name
  • Col3 = it contains city name
  • Col2 = it contains postal code
  • Col1 = it contains email address

Now, Hive will treat all of these columns the same. Wouldn't it be beneficial if Hive knew the underlying cardinality information, so that it could exploit it when calculating the unique groups? In that case, if I explicitly arrange the columns in the GROUP BY clause in order of cardinality, as shown in the following example, will the query be more efficient?

SELECT Col1,Col2,Col3,Col4,Col5,COUNT(*) FROM MyTable GROUP BY Col5,Col4,Col3,Col2,Col1

Or will Hive ignore this order and treat all the columns equally, regardless of how they are listed?
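
One way this could be checked empirically is sketched below. It assumes the table and column names from the question (MyTable, Col1 to Col5) and a Hive version that supports column statistics; the default values of the two SET properties vary by version, so setting them here is only a precaution. The sketch does not claim either ordering is faster. It only gathers column statistics so that Hive has distinct-value estimates, and then prints the plan for each ordering so the two can be compared.

```sql
-- Assumption: MyTable and Col1..Col5 are the names used in the question.

-- Gather column-level statistics, which include distinct-value estimates.
ANALYZE TABLE MyTable COMPUTE STATISTICS FOR COLUMNS Col1, Col2, Col3, Col4, Col5;

-- Inspect what Hive recorded for a single column.
DESCRIBE FORMATTED MyTable Col1;

-- Allow the optimizer to read the collected statistics.
SET hive.cbo.enable=true;
SET hive.stats.fetch.column.stats=true;

-- Plan for the original key order.
EXPLAIN
SELECT Col1, Col2, Col3, Col4, Col5, COUNT(*)
FROM MyTable
GROUP BY Col1, Col2, Col3, Col4, Col5;

-- Plan for the keys ordered from lowest to highest cardinality.
EXPLAIN
SELECT Col1, Col2, Col3, Col4, Col5, COUNT(*)
FROM MyTable
GROUP BY Col5, Col4, Col3, Col2, Col1;
```

If the two plans come out identical, the order of the GROUP BY columns makes no difference to how Hive executes the query. Either way, both queries produce the same set of groups and counts, since a group is defined by the combination of key values and not by the order in which the keys are listed.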
