Is there a way to resolve the subquery first when using a subquery in Impala?
I am trying to prune some partitions using a subquery in Impala.
In the query below, where I hardcode the date, I get the expected pruning and Impala reads only the relevant partitions.
select col1, col2
from table1
where
table1PartitionCol >= '2022-05-01';
(Table1) From Explain Plan: partitions=100/5000 files=100 size=10GB
However, when I try to get the date from a subquery (querying a different table), the explain plan shows that Impala reads every partition first, then runs the subquery and applies it as a filter.
select col1, col2
from table1
where
table1PartitionCol >= (select max(partitionValue) from table2);
(Table1) From Explain Plan: partitions=5000/5000 files=5000 size=250GB
Ideally, it would run the subquery first and then read from the main table. Is there a way to force it to do this? Or is there some other way to achieve the same result?
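One possible workaround, sketched here rather than confirmed for Impala, is to do the "subquery first" step on the client: run the subquery as its own statement, then pass its result to the main query as a constant so the planner sees the same shape as the hardcoded version. The sketch below assumes a generic Python DB-API connection. The function name `fetch_pruned` and the `param_marker` argument are hypothetical, and an in-memory SQLite database stands in for the cluster only so the example is self-contained. Whether Impala actually prunes partitions this way depends on the driver sending the value as a literal, so it would need to be checked against the explain plan.

```python
import sqlite3


def fetch_pruned(conn, param_marker="?"):
    """Run the subquery on its own, then feed its result to the main query.

    conn is any DB-API 2.0 connection. param_marker is the driver's
    placeholder syntax ("?" for sqlite3; other drivers may differ).
    """
    cur = conn.cursor()

    # Step 1: resolve the cutoff value before touching table1.
    cur.execute("select max(partitionValue) from table2")
    cutoff = cur.fetchone()[0]

    # Step 2: the main query now filters on a constant instead of a subquery.
    cur.execute(
        "select col1, col2 from table1 "
        "where table1PartitionCol >= " + param_marker,
        (cutoff,),
    )
    return cutoff, cur.fetchall()


def demo():
    # Stand-in data only; table and column names follow the question.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table table1 (col1 text, col2 integer, table1PartitionCol text);
        create table table2 (partitionValue text);
        insert into table1 values ('a', 1, '2022-04-30');
        insert into table1 values ('b', 2, '2022-05-01');
        insert into table1 values ('c', 3, '2022-05-02');
        insert into table2 values ('2022-04-01');
        insert into table2 values ('2022-05-01');
        """
    )
    return fetch_pruned(conn)


if __name__ == "__main__":
    cutoff, rows = demo()
    print(cutoff, rows)
```

The trade-off is an extra round trip and two statements instead of one, in exchange for a filter value that is known before the scan of table1 is planned.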