JOIN ... ON ... JOIN ... ON ... vs. JOIN ... JOIN ... ON ... AND ...
I recently optimized a HiveQL request (by > 300x) that looked something like this:
SELECT * FROM a
LEFT JOIN b
LEFT JOIN c
LEFT JOIN d
ON a.col1 = b.col2 AND
b.col3 = c.col4 AND
c.col5 = d.col6
To this:
SELECT * FROM
a LEFT JOIN b ON a.col1 = b.col2
LEFT JOIN c ON b.col3 = c.col4
LEFT JOIN d ON c.col5 = d.col6
Is the latter form always faster in SQL, or does it have something to do with the Hadoop Map/Reduce operations in Hive?
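One way to probe the question outside Hive is to check whether the two forms even return the same rows. The sketch below uses Python's built-in sqlite3 module as a stand-in engine, with made-up sample rows; the table contents are assumptions for illustration only, and Hive's parser and planner may treat the first form differently from SQLite.

```python
import sqlite3

# In-memory database with tiny, hypothetical sample data (not from the original post).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (col1 INTEGER);
CREATE TABLE b (col2 INTEGER, col3 INTEGER);
CREATE TABLE c (col4 INTEGER, col5 INTEGER);
CREATE TABLE d (col6 INTEGER);
INSERT INTO a VALUES (1), (2);
INSERT INTO b VALUES (1, 10), (2, 20);
INSERT INTO c VALUES (10, 100), (20, 200);
INSERT INTO d VALUES (100), (200);
""")

# Form 1: a single ON clause at the end. In SQLite it attaches to the last
# join only, so the joins to b and c carry no condition at all.
single_on = conn.execute("""
    SELECT * FROM a
    LEFT JOIN b
    LEFT JOIN c
    LEFT JOIN d
    ON a.col1 = b.col2 AND
       b.col3 = c.col4 AND
       c.col5 = d.col6
""").fetchall()

# Form 2: one ON clause per join.
per_join_on = conn.execute("""
    SELECT * FROM
    a LEFT JOIN b ON a.col1 = b.col2
    LEFT JOIN c ON b.col3 = c.col4
    LEFT JOIN d ON c.col5 = d.col6
""").fetchall()

print("single ON:  ", len(single_on), "rows")
print("per-join ON:", len(per_join_on), "rows")
```

If the row counts differ on a given engine, the two queries are not equivalent there, and the speed gap may come from unconstrained intermediate joins rather than from syntax alone. Whether that applies to Hive would need to be confirmed with EXPLAIN on the real queries.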