Hive: How long does a compaction take to run?
Hive version: 3.1.0.3.1.4.0-315
Spark version: 2.3.2.3.1.4.0-315
Basically, I am trying to read transactional table data from Spark. According to this page [https://stackoverflow.com/questions/50254590/how-to-read-orc-transaction-hive-table-in-spark][1], I found that a transactional table has to be compacted first, so I want to try this approach.
I am new to this and have been trying to compact the delta files, but the compaction always shows "initiated" and never completes.
This happens for both major and minor compaction. Any help would be highly appreciated.
- I want to know whether this is a good approach.
- Also, how can I monitor the compaction job's progress other than with SHOW COMPACTIONS? I can only see the line "Compaction enqueued with id 1" in hiveserver_stdout.log.
- Generally, how long does a compaction take to complete?
- Is there any way to stop the compactions?
TIA.
[Edited]
SHOW COMPACTIONS;
+---------------+-----------+----------------+----------------+--------+------------+-----------+-------------+---------------+--------------+
| compactionid | dbname | tabname | partname | type | state | workerid | starttime | duration | hadoopjobid |
+---------------+-----------+----------------+----------------+--------+------------+-----------+-------------+---------------+--------------+
| CompactionId | Database | Table | Partition | Type | State | Worker | Start Time | Duration(ms) | HadoopJobId |
| 1 | tmp | shop_na2 | dt=2014-00-00 | MAJOR | initiated | --- | --- | --- | --- |
| 2 | tmp | na2_check | dt=2014-00-00 | MINOR | initiated | --- | --- | --- | --- |
+---------------+-----------+----------------+----------------+--------+------------+-----------+-------------+---------------+--------------+
3 rows selected (0.408 seconds)
The same compaction results have been showing for the past 36 hours, even though the retention period is set to 86400 seconds.
It is advisable to perform this operation when the load on the cluster is low, for example over a weekend when fewer jobs are running. Compaction is a resource-intensive operation, and the time it takes depends on the data, but a moderate quantity of deltas can take multiple hours. You can use the query SHOW COMPACTIONS; to get an update on the status of a compaction, including the following details:
- Database name
- Table name
- Partition name
- Major or minor compaction
- Compaction state:
  - Initiated - waiting in queue
  - Working - currently compacting
  - Ready for cleaning - compaction completed and old files scheduled for removal
- Thread ID
- Start time of compaction
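If you want to check that status from a script rather than by eye, the following is a minimal Python sketch. It assumes the beeline-style table layout shown in the question, and it only knows the three states listed above (other states may exist and are reported as unknown). The helper names `parse_show_compactions` and `describe` are made up for illustration and are not part of Hive.

```python
from typing import Dict, List

# Meanings of the three states listed in the answer; anything else is "unknown".
STATE_MEANING = {
    "initiated": "waiting in queue",
    "working": "currently compacting",
    "ready for cleaning": "compaction completed, old files scheduled for removal",
}

# Sample output copied from the question (assumed beeline table layout).
SAMPLE = """
+---------------+-----------+----------------+----------------+--------+------------+-----------+-------------+---------------+--------------+
| compactionid | dbname | tabname | partname | type | state | workerid | starttime | duration | hadoopjobid |
+---------------+-----------+----------------+----------------+--------+------------+-----------+-------------+---------------+--------------+
| CompactionId | Database | Table | Partition | Type | State | Worker | Start Time | Duration(ms) | HadoopJobId |
| 1 | tmp | shop_na2 | dt=2014-00-00 | MAJOR | initiated | --- | --- | --- | --- |
| 2 | tmp | na2_check | dt=2014-00-00 | MINOR | initiated | --- | --- | --- | --- |
+---------------+-----------+----------------+----------------+--------+------------+-----------+-------------+---------------+--------------+
3 rows selected (0.408 seconds)
"""


def parse_show_compactions(text: str) -> List[Dict[str, str]]:
    """Turn the table text into one dict per compaction, keyed by column name."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders and the "N rows selected" footer
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    # The output above repeats the column titles as a first data row; drop it.
    data = [r for r in data if r[0].lower() != "compactionid"]
    return [dict(zip(header, r)) for r in data]


def describe(rows: List[Dict[str, str]]) -> List[str]:
    """Build one human-readable status line per compaction."""
    lines = []
    for r in rows:
        meaning = STATE_MEANING.get(r["state"].lower(), "unknown state")
        lines.append(
            f"{r['compactionid']} {r['dbname']}.{r['tabname']} ({r['type']}): {meaning}"
        )
    return lines


for status_line in describe(parse_show_compactions(SAMPLE)):
    print(status_line)
```

In practice you would feed it the captured output of SHOW COMPACTIONS; instead of `SAMPLE`. Entries that stay in "initiated" across repeated runs, as in the question, are still waiting in the queue and have not been picked up yet.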