Hive: How long does compaction take to run?

Posted on 2025-01-11 13:06:53

Hive version: 3.1.0.3.1.4.0-315
Spark version: 2.3.2.3.1.4.0-315

I am trying to read transactional table data from Spark. According to this page (https://stackoverflow.com/questions/50254590/how-to-read-orc-transaction-hive-table-in-spark), a transactional table has to be compacted first, so I want to try that approach.

I am new to this and tried running compaction on the delta files, but the state always shows "initiated" and the compaction never completes.
This happens for both major and minor compaction. Any help would be highly appreciated.

I would like to know whether this is a good approach.
Also, how can I monitor the progress of a compaction job other than with SHOW COMPACTIONS? The only related line I can see in hiveserver_stdout.log is "Compaction enqueued with id 1".
In general, how long does a compaction take to complete?
Is there any way to stop a compaction?

TIA.

[Edited]

SHOW COMPACTIONS;

+---------------+-----------+----------------+----------------+--------+------------+-----------+-------------+---------------+--------------+
| compactionid  |  dbname   |    tabname     |    partname    |  type  |   state    | workerid  |  starttime  |   duration    | hadoopjobid  |
+---------------+-----------+----------------+----------------+--------+------------+-----------+-------------+---------------+--------------+
| CompactionId  | Database  | Table          | Partition      | Type   | State      | Worker    | Start Time  | Duration(ms)  | HadoopJobId  |
| 1             | tmp       | shop_na2       | dt=2014-00-00  | MAJOR  | initiated  |  ---      |  ---        |  ---          |  ---         |
| 2             | tmp       | na2_check      | dt=2014-00-00  | MINOR  | initiated  |  ---      |  ---        |  ---          |  ---         |
+---------------+-----------+----------------+----------------+--------+------------+-----------+-------------+---------------+--------------+
3 rows selected (0.408 seconds)

SHOW COMPACTIONS has returned this same result for the past 36 hours, even though the retention period is set to 86400 seconds.
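
To make the stuck state easier to check, here is a minimal sketch, assuming the beeline output is captured as plain text, that parses the table and lists the compactions still in the "initiated" state. The function names parse_show_compactions and pending_compactions are hypothetical helpers of mine, and SAMPLE is a whitespace-trimmed copy of the output shown in this post, not new data.

```python
# Whitespace-trimmed copy of the SHOW COMPACTIONS output from this post.
SAMPLE = """
+--------------+----------+-----------+---------------+-------+-----------+----------+------------+--------------+-------------+
| compactionid | dbname   | tabname   | partname      | type  | state     | workerid | starttime  | duration     | hadoopjobid |
+--------------+----------+-----------+---------------+-------+-----------+----------+------------+--------------+-------------+
| CompactionId | Database | Table     | Partition     | Type  | State     | Worker   | Start Time | Duration(ms) | HadoopJobId |
| 1            | tmp      | shop_na2  | dt=2014-00-00 | MAJOR | initiated |  ---     |  ---       |  ---         |  ---        |
| 2            | tmp      | na2_check | dt=2014-00-00 | MINOR | initiated |  ---     |  ---       |  ---         |  ---        |
+--------------+----------+-----------+---------------+-------+-----------+----------+------------+--------------+-------------+
3 rows selected (0.408 seconds)
"""


def parse_show_compactions(output):
    """Parse an ASCII table as printed by beeline into a list of row dicts."""
    header = None
    rows = []
    for line in output.splitlines():
        line = line.strip()
        # Skip border lines (+---+), blank lines and the "N rows selected" summary.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first table row holds the column names
            continue
        # The output in this post repeats the column titles as a data row; skip it.
        if cells[0] == "CompactionId":
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def pending_compactions(rows):
    """Return the rows whose state is still 'initiated'."""
    return [row for row in rows if row["state"] == "initiated"]


if __name__ == "__main__":
    for row in pending_compactions(parse_show_compactions(SAMPLE)):
        print(row["compactionid"], row["dbname"], row["tabname"], row["type"])
```

This only inspects text that was already printed; it does not talk to Hive and it says nothing about why the entries stay in "initiated".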
