Presto data type mismatch when querying a Hive ORC table
I am trying to query my Hive ORC table through Presto. In Hive it works fine, but in Presto I can access every column except lowrange, which fails with the error below:
Error: Query 20220322_135856_00076_a33ec failed: Error opening Hive split hdfs://.....filename.orc
(offset=0, length=24216): Malformed ORC file. Cannot read SQL type varchar from ORC stream .lowrange
of type LONG [hdfs://.....filename.orc.orc]
I set the following property in Presto before running the query:
set hive1.orc.use-column-names=true
where hive1 is my catalog name.
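For reference, here is a sketch of the two places this setting can live; the session-property spelling orc_use_column_names and the file name etc/catalog/hive1.properties are assumptions based on the Hive connector documentation, not something verified on this cluster:

# catalog level: add to etc/catalog/hive1.properties on the coordinator and workers, then restart Presto
hive.orc.use-column-names=true

-- session level alternative, run from the Presto CLI (property name assumed):
SET SESSION hive1.orc_use_column_names = true;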
I have also tried changing this column's data type in the Hive table to double, bigint, and int, but nothing worked.
Can someone help me resolve this error?
Table description:
+-------------------------------+---------------------------------------------------------+-----------------------+--+
| col_name | data_type | comment |
+-------------------------------+---------------------------------------------------------+-----------------------+--+
| # col_name | data_type | comment |
| | NULL | NULL |
| lowrange | string | |
| type | string | |
| processed_date | string | |
| | NULL | NULL |
| # Partition Information | NULL | NULL |
| # col_name | data_type | comment |
| | NULL | NULL |
| type | string | |
| | NULL | NULL |
| # Detailed Table Information | NULL | NULL |
| Database: | test | NULL |
| Owner: | hdfs | NULL |
| CreateTime: | Tue Mar 22 08:28:49 UTC 2022 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Protect Mode: | None | NULL |
| Retention: | 0 | NULL |
| Location: | hdfs://......../user/hdfs/test/ | NULL |
| Table Type: | EXTERNAL_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | EXTERNAL | TRUE |
| | skip.header.line.count | 1 |
| | transient_lastDdlTime | 1647937729 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.ql.io.orc.OrcSerde | NULL |
| InputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | field.delim | , |
| | serialization.format | , |
+-------------------------------+---------------------------------------------------------+-----------------------+--+
Sample Data:
lowrange type processed_date
1234567890001212 01 20220323
1234567890001213 01 20220323
Table create statement:
CREATE EXTERNAL TABLE `table1`(
`lowrange` string,
`processed_date` string)
PARTITIONED BY (
`type` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://......./user/hdfs/test'
TBLPROPERTIES (
'skip.header.line.count'='1')
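Since the first error says the ORC stream behind lowrange is physically LONG while the table declares it as string, one way to confirm what the files actually contain is to dump an ORC file's metadata and compare its schema with the metastore definition. A sketch, where the file path is just a placeholder for one of the table's real ORC files:

hive --orcfiledump hdfs://......./user/hdfs/test/type=01/filename.orc

# or, if the Apache ORC tools uber jar is available:
java -jar orc-tools-1.7.0-uber.jar meta filename.orc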
Update: I dropped the existing table and created a new one in Hive with the column's data type changed from string to bigint. I can now select data from the table, but when I try to perform an lpad operation the same kind of issue appears again.
Logic I want to apply to the field: lpad(lowrange, 13, '9')
Error: Unexpected parameters (bigint, integer, varchar(1)) for
function lpad. Expected: lpad(varchar(x), bigint, varchar(y))
Then I tried to cast the bigint to varchar using the query below:
Updated logic: lpad(cast(lowrange as varchar), 13, '9')
Error:
Malformed ORC file. Cannot read SQL type bigint
from ORC stream .lowrange of type STRING
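For completeness, a sketch of the full statement I am running; the fully qualified name hive1.test.table1 is assembled from the catalog, database, and table mentioned above, and the WHERE clause on the partition column is only illustrative:

SELECT lpad(CAST(lowrange AS varchar), 13, '9') AS padded_lowrange
FROM hive1.test.table1
WHERE type = '01';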