Hive ORC 表中的 Presto 数据类型不匹配问题

发布于 01-16 04:08 字数 6816 浏览 2 评论 0原文

我正在尝试通过 presto 查询我的 hive orc 表,在 Hive 中其工作正常。在 prestro 中,我能够访问除 lowrange 之外的所有列,它显示以下错误,

error : Query 20220322_135856_00076_a33ec failed: Error opening Hive split hdfs://.....filename.orc 
(offset=0, length=24216): Malformed ORC file. Cannot read SQL type varchar from ORC stream .lowrange 
 of type LONG [hdfs://.....filename.orc.orc]

在开始查询之前我已在 presto 中设置了以下属性:

set hive1.orc.use-column-names=true

其中 hive1 是我的目录名称。

我还尝试将此列的 Hive 表数据类型更改为 Double/BigInt,Int 但没有任何效果。

有人可以帮我解决该错误吗?

表描述:

+-------------------------------+---------------------------------------------------------+-----------------------+--+
|           col_name            |                        data_type                        |        comment        |
+-------------------------------+---------------------------------------------------------+-----------------------+--+
| # col_name                    | data_type                                               | comment               |
|                               | NULL                                                    | NULL                  |
| lowrange                      | string                                                  |                       |
| type                          | string                                                  |                       |
| processed_date                | string                                                  |                       |
|                               | NULL                                                    | NULL                  |
| # Partition Information       | NULL                                                    | NULL                  |
| # col_name                    | data_type                                               | comment               |
|                               | NULL                                                    | NULL                  |
| type                          | string                                                  |                       |
|                               | NULL                                                    | NULL                  |
| # Detailed Table Information  | NULL                                                    | NULL                  |
| Database:                     | test                                                    | NULL                  |
| Owner:                        | hdfs                                                    | NULL                  |
| CreateTime:                   | Tue Mar 22 08:28:49 UTC 2022                            | NULL                  |
| LastAccessTime:               | UNKNOWN                                                 | NULL                  |
| Protect Mode:                 | None                                                    | NULL                  |
| Retention:                    | 0                                                       | NULL                  |
| Location:                     | hdfs://......../user/hdfs/test/  | NULL                  |
| Table Type:                   | EXTERNAL_TABLE                                          | NULL                  |
| Table Parameters:             | NULL                                                    | NULL                  |
|                               | EXTERNAL                                                | TRUE                  |
|                               | skip.header.line.count                                  | 1                     |
|                               | transient_lastDdlTime                                   | 1647937729            |
|                               | NULL                                                    | NULL                  |
| # Storage Information         | NULL                                                    | NULL                  |
| SerDe Library:                | org.apache.hadoop.hive.ql.io.orc.OrcSerde               | NULL                  |
| InputFormat:                  | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat         | NULL                  |
| OutputFormat:                 | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat        | NULL                  |
| Compressed:                   | No                                                      | NULL                  |
| Num Buckets:                  | -1                                                      | NULL                  |
| Bucket Columns:               | []                                                      | NULL                  |
| Sort Columns:                 | []                                                      | NULL                  |
| Storage Desc Params:          | NULL                                                    | NULL                  |
|                               | field.delim                                             | ,                     |
|                               | serialization.format                                    | ,                     |
+-------------------------------+---------------------------------------------------------+-----------------------+--+

Sample Data:

lowrange              type    processed_date

1234567890001212      01        20220323
1234567890001213      01        20220323

表创建语句:

CREATE EXTERNAL TABLE `table1`(                       
`lowrange` string,                                                                                
`processed_date` string)                                  
PARTITIONED BY (                                            
`type` string)                                     
ROW FORMAT DELIMITED                                        
FIELDS TERMINATED BY ','                                  
STORED AS INPUTFORMAT                                       
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'         
OUTPUTFORMAT                                                
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'        
LOCATION                                                    
'hdfs://......./user/hdfs/test'  
TBLPROPERTIES (                                             
   'skip.header.line.count'='1')   

更新:我删除了现有表并在 hive 表中创建了数据类型从 String 到 BigINt 的新表,并且能够从表中选择数据,但是当我尝试执行 lpad 操作时,它再次显示相同的问题。

Logic I want to apply on Field : lpad(lowrange ,13,'9')

Error: Unexpected parameters (bigint, integer, varchar(1)) for 
function lpad. Expected: lpad(varchar(x), bigint, varchar(y))

然后我尝试使用以下查询将 bigint 转换为 varchar:

Updated Logic : lpad(cast(lowrange as varchar),13,'9'))

Error: 
Malformed ORC file. Cannot read SQL type bigint 
from ORC stream .lowrange of type STRING

I am trying to query my hive orc table by presto ,In Hive its working Fine.In prestro I am able to access all the column except lowrange It's showing Below Erroe

error : Query 20220322_135856_00076_a33ec failed: Error opening Hive split hdfs://.....filename.orc 
(offset=0, length=24216): Malformed ORC file. Cannot read SQL type varchar from ORC stream .lowrange 
 of type LONG [hdfs://.....filename.orc.orc]

I have set below property in presto before starting the query:

set hive1.orc.use-column-names=true

where hive1 is my catalog name.

I have also tried to change Hive tables datatype for this column as Double/BigInt,Int But Nothing Worked.

Can someone help me to resolve the error.

Table Description:

+-------------------------------+---------------------------------------------------------+-----------------------+--+
|           col_name            |                        data_type                        |        comment        |
+-------------------------------+---------------------------------------------------------+-----------------------+--+
| # col_name                    | data_type                                               | comment               |
|                               | NULL                                                    | NULL                  |
| lowrange                      | string                                                  |                       |
| type                          | string                                                  |                       |
| processed_date                | string                                                  |                       |
|                               | NULL                                                    | NULL                  |
| # Partition Information       | NULL                                                    | NULL                  |
| # col_name                    | data_type                                               | comment               |
|                               | NULL                                                    | NULL                  |
| type                          | string                                                  |                       |
|                               | NULL                                                    | NULL                  |
| # Detailed Table Information  | NULL                                                    | NULL                  |
| Database:                     | test                                                    | NULL                  |
| Owner:                        | hdfs                                                    | NULL                  |
| CreateTime:                   | Tue Mar 22 08:28:49 UTC 2022                            | NULL                  |
| LastAccessTime:               | UNKNOWN                                                 | NULL                  |
| Protect Mode:                 | None                                                    | NULL                  |
| Retention:                    | 0                                                       | NULL                  |
| Location:                     | hdfs://......../user/hdfs/test/  | NULL                  |
| Table Type:                   | EXTERNAL_TABLE                                          | NULL                  |
| Table Parameters:             | NULL                                                    | NULL                  |
|                               | EXTERNAL                                                | TRUE                  |
|                               | skip.header.line.count                                  | 1                     |
|                               | transient_lastDdlTime                                   | 1647937729            |
|                               | NULL                                                    | NULL                  |
| # Storage Information         | NULL                                                    | NULL                  |
| SerDe Library:                | org.apache.hadoop.hive.ql.io.orc.OrcSerde               | NULL                  |
| InputFormat:                  | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat         | NULL                  |
| OutputFormat:                 | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat        | NULL                  |
| Compressed:                   | No                                                      | NULL                  |
| Num Buckets:                  | -1                                                      | NULL                  |
| Bucket Columns:               | []                                                      | NULL                  |
| Sort Columns:                 | []                                                      | NULL                  |
| Storage Desc Params:          | NULL                                                    | NULL                  |
|                               | field.delim                                             | ,                     |
|                               | serialization.format                                    | ,                     |
+-------------------------------+---------------------------------------------------------+-----------------------+--+

Sample Data:

lowrange              type    processed_date

1234567890001212      01        20220323
1234567890001213      01        20220323

Table Create Statement:

CREATE EXTERNAL TABLE `table1`(                       
`lowrange` string,                                                                                
`processed_date` string)                                  
PARTITIONED BY (                                            
`type` string)                                     
ROW FORMAT DELIMITED                                        
FIELDS TERMINATED BY ','                                  
STORED AS INPUTFORMAT                                       
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'         
OUTPUTFORMAT                                                
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'        
LOCATION                                                    
'hdfs://......./user/hdfs/test'  
TBLPROPERTIES (                                             
   'skip.header.line.count'='1')   

Update: I dropped the existing Table and created new table with datatype from String to BigINt in hive table and able to select data from Table but When I am trying to perform lpad operation its again showing same issue.

Logic I want to apply on Field : lpad(lowrange ,13,'9')

Error: Unexpected parameters (bigint, integer, varchar(1)) for 
function lpad. Expected: lpad(varchar(x), bigint, varchar(y))

then I tried to cast bigint to varchar using Below query:

Updated Logic : lpad(cast(lowrange as varchar),13,'9'))

Error: 
Malformed ORC file. Cannot read SQL type bigint 
from ORC stream .lowrange of type STRING

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。
列表为空,暂无数据
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文