How do I write a Parquet file with pandas that can be read in AWS Athena?
I have a simple dataframe that I would like to convert to a Parquet file:
from io import BytesIO
out_buffer = BytesIO()
# input_datafame is an existing pandas DataFrame
input_datafame.to_parquet(out_buffer, index=False, compression="gzip")
When I do this the resulting Parquet file has the following:
file schema: schema
--------------------------------------------------------------------------------
somID: OPTIONAL INT64 R:0 D:1
SessionID: OPTIONAL INT64 R:0 D:1
JobID: OPTIONAL INT64 R:0 D:1
JobCreationTime: OPTIONAL BINARY L:STRING R:0 D:1
ProcessedId: OPTIONAL BINARY L:STRING R:0 D:1
S3Results: OPTIONAL BINARY L:STRING R:0 D:1
Table create:
`someid` bigint COMMENT '',
`sessionid` bigint COMMENT '',
`jobid` bigint COMMENT '',
`jobcreationtime` string COMMENT '',
`processedid` string COMMENT '',
`s3results` string COMMENT ''
Querying the data results in the following:
HIVE_METASTORE_ERROR: com.amazonaws.services.datacatalog.model.InvalidInputException: Error:
type expected at the position 0 of 'integer' but 'integer' is found.
(Service: null; Status Code: 0; Error Code: null; Request ID: null; Proxy: null)
I suspect two issues causing this:
- incompatible integer types
- optional types
Is this a well known issue that I am not aware of? I could not find any details about this.
Comments (1)
Athena seems to treat integer and int differently: int works in the table definition, integer does not. I am not sure why. The optional fields in the Parquet file are not the problem.
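One way to avoid typing the column types by hand is to derive them from the dataframe. The following is only a sketch: the helper name athena_columns and the dtype mapping are assumptions made for illustration, not part of pandas or Athena, and the mapping covers only a few common dtypes. It emits int and bigint and never integer, in line with the comment above.

```python
# Hypothetical mapping from pandas dtype names to Athena DDL type names.
# Assumption for illustration only; extend it for the dtypes you actually use.
ATHENA_DDL_TYPES = {
    "int64": "bigint",
    "int32": "int",       # the comment above reports that `integer` fails here
    "float64": "double",
    "bool": "boolean",
    "object": "string",   # assumes object columns hold strings
}


def athena_columns(dtypes):
    """Build the column list for a CREATE EXTERNAL TABLE statement.

    `dtypes` is any mapping-like object whose .items() yields
    (column name, dtype) pairs, e.g. a dict or a pandas `df.dtypes` Series.
    """
    lines = []
    for name, dtype in dtypes.items():
        ddl_type = ATHENA_DDL_TYPES.get(str(dtype))
        if ddl_type is None:
            raise ValueError(f"no Athena type mapped for dtype {dtype!r}")
        # Lowercase the column names, as in the table definition above.
        lines.append(f"`{name.lower()}` {ddl_type}")
    return ",\n".join(lines)


if __name__ == "__main__":
    # Plain dict standing in for input_datafame.dtypes
    print(athena_columns({"SessionID": "int64", "S3Results": "object"}))
```

With a real dataframe, passing input_datafame.dtypes should work the same way, since a pandas Series also provides .items() and its dtypes print as names such as int64.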