I get HIVE_BAD_DATA for a new file, but I can't find the difference from the previous files

Posted 2025-02-09 13:20:29 · 635 characters · 2 views · 0 comments

I'm saving my pandas DataFrames as Parquet files in S3. After the last file was added, querying raised this error:

HIVE_BAD_DATA: Field history's type INT64 in parquet file s3://**/result_2022-06-16.gzip is incompatible with type double defined in table schema*

but I can't see any difference between the 'history' field of the new file and the previous ones.

df2.history.dtypes
Out[98]: dtype('float64')

df0.history.dtypes
Out[99]: dtype('float64')

How can I find the difference so I can fix it?

By the way, when I change the history field in the Athena table from double to int, it no longer works for the previous data (the error then asks for DOUBLE instead of INT64), but it works fine with the new data.

So it seems that one file needs DOUBLE and another needs INT64, but when I read the data with Python I can't see any difference.

About the author

孤寂小茶