PySpark error reading Parquet files - Illegal Parquet type: INT64 (TIMESTAMP(NANOS,true))
I read Parquet files using the PySpark function sqlContext.read.parquet. The files contain timestamp columns. Because PySpark has issues writing nanosecond timestamps, I use pandas to write the data. The pandas timestamp columns with nanosecond precision look like this in the output: 2022-05-23 08:08:35.106226000
With version 3.1.x I was able to read the data without any problems, and Spark automatically converted the timestamps into a long type, but under 3.2.x the following error occurs: Illegal Parquet type: INT64 (TIMESTAMP(NANOS,true))
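To illustrate what the long values mean: a Parquet INT64 TIMESTAMP(NANOS,true) column stores nanoseconds since the Unix epoch, adjusted to UTC. The following is a minimal sketch, using only the Python standard library, of how such a long maps back to a timestamp. The helper name `nanos_to_datetime` and the sample constant are illustrative assumptions, not part of PySpark or pandas.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Unix epoch in UTC; TIMESTAMP(NANOS,true) values count nanoseconds from here.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def nanos_to_datetime(nanos: int) -> Tuple[datetime, int]:
    """Hypothetical helper: split epoch nanoseconds into a UTC datetime
    (microsecond precision) and the leftover nanoseconds (0-999) that
    Python's datetime cannot represent."""
    micros, leftover_nanos = divmod(nanos, 1_000)
    return EPOCH + timedelta(microseconds=micros), leftover_nanos


# The sample timestamp from the question as epoch nanoseconds (assumed UTC).
sample = 1_653_293_315_106_226_000
ts, rest = nanos_to_datetime(sample)
print(ts.isoformat(), rest)
```

For the sample value this gives 2022-05-23 08:08:35.106226 UTC with no leftover nanoseconds, so truncating to microseconds would lose nothing for this particular value.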
I would like to avoid reading all of my data with a manually defined schema, since that would be a huge effort for me.
Is there a read option or another solution to solve this problem?