I can't convert a DataFrame to Parquet because of a data type error
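I am trying to convert a pandas DataFrame to Parquet, but I get the error ("Expected bytes, got a 'int' object", 'Conversion failed for column xxxxxxxx with type object'). The table comes from Excel and this column holds both numbers and strings, so its dtype is "object", yet the error still appears. I have already tried df['xxxxxxxx'].astype(str) and df['xxxxxxxx'].astype('data_type'), but neither worked. I tried converting to Parquet with both AWS Wrangler and PyArrow.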
Comments (5)
Did you try:
I got this error while saving my pandas DataFrame to Parquet using AWS Wrangler.
In my case it happened when the first few rows of a column were of datetime type and the remaining rows below were of string type. I checked for columns that contain more than one data type, then converted all rows of the identified columns to a single data type.
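The check described above might look something like the sketch below. The helper name find_mixed_type_columns and the sample data are hypothetical, and casting to str is only one possible choice of single data type.

```python
import pandas as pd


def find_mixed_type_columns(df: pd.DataFrame) -> dict:
    """Return {column: set of Python type names} for columns holding more than one type."""
    mixed = {}
    for col in df.columns:
        # Ignore missing values so NaN/None do not count as an extra type.
        type_names = set(df[col].dropna().map(lambda v: type(v).__name__))
        if len(type_names) > 1:
            mixed[col] = type_names
    return mixed


# Hypothetical data: a datetime column polluted by a string in a later row.
df = pd.DataFrame({
    "when": [pd.Timestamp("2021-01-01"), pd.Timestamp("2021-01-02"), "unknown"],
    "amount": [1.5, 2.0, 3.25],
})

mixed = find_mixed_type_columns(df)
print(mixed)

# Convert every identified column to one single type (here: str, as an assumption).
for col in mixed:
    df[col] = df[col].astype(str)
```

After the cast, running the helper again should report no mixed columns, which is a quick way to confirm the frame is ready to be written.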
I faced the same issue today and used map to resolve it:
Link: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.applymap.html
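The snippet from this answer is not preserved, so the following is only a guess at what it did: apply str to every cell so that no column mixes types. The sample data is hypothetical. Note that DataFrame.applymap was deprecated in pandas 2.1 in favour of DataFrame.map, so the sketch picks whichever one is available.

```python
import pandas as pd

# Hypothetical data: "code" mixes integers and strings.
df = pd.DataFrame({"code": [101, "A-7", 303], "label": ["x", "y", "z"]})

# Use DataFrame.map on pandas >= 2.1, fall back to applymap on older versions.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# Cast every single cell to str; the result is a new DataFrame.
df_str = elementwise(str)
print(df_str["code"].tolist())
```

This converts every column, including purely numeric ones, to strings, so it is probably better restricted to the columns that actually mix types.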
As mentioned in this other question, giving the column one general type could work. So try:
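The original snippet is not preserved here. As a sketch, assuming the "general type" is a plain string cast (an assumption, not confirmed by the answer), the important detail is that astype returns a new Series and the result has to be assigned back to the column. The sample data is hypothetical; the column name is the placeholder from the question.

```python
import pandas as pd

# Hypothetical data using the placeholder column name from the question.
df = pd.DataFrame({"xxxxxxxx": [1, "two", 3]})

# astype returns a new Series; without assignment the column is unchanged.
df["xxxxxxxx"].astype(str)
types_before = {type(v).__name__ for v in df["xxxxxxxx"]}

# Assign the result back so the column really holds a single type.
df["xxxxxxxx"] = df["xxxxxxxx"].astype(str)
types_after = {type(v).__name__ for v in df["xxxxxxxx"]}

print(types_before, types_after)
```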
However, this is not good practice because it hides the type error. You should consider fixing the type of the column by separating the data, or at least be aware that this column has different types. Pandas has a warning included for this type of error:
I had the same problem. Setting the engine='fastparquet' argument of the to_parquet method helped me.
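For reference, a minimal sketch of that call. The wrapper name write_with_fastparquet and the output path are hypothetical, and the fastparquet package has to be installed separately. Whether a given mixed-type column is accepted still depends on the data, so this is not a guaranteed fix.

```python
import pandas as pd


def write_with_fastparquet(df: pd.DataFrame, path: str) -> None:
    """Write df to a Parquet file using the fastparquet engine instead of pyarrow."""
    # engine="fastparquet" requires the fastparquet package to be installed.
    df.to_parquet(path, engine="fastparquet")


# Usage (hypothetical path):
# write_with_fastparquet(df, "output.parquet")
```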