How can I add a header row separately when loading a Parquet file?
When handling CSV files, we can write:
df = pd.read_csv("test.csv", names=header_list, dtype=dtype_dict)
This creates a DataFrame whose column names come from header_list and whose dtypes come from dtype_dict.
Can we do something similar with pd.read_parquet()?
In my case the headers are passed in separately, so they are not available in "test.csv" itself.
Another workaround might be to shift all the data in df down by one row (moving the current header into the first row) and then replace the header with header_list, if that is even possible.
Is there an optimal solution to my issue?
I'm not too familiar with Parquet, so any guidance would be appreciated. Thanks.
Comments (1)
Parquet files contain metadata, including the column names and their types, so there is no need to pass this information when loading the data.
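If the names stored in the file are not the ones you want, one option is to load the file as-is and then assign header_list to the columns afterwards. Since Parquet keeps column names in its schema rather than in a header row, no data needs to be shifted. The following is only a minimal sketch: the helper names apply_header and read_parquet_with_header and the path "test.parquet" are made up for illustration, and it assumes header_list is in the same order as the columns in the file.

```python
import pandas as pd


def apply_header(df, header_list, dtype_dict=None):
    """Return a copy of df with columns renamed by position and optionally cast.

    Hypothetical helper: assumes header_list is ordered like df.columns.
    """
    if len(header_list) != len(df.columns):
        raise ValueError(
            f"expected {len(df.columns)} column names, got {len(header_list)}"
        )
    out = df.copy()
    # Replace the stored column names; the data rows are left untouched.
    out.columns = list(header_list)
    if dtype_dict:
        # Cast only the columns named in dtype_dict (keys are the new names).
        out = out.astype(dtype_dict)
    return out


def read_parquet_with_header(path, header_list, dtype_dict=None):
    """Hypothetical wrapper: load a Parquet file, then apply the new header."""
    # pd.read_parquet needs a Parquet engine (pyarrow or fastparquet) installed.
    df = pd.read_parquet(path)
    return apply_header(df, header_list, dtype_dict)
```

It could then be called much like the read_csv line in the question, for example df = read_parquet_with_header("test.parquet", header_list, dtype_dict). Unlike read_csv, the dtype cast here happens after loading, because the file already carries its own types.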