Python Polars: Read Column as Datetime
How does one read a csv into a Polars DataFrame and parse one of the columns as a datetime?
Alternatively, how does one convert a column to a pl.Datetime?
Comments (2)
I would first try try_parse_dates=True in the read_csv call.

For example, suppose the csv has the columns start, last_updt, and end. With try_parse_dates=True, the start column is parsed as a Date and the last_updt column as a Datetime. But notice that the end column is not parsed as a date, because it is not in ISO 8601 format. (I've come across plenty of csv files where Date/Datetime fields were non-standard.)

To parse this column, we can use the .str.to_date() function and supply the appropriate format.
Polars supports two csv readers, one built-in and one based on pyarrow. The pyarrow reader supports parsing dates directly; see also https://github.com/pola-rs/polars/issues/1330. You can set use_pyarrow=True in read_csv, but according to the documentation it is only used when the other arguments passed to read_csv allow it.

Alternatively, read the column as Utf8 (string) and parse it to a datetime with strptime: https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.internals.series.StringNameSpace.strptime.html?highlight=strptime#polars.internals.series.StringNameSpace.strptime. This is the method I typically find easier, but depending on the size of your data it may be relatively expensive, because you first need to store the column as Utf8 and then parse it.