Update: How to convert/parse str dates from a dask dataframe
Update:
I was able to perform the conversion. The next step is to put the result back into the ddf.
Following the book's suggestion, this is what I did:
- parsed the dates and stored them in a separate variable.
- dropped the original date column using
ddf2 = ddf.drop('date', axis=1)
- appended the newly parsed dates using assign
ddf3 = ddf2.assign(date=parsed_date)
The new date was added as a new column, in the last position.
Question 1: Is there a more efficient way to insert parsed_date back into the ddf?
Question 2: What if I have three columns of string dates (date, startdate, enddate)? I could not find out whether a loop would work so that I do not have to recode each string date column. (Or the approach I am thinking of could be wrong.)
Question 3: For dates in the 11OCT2020:13:03:12.452 format, is this the right format string: "%d%b%Y:%H:%M:%S"? I feel I am missing something for the seconds, because the seconds above are a decimal number (float).
Older:
I have the following column in a dask dataframe:
ddf = dd.from_pandas(pd.DataFrame({'date': ['15JAN1955', '25DEC1990', '06MAY1962', '20SEPT1975']}), npartitions=1)
When it was initially loaded as a dask dataframe, the column was read as an object/string. The book Data Science with Python and Dask suggested reading it as the np.str datatype at the initial load. However, I could not understand how to convert the column into a date datatype. I tried processing it with dd.to_datetime; the output confirmed dtype: datetime64[ns], but when I ran ddf.dtypes, the frame still reported an object datatype.
I would like to change the object dtype to a date so I can filter/run a condition later on.
Comments (2)
dask.dataframe supports the pandas API for handling datetimes, so this should work:
Usually, when I am having a hard time computing or parsing, I use an apply call with a lambda. Some say it is not the best way, but it works. Give it a try.