Update: How to convert/parse a str date from a dask dataframe

Posted on 2025-01-09 13:19:17


Update:

I was able to perform the conversion. The next step is to put it back into the ddf.

Following the book's suggestion, this is what I did:

The dates were parsed and stored as a separate variable.
I dropped the original date column using

ddf2=ddf.drop('date',axis=1)

I then added the newly parsed dates using assign

ddf3=ddf2.assign(date=parsed_date)

The new date was added as a new column, in the last position.

Question 1: Is there a more efficient way to insert parsed_date back into the ddf?

Question 2: What if I have three columns of string dates (date, startdate, enddate)? I have not been able to find out whether a loop would work, so that I do not have to write the conversion separately for each string date column. (Or the approach I have in mind could be wrong.)

Question 3: For a date in the format 11OCT2020:13:03:12.452, is this the right format string: "%d%b%Y:%H:%M:%S"? I feel I am missing something for the seconds, because the seconds above are a decimal number/float.
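A note on Question 3, offered as a minimal sketch rather than a definitive answer: in strptime-style format strings, `%S` only matches whole seconds, and the fractional part is matched by `%f`. The example below uses only the standard library's `datetime.strptime` to illustrate this. It assumes an English locale so that `%b` recognizes `OCT`; the variable names are made up for illustration. The same format string should be usable as the `format=` argument of `dd.to_datetime`, which forwards keyword arguments to `pd.to_datetime`.

```python
from datetime import datetime

raw = "11OCT2020:13:03:12.452"

# Format from the question: no directive for the fractional seconds.
fmt_without_fraction = "%d%b%Y:%H:%M:%S"
# Same format with ".%f" appended to consume the ".452" part.
fmt_with_fraction = "%d%b%Y:%H:%M:%S.%f"

# Without %f, strptime rejects the string because ".452" is left over.
try:
    datetime.strptime(raw, fmt_without_fraction)
    failed_without_fraction = False
except ValueError:
    failed_without_fraction = True

# With %f, the fraction is parsed as microseconds (452 ms = 452000 us).
parsed = datetime.strptime(raw, fmt_with_fraction)
print(parsed)  # 2020-10-11 13:03:12.452000
```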

Older:

I have the following column in a dask dataframe:

ddf = dd.DataFrame({'date': ['15JAN1955', '25DEC1990', '06MAY1962', '20SEPT1975']})

When it was initially loaded as a dask dataframe, the column was read as an object/string dtype. While looking for guidance in the book Data Science with Python and Dask, I found that it suggests reading the column as the np.str datatype at initial load. However, I could not understand how to convert the column into a date datatype. I tried processing it with dd.to_datetime, and the output confirmed dtype: datetime64[ns], but when I ran ddf.dtypes, the frame still reported an object datatype.

I would like to change the object dtype to a date dtype so that I can filter/run a condition on it later.


你另情深 2025-01-16 13:19:17

dask.dataframe supports the pandas API for handling datetimes, so this should work:

import dask.dataframe as dd
import pandas as pd

df = pd.DataFrame({"date": ["15JAN1955", "25DEC1990", "06MAY1962", "20SEPT1975"]})

print(pd.to_datetime(df["date"]))
# 0   1955-01-15
# 1   1990-12-25
# 2   1962-05-06
# 3   1975-09-20
# Name: date, dtype: datetime64[ns]

ddf = dd.from_pandas(df, npartitions=2)

ddf["date"] = dd.to_datetime(ddf["date"])

print(ddf.compute())
#         date
# 0 1955-01-15
# 1 1990-12-25
# 2 1962-05-06
# 3 1975-09-20
