Convert Pandas Column to DateTime

Posted on 2025-01-13 20:10:33

I have a field in a pandas DataFrame that was imported as a string.

It should be a datetime variable. How do I convert it to a datetime column, and then filter based on date?

Example:

raw_data = pd.DataFrame({'Mycol': ['05SEP2014:00:00:00.000']})

Comments (7)

温柔少女心 2025-01-20 20:10:33

Use the to_datetime function, specifying a format to match your data.

df['Mycol'] = pd.to_datetime(df['Mycol'], format='%d%b%Y:%H:%M:%S.%f')
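
The question also asks how to filter by date once the column is converted. A minimal sketch, assuming a few made-up sample rows in the same format as the question's column (the extra rows and the cut-off dates are illustrative only):

```python
import pandas as pd

# Hypothetical sample data in the same string format as the question.
raw_data = pd.DataFrame({
    'Mycol': ['05SEP2014:00:00:00.000',
              '17OCT2014:08:30:00.000',
              '02JAN2015:12:00:00.000']
})

# Convert the string column to datetime64 using an explicit format.
raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'],
                                   format='%d%b%Y:%H:%M:%S.%f')

# Filter with a boolean mask: rows on or after 1 October 2014.
after_october = raw_data[raw_data['Mycol'] >= pd.Timestamp('2014-10-01')]

# Filter on a closed date range: rows that fall within 2014.
in_2014 = raw_data[raw_data['Mycol'].between(pd.Timestamp('2014-01-01'),
                                             pd.Timestamp('2014-12-31'))]

print(after_october)
print(in_2014)
```

Once the column has a datetime dtype, ordinary comparison operators work on it, so the same pattern applies to any other cut-off date.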

梦里人 2025-01-20 20:10:33

If you have more than one column to be converted you can do the following:

df[["col1", "col2", "col3"]] = df[["col1", "col2", "col3"]].apply(pd.to_datetime)
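
As a rough illustration of the line above, here is a small self-contained sketch. The frame, its column names and its values are made up for the example, and the strings are in ISO format so that no format= is needed:

```python
import pandas as pd

# Hypothetical frame: two string columns holding dates, one unrelated column.
df = pd.DataFrame({
    'col1': ['2014-09-05', '2014-09-06'],
    'col2': ['2014-10-01', '2014-10-02'],
    'other': [1, 2],
})

date_cols = ['col1', 'col2']

# DataFrame.apply calls pd.to_datetime once per selected column.
df[date_cols] = df[date_cols].apply(pd.to_datetime)

# Only the selected columns change dtype.
print(df.dtypes)
```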

哀由 2025-01-20 20:10:33

Edit: I recommend using pd.to_datetime() instead of this, because .apply() is generally slower.

You can use the Series method .apply() to operate on the values in Mycol:

>>> import pandas as pd
>>> df = pd.DataFrame(['05SEP2014:00:00:00.000'], columns=['Mycol'])
>>> df
                    Mycol
0  05SEP2014:00:00:00.000
>>> import datetime as dt
>>> df['Mycol'] = df['Mycol'].apply(lambda x:
...     dt.datetime.strptime(x, '%d%b%Y:%H:%M:%S.%f'))
>>> df
       Mycol
0 2014-09-05

本王不退位尔等都是臣 2025-01-20 20:10:33

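Use the pandas to_datetime function to parse the column as datetime. With infer_datetime_format=True, pandas tries to detect the format automatically before converting the column. (In pandas 2.0 and later this argument is deprecated, because a strict version of the inference is the default behaviour.)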

import pandas as pd
raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'], infer_datetime_format=True)

身边 2025-01-20 20:10:33

Time Saver:

raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'])

情深如许 2025-01-20 20:10:33

Setting the correct format= is much faster than letting pandas find out [1]

Long story short, passing the correct format= from the beginning, as in chrisb's post (https://stackoverflow.com/a/26763793/19123103), is much faster than letting pandas figure out the format, especially if the format contains a time component. The runtime difference for dataframes with more than 10k rows is huge (~25 times faster, so we're talking a couple of minutes vs. a few seconds). All valid format options can be found at https://strftime.org/.

[Figure: perfplot benchmark of the runtime of each conversion method against len(df)]

errors='coerce' is useful

If some rows are not in the correct format or are not datetimes at all, the errors= parameter is very useful: you can convert the valid rows and handle the rows that contained invalid values later.

df['date'] = pd.to_datetime(
    df['date'], format='%d%b%Y:%H:%M:%S.%f', errors='coerce')

# for multiple columns
df[['start', 'end']] = df[['start', 'end']].apply(
    pd.to_datetime, format='%d%b%Y:%H:%M:%S.%f', errors='coerce')
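
To make the "handle the invalid rows later" step concrete, here is a minimal sketch. The sample values are made up for illustration; with errors='coerce', any value that does not match the format becomes NaT, which can then be located with isna():

```python
import pandas as pd

# Hypothetical column: one valid value and two that do not match the format.
df = pd.DataFrame({'date': ['05SEP2014:00:00:00.000',
                            'not a date',
                            '2014-09-05']})

parsed = pd.to_datetime(df['date'], format='%d%b%Y:%H:%M:%S.%f',
                        errors='coerce')

# Keep the original strings of the rows that failed, for later inspection.
bad_rows = df[parsed.isna()]

# Store the parsed values and keep only the rows that converted cleanly.
df['date'] = parsed
valid = df.dropna(subset=['date'])

print(bad_rows)
print(valid)
```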

To silence SettingWithCopyWarning

On a side note, if you got this warning, then that means your dataframe was probably created by filtering another dataframe. Enable copy-on-write and you're good to go. (see this post for more about it).

pd.set_option('copy_on_write', True)
df['date'] = pd.to_datetime(df['date'], format='%d%b%Y:%H:%M:%S.%f')
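
A different technique from the global option above, sketched here as an alternative, is to take an explicit copy of the filtered frame before assigning to it. The frame, its column names and the filter condition are made up for illustration:

```python
import pandas as pd

# Hypothetical source frame with dates stored as strings.
full = pd.DataFrame({
    'date': ['05SEP2014:00:00:00.000',
             '17OCT2014:08:30:00.000',
             '02JAN2015:12:00:00.000'],
    'value': [1, 2, 3],
})

# .copy() makes the filtered frame independent of `full`, so the
# assignment below is unambiguous and does not trigger the warning.
subset = full[full['value'] > 1].copy()
subset['date'] = pd.to_datetime(subset['date'],
                                format='%d%b%Y:%H:%M:%S.%f')

print(subset.dtypes)
```

The original frame keeps its string column; only the copy is converted.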

[1] Code used to produce the timeit test plot.

import pandas as pd
import perfplot
from random import choices
from datetime import datetime

mdYHMSf = range(1,13), range(1,29), range(2000,2024), range(24), *[range(60)]*2, range(1000)
perfplot.show(
    kernels=[lambda x: pd.to_datetime(x), 
             lambda x: pd.to_datetime(x, format='%m/%d/%Y %H:%M:%S.%f'), 
             lambda x: pd.to_datetime(x, infer_datetime_format=True),
             lambda s: s.apply(lambda x: datetime.strptime(x, '%m/%d/%Y %H:%M:%S.%f'))],
    labels=["pd.to_datetime(df['date'])", 
            "pd.to_datetime(df['date'], format='%m/%d/%Y %H:%M:%S.%f')", 
            "pd.to_datetime(df['date'], infer_datetime_format=True)", 
            "df['date'].apply(lambda x: datetime.strptime(x, '%m/%d/%Y %H:%M:%S.%f'))"],
    n_range=[2**k for k in range(20)],
    setup=lambda n: pd.Series([f"{m}/{d}/{Y} {H}:{M}:{S}.{f}" 
                               for m,d,Y,H,M,S,f in zip(*[choices(e, k=n) for e in mdYHMSf])]),
    equality_check=pd.Series.equals,
    xlabel='len(df)'
)

If the column contains multiple formats, see Convert a column of mixed format strings to a datetime Dtype.

执着的年纪 2025-01-20 20:10:33

Just like we convert object data type to float or int, use astype().

raw_data['Mycol'] = raw_data['Mycol'].astype('datetime64[ns]')
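
Note that astype() takes no format= argument, so it relies on pandas recognising the strings on its own. A minimal sketch, assuming made-up ISO-formatted strings rather than the question's format:

```python
import pandas as pd

# Hypothetical column of ISO-formatted date strings.
s = pd.Series(['2014-09-05', '2014-09-06'])

# Cast directly to a datetime dtype, as with a float or int conversion.
converted = s.astype('datetime64[ns]')

print(converted.dtype)
```

For a custom layout such as the one in the question, pd.to_datetime with an explicit format= is the more predictable choice.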