Convert a Pandas column to datetime

Posted 2025-01-13 20:10:33

I have one field in a pandas DataFrame that was imported as string format.

It should be a datetime variable. How do I convert it to a datetime column, and then filter based on date?

Example:

import pandas as pd
raw_data = pd.DataFrame({'Mycol': ['05SEP2014:00:00:00.000']})

温柔少女心 2025-01-20 20:10:33

Use the to_datetime function, specifying a format to match your data.

df['Mycol'] = pd.to_datetime(df['Mycol'], format='%d%b%Y:%H:%M:%S.%f')
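The question also asks how to filter by date once the column is converted. Here is a minimal sketch of that step, assuming a few made-up rows in the same format as the question; the extra values and the names after_oct, in_2014 and in_range are illustrative only.

```python
import pandas as pd

# Hypothetical sample rows in the same format as the question
raw_data = pd.DataFrame({
    'Mycol': ['05SEP2014:00:00:00.000',
              '17OCT2014:12:30:00.000',
              '01JAN2015:08:15:00.000']
})

# Parse the strings with an explicit format
raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'],
                                   format='%d%b%Y:%H:%M:%S.%f')

# Boolean mask: rows on or after a given timestamp
after_oct = raw_data[raw_data['Mycol'] >= pd.Timestamp('2014-10-01')]

# Filter on a date component through the .dt accessor
in_2014 = raw_data[raw_data['Mycol'].dt.year == 2014]

# Rows inside a closed interval
in_range = raw_data[raw_data['Mycol'].between(pd.Timestamp('2014-09-01'),
                                              pd.Timestamp('2014-12-31'))]
```

Each filter is an ordinary boolean mask, so masks can be combined with & and | as usual.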

梦里人 2025-01-20 20:10:33

If you have more than one column to be converted you can do the following:

df[["col1", "col2", "col3"]] = df[["col1", "col2", "col3"]].apply(pd.to_datetime)
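To see the effect end to end, here is a small sketch on a made-up frame. The column names start, end and label, and the ISO-style date strings, are assumptions for illustration and not from the original post.

```python
import pandas as pd

# Hypothetical frame: two string columns holding ISO-style dates
df = pd.DataFrame({
    'start': ['2014-09-05', '2014-10-17'],
    'end': ['2014-09-06', '2014-10-20'],
    'label': ['a', 'b'],
})

# Convert only the date columns; other columns are left untouched
date_cols = ['start', 'end']
df[date_cols] = df[date_cols].apply(pd.to_datetime)

# Once converted, the columns support datetime arithmetic
df['duration'] = df['end'] - df['start']
print(df.dtypes)
```

Keeping the column names in a list avoids repeating them on both sides of the assignment.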

哀由 2025-01-20 20:10:33

Edit: I recommend using pd.to_datetime() instead of this, because .apply() is generally slower.

You can use the Series method .apply() to operate on the values in Mycol:

>>> df = pd.DataFrame(['05SEP2014:00:00:00.000'], columns=['Mycol'])
>>> df
                    Mycol
0  05SEP2014:00:00:00.000
>>> import datetime as dt
>>> df['Mycol'] = df['Mycol'].apply(lambda x: 
...     dt.datetime.strptime(x, '%d%b%Y:%H:%M:%S.%f'))
>>> df
       Mycol
0 2014-09-05

本王不退位尔等都是臣 2025-01-20 20:10:33

Use the pandas to_datetime function to parse the column as datetime. With infer_datetime_format=True, pandas detects the format automatically and converts the column. (In pandas 2.0 and later this argument is deprecated, because format inference is the default behavior.)

import pandas as pd
raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'], infer_datetime_format=True)

身边 2025-01-20 20:10:33

Time Saver:

raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'])

情深如许 2025-01-20 20:10:33

Setting the correct `format=` is much faster than letting pandas find out¹

Long story short, passing the correct format= from the beginning, as in chrisb's post (https://stackoverflow.com/a/26763793/19123103), is much faster than letting pandas figure out the format, especially if the format contains a time component. The runtime difference for dataframes with more than 10k rows is huge (about 25 times faster, so a couple of minutes versus a few seconds). All valid format options can be found at https://strftime.org/.

`errors='coerce'` is useful

If some rows are not in the correct format, or are not datetimes at all, the errors= parameter is very useful: you can convert the valid rows and handle the rows that contained invalid values later.

df['date'] = pd.to_datetime(
    df['date'], format='%d%b%Y:%H:%M:%S.%f', errors='coerce')

# for multiple columns
df[['start', 'end']] = df[['start', 'end']].apply(
    pd.to_datetime, format='%d%b%Y:%H:%M:%S.%f', errors='coerce')
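As a rough illustration of handling the invalid rows later, the sketch below coerces a made-up column and then picks out the values that failed to parse. The sample values and the names parsed and bad_rows are assumptions for the example.

```python
import pandas as pd

# Hypothetical column: one valid value, one malformed value, one missing value
df = pd.DataFrame({'date': ['05SEP2014:00:00:00.000', 'not a date', None]})

# Values that do not match the format become NaT instead of raising
parsed = pd.to_datetime(df['date'], format='%d%b%Y:%H:%M:%S.%f',
                        errors='coerce')

# Rows that held a value originally but failed to parse
bad_rows = df[parsed.isna() & df['date'].notna()]

# Store the parsed result once the bad rows have been recorded
df['date'] = parsed
```

Comparing against the original column separates genuinely missing values from malformed ones, which both end up as NaT after coercion.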

To silence `SettingWithCopyWarning`

On a side note, if you get this warning, your dataframe was probably created by filtering another dataframe. Enable copy-on-write and you're good to go (see this post for more about it).

pd.set_option('copy_on_write', True)
df['date'] = pd.to_datetime(df['date'], format='%d%b%Y:%H:%M:%S.%f')
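If you would rather not change a global option, a different and widely used technique is to take an explicit copy of the filtered frame before assigning to it. This is a sketch of that alternative and not the copy-on-write approach above; the frame raw, the id column and the filter condition are made up for illustration.

```python
import pandas as pd

# Hypothetical source frame
raw = pd.DataFrame({
    'id': [1, 2, 3],
    'date': ['05SEP2014:00:00:00.000',
             '17OCT2014:12:30:00.000',
             '01JAN2015:08:15:00.000'],
})

# .copy() makes the filtered frame independent of the original,
# so assigning a new column value to it is unambiguous
subset = raw[raw['id'] > 1].copy()
subset['date'] = pd.to_datetime(subset['date'],
                                format='%d%b%Y:%H:%M:%S.%f')
```

The original frame keeps its string column; only subset holds the converted values.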

¹ Code used to produce the timeit test plot.

import pandas as pd
import perfplot
from random import choices
from datetime import datetime

mdYHMSf = range(1,13), range(1,29), range(2000,2024), range(24), *[range(60)]*2, range(1000)
perfplot.show(
    kernels=[lambda x: pd.to_datetime(x), 
             lambda x: pd.to_datetime(x, format='%m/%d/%Y %H:%M:%S.%f'), 
             lambda x: pd.to_datetime(x, infer_datetime_format=True),
             lambda s: s.apply(lambda x: datetime.strptime(x, '%m/%d/%Y %H:%M:%S.%f'))],
    labels=["pd.to_datetime(df['date'])", 
            "pd.to_datetime(df['date'], format='%m/%d/%Y %H:%M:%S.%f')", 
            "pd.to_datetime(df['date'], infer_datetime_format=True)", 
            "df['date'].apply(lambda x: datetime.strptime(x, '%m/%d/%Y %H:%M:%S.%f'))"],
    n_range=[2**k for k in range(20)],
    setup=lambda n: pd.Series([f"{m}/{d}/{Y} {H}:{M}:{S}.{f}" 
                               for m,d,Y,H,M,S,f in zip(*[choices(e, k=n) for e in mdYHMSf])]),
    equality_check=pd.Series.equals,
    xlabel='len(df)'
)

If the column contains multiple formats, see Convert a column of mixed format strings to a datetime Dtype.

执着的年纪 2025-01-20 20:10:33

Just as you would convert an object column to float or int, use astype().

raw_data['Mycol'] = raw_data['Mycol'].astype('datetime64[ns]')
