Python pandas - datetime for on-peak and off-peak hours analysis

Posted on 2025-01-11 05:41:49

So I have a df like this:

import pandas as pd
import numpy as np

datatime = [('2019-09-15 00:15:00.000000000'),
            ('2019-09-15 00:30:00.000000000'),
            ('2019-09-15 00:45:00.000000000'),
            ('2019-09-15 01:00:00.000000000'),
            ('2019-09-15 01:15:00.000000000'),
            ('2019-09-15 01:30:00.000000000'),
            ('2019-09-15 01:45:00.000000000'),
            ('2019-09-15 02:00:00.000000000'),
            ('2019-09-15 02:15:00.000000000')]
p =[494.76,486.36,484.68,500.64,482.16,483.84,483.0,478.8,493.08,474.6]
q = [47.88,33.6,41.16,0.0,0.0,0.0,0.0,0.0,0.0,0.0]

df = pd.DataFrame(list(zip(datatime,p,q)), columns = [['datetime','p','q']])
df

I am doing an analysis over 30 days by grouping my data into on-peak and off-peak hours. For this I also need to identify the days of the week. I tried to use the pandas function:

df.dt.day_name()

But in this particular case it is not feasible, since for this function the day starts at 00:00:00 and in my program I need it to start at 00:15:00.
Since I have 96 points for each day, I thought about using a dictionary:

days_of_the_week = {'Sunday': 1,'Monday': 2,'Tuesday': 3, 'Wednesday': 4, 'Thursday':5, 'Friday':6 , 'Saturday':7}

How can I apply it to my df so that every 96 points a new day is identified?

顾冷 2025-01-18 05:41:49

You can just add an offset using a Timedelta object when calculating the weekday. This won't affect the values of the datetime column.

In [21]: dt_index = pd.date_range(start='2022-01-01', end='2022-01-01 23:45:00', periods=96)

In [23]: df = pd.DataFrame(zip(dt_index, np.random.rand(len(dt_index))), columns=['datetime', 'whatever'])

In [24]: df.tail()
Out[24]:
              datetime  whatever
91 2022-01-01 22:45:00  0.910446
92 2022-01-01 23:00:00  0.199106
93 2022-01-01 23:15:00  0.051808
94 2022-01-01 23:30:00  0.799284
95 2022-01-01 23:45:00  0.584663

In [25]: df['weekday'] = (df.datetime.astype('datetime64[ns]') + pd.Timedelta(seconds=15*60)).dt.day_name()

In [26]: df.tail()
Out[26]:
              datetime  whatever   weekday
91 2022-01-01 22:45:00  0.910446  Saturday
92 2022-01-01 23:00:00  0.199106  Saturday
93 2022-01-01 23:15:00  0.051808  Saturday
94 2022-01-01 23:30:00  0.799284  Saturday
95 2022-01-01 23:45:00  0.584663    Sunday
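
Note that adding 15 minutes moves the 23:45:00 reading into the next day, as the output above shows. If the intent is instead that each day runs from 00:15:00 through 00:00:00 of the next calendar date (so the midnight reading closes the day that is ending), subtracting the offset would be the matching variant. A minimal sketch under that assumption; the sample data and the `shifted` name are illustrative only:

```python
import pandas as pd

# 96 quarter-hour readings: 2019-09-15 00:15:00 through 2019-09-16 00:00:00.
# 2019-09-15 is a Sunday.
df = pd.DataFrame(
    {"datetime": pd.date_range("2019-09-15 00:15:00", periods=96, freq="15min")}
)

# Assumption: the midnight reading belongs to the day that is ending,
# so shift every timestamp back by 15 minutes before asking for the day name.
# The datetime column itself is left untouched.
shifted = df["datetime"] - pd.Timedelta(minutes=15)
df["weekday"] = shifted.dt.day_name()

print(df.tail(3))
```

With this shift, all 96 rows, including the final 2019-09-16 00:00:00 reading, share a single weekday label.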

Just a note on the way you constructed your DataFrame.

df = pd.DataFrame(list(zip(datatime,p,q)), columns = [['datetime','p','q']])

The call to list is unnecessary and could impede performance for larger data sets. Additionally, you shouldn't pass a nested list as the columns argument, as it has unintended effects: pandas builds MultiIndex columns from it, so df.datetime returns a DataFrame rather than a Series.

In [27]: df = pd.DataFrame(list(zip(datatime,p,q)), columns = [['datetime','p','q']])

In [28]: type(df.datetime)
Out[28]: pandas.core.frame.DataFrame

In [29]: df = pd.DataFrame(zip(datatime, p, q), columns=['datetime','p','q'])

In [30]: type(df.datetime)
Out[30]: pandas.core.series.Series
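
Returning to the idea in the question of counting 96 points per day and using the days_of_the_week dictionary: a purely positional labelling could look roughly like the sketch below. It assumes there are no missing or duplicated rows, that the first row is the first reading of a day, and that the weekday of that first day is known; `POINTS_PER_DAY`, `day_names`, `first_day` and the `weekday_code` column are hypothetical names introduced only for illustration.

```python
import numpy as np
import pandas as pd

POINTS_PER_DAY = 96  # quarter-hour readings per day, as stated in the question

# Dictionary from the question, plus the names in the same order.
days_of_the_week = {'Sunday': 1, 'Monday': 2, 'Tuesday': 3, 'Wednesday': 4,
                    'Thursday': 5, 'Friday': 6, 'Saturday': 7}
day_names = list(days_of_the_week)

# Three days of dummy timestamps starting 2019-09-15 00:15:00 (a Sunday).
df = pd.DataFrame(
    {"datetime": pd.date_range("2019-09-15 00:15:00",
                               periods=3 * POINTS_PER_DAY, freq="15min")}
)

# Assumption: rows are complete and ordered, so row position alone
# determines which day a reading belongs to.
day_number = np.arange(len(df)) // POINTS_PER_DAY

first_day = day_names.index("Sunday")  # assumed weekday of the first row
df["weekday"] = [day_names[i] for i in (first_day + day_number) % 7]
df["weekday_code"] = df["weekday"].map(days_of_the_week)

print(df.iloc[[0, 95, 96, 287]])
```

The Timedelta approach is more robust when readings can be missing, since the positional version silently drifts as soon as a single row is absent.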
