Python pandas: datetime handling for on-peak and off-peak hours analysis
So I have a df like this:
import pandas as pd
import numpy as np
datatime = [('2019-09-15 00:15:00.000000000'),
('2019-09-15 00:30:00.000000000'),
('2019-09-15 00:45:00.000000000'),
('2019-09-15 01:00:00.000000000'),
('2019-09-15 01:15:00.000000000'),
('2019-09-15 01:30:00.000000000'),
('2019-09-15 01:45:00.000000000'),
('2019-09-15 02:00:00.000000000'),
('2019-09-15 02:15:00.000000000')]
p =[494.76,486.36,484.68,500.64,482.16,483.84,483.0,478.8,493.08,474.6]
q = [47.88,33.6,41.16,0.0,0.0,0.0,0.0,0.0,0.0,0.0]
df = pd.DataFrame(list(zip(datatime,p,q)), columns = [['datetime','p','q']])
df
I am doing an analysis over 30 days by grouping my data into on-peak and off-peak hours. For this I also need to identify the days of the week. I tried to use the pandas function:
df.dt.day_name()
But in this particular case that is not feasible, since for this function the day starts at 00:00:00, and in my program I need it to start at 00:15:00.
Since I have 96 points for each day, I thought about using a dictionary:
days_of_the_week = {'Sunday': 1,'Monday': 2,'Tuesday': 3, 'Wednesday': 4, 'Thursday':5, 'Friday':6 , 'Saturday':7}
How can I apply it to my df so that every 96 points a new day is identified?
Comments (1)
You can just add an offset using a Timedelta object when calculating the weekday. This won't affect the values of the datetime column.
Just a note on the way you constructed your DataFrame:
df = pd.DataFrame(list(zip(datatime,p,q)), columns = [['datetime','p','q']])
The use of list is unnecessary and could impede performance for larger data sets. Additionally, you shouldn't use a nested list for the columns argument, as it has unintended effects (pandas builds a MultiIndex for the columns instead of plain labels).
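A minimal sketch of the offset idea, assuming each day's 96 readings run from 00:15:00 through 00:00:00 of the next calendar day. Only the first timestamp comes from the question; the other three are illustrative values chosen to show the day boundary, and the column names day_name and day_number are hypothetical. The DataFrame is built from a dict so the columns are plain labels.

```python
import pandas as pd

# Illustrative sample around a day boundary (only the first timestamp is from the question).
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2019-09-15 00:15:00",  # first reading of the Sunday block
        "2019-09-15 23:45:00",
        "2019-09-16 00:00:00",  # last reading of the Sunday block
        "2019-09-16 00:15:00",  # first reading of the Monday block
    ]),
    "p": [494.76, 486.36, 484.68, 500.64],
    "q": [47.88, 33.6, 41.16, 0.0],
})

# Shift by 15 minutes only for the weekday lookup; the datetime column is left unchanged.
shifted = df["datetime"] - pd.Timedelta(minutes=15)
df["day_name"] = shifted.dt.day_name()

# Map the names onto the numbering used in the question.
days_of_the_week = {"Sunday": 1, "Monday": 2, "Tuesday": 3, "Wednesday": 4,
                    "Thursday": 5, "Friday": 6, "Saturday": 7}
df["day_number"] = df["day_name"].map(days_of_the_week)

print(df)
```

With this shift, the 00:00:00 reading is counted with the previous day, so there is no need to count rows in blocks of 96; gaps or missing readings would not throw the day labels off.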