I want to lag (or shift) a time series (by days, weeks, months or years) without loops
I am basically trying to add new columns to my time series, each one lagged by some number of days, weeks, months or years, as needed.
I have written a function that solves this problem, but it is highly inefficient:
    import dateutil.relativedelta
    import numpy as np

    def lag_N_period(df, y, days_ago=0, weeks_ago=0, months_ago=0, years_ago=0):
        # Number of leading rows to skip so that the lagged date exists in the index
        skip = days_ago + weeks_ago*7 + months_ago*31 + years_ago*366
        ## FEATURE NAME ##
        feature_name = ''
        if days_ago > 0:
            feature_name = feature_name + str(days_ago) + 'days_'
        if weeks_ago > 0:
            feature_name = feature_name + str(weeks_ago) + 'weeks_'
        if months_ago > 0:
            feature_name = feature_name + str(months_ago) + 'months_'
        if years_ago > 0:
            feature_name = feature_name + str(years_ago) + 'years_'
        feature_name = feature_name + 'ago'
        df[feature_name] = np.nan  # Creates NaN column named 'feature_name'
        delta = dateutil.relativedelta.relativedelta(days=days_ago, weeks=weeks_ago, months=months_ago, years=years_ago)
        for i in df.index[skip:]:
            j = i - delta  # the date this row should look back to
            df.loc[i, feature_name] = df.loc[j, y]  # .loc avoids chained assignment
        return df
skip is just an int: inside the loop, looking up an index value that doesn't exist in the dataframe raises an error, so the first skip rows are left out. It has no other purpose.
df is my dataframe, with dates as the index and 'y' as the objective variable:
                objective
    date
    2018-01-01       3420
    2018-01-02     100580
    2018-01-03      78500
    2018-01-04      72640
    2018-01-05      64980
    ...               ...
    2021-01-27      76820
    2021-01-28      90520
    2021-01-29      81920
    2021-01-30      20080
    2021-01-31          0
I have tried the .shift() function, as .shift(1, period='M'), but it doesn't give the output I want.
The only case where it works is when I just want a lag of some number of days, like .shift(5).
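For context on why the two calls behave differently: in pandas, `shift` takes `periods` and `freq` (there is no `period` argument). Without `freq` it moves the values by a number of rows; with `freq` it moves the index labels and leaves the values where they are. The following is only a small illustration on made-up data (the series `s` and its values are assumptions, not the data from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical gapless daily series, standing in for the 'objective' column
s = pd.Series(np.arange(10),
              index=pd.date_range("2018-01-01", periods=10),
              name="objective")

by_rows = s.shift(5)            # values move down 5 rows, index unchanged
by_freq = s.shift(5, freq="D")  # index moves 5 days forward, values unchanged

# Re-aligning the label-shifted series to the original dates gives the
# same lag as the row-based shift, because the index has no missing days
realigned = by_freq.reindex(s.index)
```

On a gapless daily index the two agree for day-based lags, which would explain why `.shift(5)` looks right while month or year lags need calendar arithmetic on the dates.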
Comments (2)
Given a dataframe with a DatetimeIndex which doesn't have any missing days, like this

[example dataframe missing from this copy]

you could do the following

[code missing from this copy]

and get

[output missing from this copy]

The idx is there to make sure that the actual shifting doesn't fail. It's the part you're trying to address with skip. (Your skip is actually a bit imprecise, because you're using 31/366 days for month/year lengths universally.)

But be prepared to run into strange phenomena when you're using months and/or years. For example

[expression missing from this copy]

is True.
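The code from this answer is not available here, so the following is only a minimal sketch of a loop-free approach in the same spirit, not the answerer's actual code. It assumes a sorted `DatetimeIndex` with no missing days, uses `pd.DateOffset` for the calendar arithmetic, and builds a boolean mask named `idx` (the name is borrowed from the answer, its construction is a guess). The helper name `lag_n_period` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def lag_n_period(df, y, days_ago=0, weeks_ago=0, months_ago=0, years_ago=0):
    """Add a column with the value of `y` from the given calendar offset ago.

    Assumes df has a sorted DatetimeIndex with no missing days.
    """
    parts = [(days_ago, "days"), (weeks_ago, "weeks"),
             (months_ago, "months"), (years_ago, "years")]
    feature_name = "".join(f"{n}{unit}_" for n, unit in parts if n > 0) + "ago"

    offset = pd.DateOffset(days=days_ago, weeks=weeks_ago,
                           months=months_ago, years=years_ago)
    lagged = df.index - offset      # lagged date for every row, no Python loop
    idx = lagged >= df.index[0]     # rows whose lagged date is inside the index

    df[feature_name] = np.nan
    df.loc[idx, feature_name] = df.loc[lagged[idx], y].to_numpy()
    return df


# Made-up data: the value is simply the number of days since 2018-01-01
index = pd.date_range("2018-01-01", "2021-01-31", name="date")
df = pd.DataFrame({"objective": np.arange(len(index))}, index=index)
df = lag_n_period(df, "objective", weeks_ago=1, months_ago=1)

# One example of month arithmetic collapsing different days onto the same date:
# both sides land on 2020-02-29
same_day = (pd.Timestamp("2020-03-31") - pd.DateOffset(months=1)
            == pd.Timestamp("2020-03-29") - pd.DateOffset(months=1))
```

Because month and year offsets clip to the last valid day of the target month, several rows at the end of a long month can end up reading the same lagged value.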
We can use relativedelta, pandas.to_datetime and pandas.DataFrame.apply.
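This answer's code is also not available here. As a rough sketch of how those three pieces might fit together (the helper `lag_with_apply`, the column name `date` and the sample data are all assumptions for illustration), one could compute each lagged date with `relativedelta` inside `DataFrame.apply` and then look the values up with `reindex`:

```python
import pandas as pd
from dateutil.relativedelta import relativedelta


def lag_with_apply(df, y, **delta_kwargs):
    """Return the values of `y` at each index date minus a relativedelta.

    Dates that fall outside the index yield NaN.
    """
    delta = relativedelta(**delta_kwargs)
    dates = pd.DataFrame({"date": pd.to_datetime(df.index)})
    # Row-wise apply: one lagged date per row
    lagged = dates.apply(lambda row: row["date"] - delta, axis=1)
    return df[y].reindex(pd.DatetimeIndex(lagged)).to_numpy()


# Made-up data: the value is the number of days since 2018-01-01
index = pd.date_range("2018-01-01", "2018-03-31", name="date")
df = pd.DataFrame({"objective": range(len(index))}, index=index)
df["1months_ago"] = lag_with_apply(df, "objective", months=1)
```

Note that a row-wise `apply` still runs a Python-level function per row, so this mainly tidies the code and is unlikely to be as fast as fully vectorised index arithmetic.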