I want to lag (shift) a time series by days, weeks, months, or years without looping

Posted 2025-01-11 00:44:58

I am basically trying to create new columns from my time series, and I want to lag it by a chosen number of days, weeks, months, or years.
I have written a function that solves this problem, but it is highly inefficient.

import numpy as np
from dateutil.relativedelta import relativedelta


def lag_N_period(df, y, days_ago=0, weeks_ago=0, months_ago=0, years_ago=0):
    # Upper bound on the number of leading rows whose lagged date
    # would fall before the start of the (daily) index
    skip = days_ago + weeks_ago * 7 + months_ago * 31 + years_ago * 366

    ## FEATURE NAME ##
    feature_name = ''
    if days_ago > 0:
        feature_name += str(days_ago) + 'days_'
    if weeks_ago > 0:
        feature_name += str(weeks_ago) + 'weeks_'
    if months_ago > 0:
        feature_name += str(months_ago) + 'months_'
    if years_ago > 0:
        feature_name += str(years_ago) + 'years_'
    feature_name += 'ago'

    # Create a NaN column named feature_name
    df[feature_name] = np.nan

    delta = relativedelta(days=days_ago, weeks=weeks_ago, months=months_ago, years=years_ago)
    for i in df.index[skip:]:
        j = i - delta  # date whose value is copied into row i
        df.loc[i, feature_name] = df.loc[j, y]

    return df

skip is just an int: it keeps the loop from looking up a date that is not in the dataframe's index, which would raise an error.

df is my dataframe, with dates as the index and 'y' as the objective variable:

            objective
date    
2018-01-01  3420
2018-01-02  100580
2018-01-03  78500
2018-01-04  72640
2018-01-05  64980
... ...
2021-01-27  76820
2021-01-28  90520
2021-01-29  81920
2021-01-30  20080
2021-01-31  0

I have tried the .shift() function as .shift(1, period='M'), but it's not the output I want.
The only case where it works is when I just want a lag of a fixed number of days, e.g. .shift(5).


Comments (2)

雪落纷纷 2025-01-18 00:44:58

Given a dataframe with a DatetimeIndex that has no missing days, like this:

import pandas as pd

df = pd.DataFrame(
    {"A": range(500)}, index=pd.date_range("2022-03-01", periods=500, freq="1D")
)

              A
2022-03-01    0
2022-03-02    1
...         ...
2023-07-12  498
2023-07-13  499

you could do the following

from dateutil.relativedelta import relativedelta

delta = relativedelta(months=1)
df["B"] = None  # None instead of other NaNs - can be changed
idx = df.loc[df.index[0] + delta:].index
df.loc[idx, "B"] = df.loc[[day - delta for day in idx], "A"].values

and get

              A     B
2022-03-01    0  None
2022-03-02    1  None
...         ...   ...
2023-07-12  498   468
2023-07-13  499   469

The idx is there to make sure that the actual shifting doesn't fail. It's the part you're trying to address by skip. (Your skip is actually a bit imprecise because you're using 31/366 days for month/year lengths universally.)
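To illustrate the imprecision of a fixed 31-day skip, here is a small sketch. The index starting on 2022-02-01 is an assumption chosen for the illustration (it is not the asker's data), and `exact_skip` and `fixed_skip` are hypothetical names.

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

# Assumed daily index starting in February (illustrative only)
index = pd.date_range("2022-02-01", periods=100, freq="D")
delta = relativedelta(months=1)

# First label whose one-month-earlier date is still inside the index
first_valid = index[0] + delta

# Number of leading rows that really have to be skipped
exact_skip = index.get_loc(first_valid)

# What a fixed "31 days per month" rule would skip instead
fixed_skip = 1 * 31

print(first_valid.date(), exact_skip, fixed_skip)
```

In this case the fixed rule skips three rows that could have received a lagged value, because February 2022 has only 28 days.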

But be prepared to run into strange phenomena when you're using months and/or years. For example

from datetime import date

delta = relativedelta(months=1)
date(2022, 3, 30) + delta == date(2022, 3, 31) + delta

is True.
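The list comprehension above still iterates over the dates in Python. A possible fully vectorized variant is sketched below, assuming the same gap-free daily index; it swaps `relativedelta` for `pd.DateOffset`, which can be subtracted from a whole DatetimeIndex at once. The helper name `lag_by_offset` and the column name "B" are hypothetical.

```python
import pandas as pd


def lag_by_offset(df, col, **offset_kwargs):
    """Return df[col] lagged by a calendar offset, aligned to df.index."""
    offset = pd.DateOffset(**offset_kwargs)
    # For every row, the date the lagged value should come from
    source_dates = df.index - offset
    # reindex looks up all source dates at once; dates outside the index give NaN
    lagged = df[col].reindex(source_dates)
    return pd.Series(lagged.to_numpy(), index=df.index, name=col + "_lag")


df = pd.DataFrame(
    {"A": range(500)}, index=pd.date_range("2022-03-01", periods=500, freq="1D")
)
df["B"] = lag_by_offset(df, "A", months=1)
print(df.tail(2))
```

Because missing source dates become NaN instead of raising, no skip or idx is needed here. The month-end effect described above still applies: several end-of-month dates can map to the same source date.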

氛圍 2025-01-18 00:44:58

We can use relativedelta, pandas.to_datetime and pandas.Series.apply.

from dateutil.relativedelta import relativedelta
import pandas as pd

# Sample dataframe
>>> a = pd.DataFrame([('2021-01-01'), ('2021-01-02'), ('2022-01-01')], columns=['Date'])

# Contents of a
>>> a
         Date
0  2021-01-01
1  2021-01-02
2  2022-01-01

# Ensuring Date is a datetime column
>>> a['Date'] = pd.to_datetime(a['Date'])

# Adding a month to all of the dates
>>> a.Date.apply(lambda x: x + relativedelta(months=1))
0   2021-02-01
1   2021-02-02
2   2022-02-01
Name: Date, dtype: datetime64[ns]
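As a possible alternative to the row-by-row `apply`, the same month shift can be sketched with `pd.DateOffset`, which can be added to a whole datetime column in one operation. This is a swapped-in technique rather than the approach shown above, and the variable name `shifted` is hypothetical.

```python
import pandas as pd

# Same sample data as above
a = pd.DataFrame({"Date": ["2021-01-01", "2021-01-02", "2022-01-01"]})
a["Date"] = pd.to_datetime(a["Date"])

# Add one calendar month to every date without a Python-level loop
shifted = a["Date"] + pd.DateOffset(months=1)
print(shifted)
```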