Get the value of a specific date from a date column

Posted 2025-01-19 12:12:32

At the start, there are only two columns: date and value.

The goal is to get, for each row, the value from one month earlier and from one year earlier. The final output should look something like this:

date	value	m1	m12	m1_val	m12_val
2022-02-27	100	2022-01-27	2021-02-27	nan	nan
2022-03-27	300	2022-02-27	2021-03-27	100	nan
2022-03-30	500	NaT	2021-03-30	nan	nan
2023-02-27	800	2023-01-27	2022-02-27	nan	100
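To make the example reproducible, here is a minimal sketch of the starting two-column frame. It assumes the DataFrame is named d, as in the code below, and uses only the four dates and values from the table:

```python
import pandas as pd

# The two starting columns, using the dates and values from the table above
d = pd.DataFrame(
    {
        "date": pd.to_datetime(["2022-02-27", "2022-03-27", "2022-03-30", "2023-02-27"]),
        "value": [100, 300, 500, 800],
    }
)
# date must be a datetime64 column for the .dt accessor used below
print(d.dtypes)
```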

I have already done this, but without vectorization, and I would like to replace the final apply() call with something vectorized so that it doesn't have to go row by row.

For example, the columns m1 and m12 can be created with:

import pandas as pd

# d['date'] must already be datetime64 (e.g. via pd.to_datetime)
d['year'] = d['date'].dt.year
d['month'] = d['date'].dt.month
d['day'] = d['date'].dt.day
# One month back; January rolls over to December of the previous year
d['month-1'] = d['month'] - 1
d['year-1'] = d['year']
d.loc[d['month-1'] == 0, 'year-1'] -= 1
d.loc[d['month-1'] == 0, 'month-1'] = 12
# One year back: same month and day
d['year-12'] = d['year'] - 1
# Reassemble the dates; combinations that don't exist (e.g. Feb 30) become NaT
d['m1'] = pd.to_datetime(d[['year-1', 'month-1', 'day']].rename({'year-1': 'year', 'month-1': 'month'}, axis=1), errors='coerce')
d['m12'] = pd.to_datetime(d[['year-12', 'month', 'day']].rename({'year-12': 'year'}, axis=1), errors='coerce')
# Drop the helper columns
d = d.drop(['year', 'month', 'day', 'month-1', 'year-1', 'year-12'], axis=1)
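The m1/m12 construction can also be written as one helper that handles the year rollover with integer arithmetic instead of the masked .loc updates. This is a sketch, not the author's code: shift_months is a hypothetical name. It keeps the same semantics as the code above, so a target date that doesn't exist (such as 2022-02-30) becomes NaT. By contrast, d['date'] - pd.DateOffset(months=1) would clip 2022-03-30 to 2022-02-28, which changes which row gets matched.

```python
import pandas as pd

d = pd.DataFrame(
    {
        "date": pd.to_datetime(["2022-02-27", "2022-03-27", "2022-03-30", "2023-02-27"]),
        "value": [100, 300, 500, 800],
    }
)


def shift_months(dates: pd.Series, months: int) -> pd.Series:
    """Hypothetical helper: move each date back by `months` calendar months,
    keeping the day of month. Non-existent dates become NaT (errors='coerce')."""
    # Count months from year 0 so integer division handles the year rollover
    total = dates.dt.year * 12 + (dates.dt.month - 1) - months
    parts = pd.DataFrame(
        {
            "year": total // 12,
            "month": total % 12 + 1,
            "day": dates.dt.day,
        }
    )
    return pd.to_datetime(parts, errors="coerce")


d["m1"] = shift_months(d["date"], 1)
d["m12"] = shift_months(d["date"], 12)
print(d[["date", "m1", "m12"]])
```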

With these columns in place, I used the following apply() function to fill m1_val and m12_val. For each row it searches the date column for the target date and returns the matching value, or NaN if there is none.

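import numpy as np

def test(x, col):
    # Rows whose date equals this row's target date x[col]; scans all of d
    value = d.loc[d['date'] == x[col], 'value']
    # No match (including a NaT target): NaN; otherwise the first match
    if len(value) == 0:
        return np.nan
    return value.iloc[0]

d['m1_val'] = d.apply(lambda x: test(x, 'm1'), axis=1)
d['m12_val'] = d.apply(lambda x: test(x, 'm12'), axis=1)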
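One vectorized replacement for the apply() step is to turn the date/value pairs into a Series indexed by date and pass it to Series.map, which looks up every m1/m12 date through the index instead of comparing the whole date column once per row (that per-row scan makes the apply version quadratic in the number of rows). A minimal sketch, assuming m1 and m12 already exist as in the table above; lookup is a hypothetical name. drop_duplicates keeps the first row per date, mirroring value.iloc[0], and is needed because map with a Series requires a unique index.

```python
import pandas as pd

d = pd.DataFrame(
    {
        "date": pd.to_datetime(["2022-02-27", "2022-03-27", "2022-03-30", "2023-02-27"]),
        "value": [100, 300, 500, 800],
        # Target dates as in the expected output; NaT where the date doesn't exist
        "m1": pd.to_datetime(["2022-01-27", "2022-02-27", None, "2023-01-27"]),
        "m12": pd.to_datetime(["2021-02-27", "2021-03-27", "2021-03-30", "2022-02-27"]),
    }
)

# One value per date; keep="first" mirrors value.iloc[0] in test()
lookup = d.drop_duplicates("date", keep="first").set_index("date")["value"]

# Each m1/m12 date is looked up in lookup's index; missing dates and NaT give NaN
d["m1_val"] = d["m1"].map(lookup)
d["m12_val"] = d["m12"].map(lookup)
print(d)
```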

Is there a better way to get the m1 value from the date column without going row by row? I was thinking of something with np.where or d.loc, but I didn't know how; maybe d.loc[d['date'].isin(d['m1'])] followed by a groupby()? In terms of performance, though, that looks similar to using apply().
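Closer to the d.loc/isin idea, the lookup can also be expressed as a left join: merge each target column against a two-column (date, value) table. This is a sketch under the same assumptions as the example data above; lookup and right are hypothetical names. With how="left", merge keeps every row of d in its original order, fills NaN where no date matches, and returns a fresh default index.

```python
import pandas as pd

d = pd.DataFrame(
    {
        "date": pd.to_datetime(["2022-02-27", "2022-03-27", "2022-03-30", "2023-02-27"]),
        "value": [100, 300, 500, 800],
        # Target dates as in the expected output; NaT where the date doesn't exist
        "m1": pd.to_datetime(["2022-01-27", "2022-02-27", None, "2023-01-27"]),
        "m12": pd.to_datetime(["2021-02-27", "2021-03-27", "2021-03-30", "2022-02-27"]),
    }
)

# (date, value) pairs with one row per date, so the join can't duplicate rows
lookup = d[["date", "value"]].drop_duplicates("date")

for col in ["m1", "m12"]:
    # Rename so the join key matches col and the value lands in f"{col}_val"
    right = lookup.rename(columns={"date": col, "value": f"{col}_val"})
    # Left join: every row of d is kept; unmatched target dates get NaN
    d = d.merge(right, on=col, how="left")

print(d)
```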
