Value at a specific date in a date column
At the start of the problem, the DataFrame has only two columns: date and value.
From there, the idea is to look up the value from one month earlier and from one year earlier. The final output would look something like this:
date | value | m1 | m12 | m1_val | m12_val |
---|---|---|---|---|---|
2022-02-27 | 100 | 2022-01-27 | 2021-02-27 | nan | nan |
2022-03-27 | 300 | 2022-02-27 | 2021-03-27 | 100 | nan |
2022-03-30 | 500 | NaT | 2021-03-30 | nan | nan |
2023-02-27 | 800 | 2023-01-27 | 2022-02-27 | nan | 100 |
I have already done this, but without vectorization. I would like to replace the final apply function with something vectorized, so that it does not need to go row by row.
For example, to create the columns m1 and m12 you could use:
d['year'] = d['date'].dt.year
d['month'] = d['date'].dt.month
d['day'] = d['date'].dt.day
d['month-1'] = (d['month'] - 1)
d['year-1'] = d['date'].dt.year
d['year-12'] = d['year'] - 1
d.loc[d['month-1'] == 0, 'year-1'] = d.loc[d['month-1'] == 0, 'year-1'] - 1
d.loc[d['month-1'] == 0, 'month-1'] = 12
d['m1'] = pd.to_datetime(d[['year-1', 'month-1', 'day']].rename({'year-1':'year', 'month-1':'month'}, axis=1), errors='coerce')
d['m12'] = pd.to_datetime(d[['year-12', 'month', 'day']].rename({'year-12':'year'}, axis=1), errors='coerce')
d = d.drop(['year', 'month', 'day', 'month-1', 'year-1', 'year-12'], axis=1)
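As a possible shorter alternative to the helper columns above, here is a minimal sketch that builds m1 and m12 with pd.DateOffset. The sample frame and the helper name shift_months are assumptions made for illustration. Note that DateOffset clips to the end of the month (2022-03-30 minus one month gives 2022-02-28), so the sketch masks those rows to NaT to mimic what errors='coerce' produces in the code above.

```python
import pandas as pd

# Sample data taken from the table in the question
d = pd.DataFrame({
    "date": pd.to_datetime(["2022-02-27", "2022-03-27", "2022-03-30", "2023-02-27"]),
    "value": [100, 300, 500, 800],
})

def shift_months(dates: pd.Series, months: int) -> pd.Series:
    """Shift dates back by `months`; return NaT where the same day does not exist."""
    shifted = dates - pd.DateOffset(months=months)
    # DateOffset clips to month end, so a changed day number means the
    # target date was invalid (e.g. February 30); blank those rows out.
    return shifted.where(shifted.dt.day == dates.dt.day)

d["m1"] = shift_months(d["date"], 1)
d["m12"] = shift_months(d["date"], 12)
```

If clipping to the last valid day of the month is acceptable, the where() mask can simply be dropped.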
With these columns in place, I use the following apply function to fill the columns m1_val and m12_val. It searches the date column for each needed date and returns the matching value.
def test(x, col):
    value = d.loc[d['date'] == x[col]]['value']
    if len(value) == 0:
        return np.nan
    else:
        return value.iloc[0]
d['m1_val'] = d.apply(lambda x: test(x, 'm1'), axis=1)
d['m12_val'] = d.apply(lambda x: test(x, 'm12'), axis=1)
But is there a better way to get the value for m1 from the date column without using for loops? I was thinking I could maybe use something with np.where or d.loc, but I don't know how; perhaps d.loc[d['date'].isin(d['m1'])] and then a groupby()? In terms of performance, though, that looks similar to using apply().
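One vectorized option that may fit here is to turn the date and value columns into a lookup Series and pass it to Series.map. The sketch below is only an illustration under stated assumptions: the sample frame is copied from the table in the question, the m1 and m12 columns are hard-coded to keep it self-contained, and the name lookup is hypothetical.

```python
import pandas as pd

# Sample data from the question; m1/m12 are hard-coded for this sketch
d = pd.DataFrame({
    "date": pd.to_datetime(["2022-02-27", "2022-03-27", "2022-03-30", "2023-02-27"]),
    "value": [100, 300, 500, 800],
})
d["m1"] = pd.to_datetime(["2022-01-27", "2022-02-27", None, "2023-01-27"])
d["m12"] = pd.to_datetime(["2021-02-27", "2021-03-27", "2021-03-30", "2022-02-27"])

# Date -> value lookup. drop_duplicates keeps the first row per date,
# which mirrors value.iloc[0] in the apply-based version.
lookup = d.drop_duplicates("date").set_index("date")["value"]

# map() looks every m1/m12 date up in the index at once;
# dates that are missing (or NaT) come back as NaN.
d["m1_val"] = d["m1"].map(lookup)
d["m12_val"] = d["m12"].map(lookup)
```

A left merge of the frame with itself on m1 and date should give the same result; map was chosen here because it adds one column per call without renaming anything.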