Iterating over a series of dates in Python when some dates are missing

Posted 2025-01-22 02:35:32

I have a pandas DataFrame of daily stock returns with two columns: date and return rate.
I only want to keep the last day of each week, but the data has some missing days. What can I do?

import pandas as pd

df = pd.read_csv('Daily_return.csv')
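df.Date = pd.to_datetime(df.Date)
count = 300
# Step forward one week at a time from a fixed start date
for last_day in (pd.Timestamp('2017-01-01') + pd.Timedelta(days=7 * n) for n in range(count)):
    ...  # stuck here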

Honestly, my brain stopped working at this point. Probably the biggest problem is that the "+7n" idea (stepping a fixed seven days at a time) is meaningless when some dates are missing.


鲜肉鲜肉永远不皱 2025-01-29 02:35:32

I'll create a sample dataset with 40 dates and 40 sample returns, then sample 90 percent of that randomly to simulate the missing dates.

The key here is that you need to convert your date column into datetime if it isn't already, and make sure your df is sorted by the date.

Then you can group by ISO year and week and take the last row of each group. If you run this repeatedly, you'll see that the date selected for a week changes whenever the dropped row was that week's last day.

Based on that:

import pandas as pd
import numpy as np

df = pd.DataFrame()
df['date'] = pd.date_range(start='04-18-2022',periods=40, freq='D')
df['return'] = np.random.uniform(size=40)

# Keep 90 percent of the records so we can see what happens when some days are missing
df = df.sample(frac=.9)

# In case your dates are actually strings
df['date'] = pd.to_datetime(df['date'])

# Make sure they are sorted from oldest to newest
df = df.sort_values(by='date')

# Group by ISO year and ISO week, then keep the last row of each group
iso = df['date'].dt.isocalendar()
df = df.groupby([iso['year'], iso['week']], as_index=False).last()

print(df)

Output (values differ between runs because the data is random)

       date    return
0 2022-04-24  0.299958
1 2022-05-01  0.248471
2 2022-05-08  0.506919
3 2022-05-15  0.541929
4 2022-05-22  0.588768
5 2022-05-27  0.504419
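
Because the sample above is random, the result is hard to check by eye. As a minimal sketch of the same idea on fixed data, the frame below uses made-up dates and return values (they are assumptions for illustration, not data from the question). It is a variant rather than the exact code above: it labels weeks with `dt.to_period('W')`, which gives Monday-to-Sunday weeks, and keeps the last row per week with `tail(1)` after sorting.

```python
import pandas as pd

# Hypothetical daily returns. Several days are missing on purpose,
# including the ends of all three weeks.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2022-04-18", "2022-04-19", "2022-04-21", "2022-04-22",
        "2022-04-25", "2022-04-26", "2022-04-28",
        "2022-05-02", "2022-05-03",
    ]),
    "return": [0.01, -0.02, 0.03, 0.00, 0.01, 0.02, -0.01, 0.04, 0.05],
})

# Sort from oldest to newest so the last row of each group is the latest date
df = df.sort_values("date")

# Label each row with its Monday-to-Sunday week
week = df["date"].dt.to_period("W")

# Keep the last available row in each week
weekly = df.groupby(week).tail(1).reset_index(drop=True)

print(weekly)
```

Here the rows kept are 2022-04-22, 2022-04-28 and 2022-05-03: the latest date actually present in each week, not a fixed weekday.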
