Loop that shows the min and max date for each month in pandas

Posted on 2025-01-22 06:39:53

(Image: sample of the datetimedata file. The data runs through 4/1/2022.)

import pandas as pd    
import calendar
from datetime import date

sd = '2021-12-01'
ed = '2022-03-31'
 
date_range = pd.date_range(sd,ed,
              freq='MS').map(lambda x: (x.year, x.month)).tolist()

def get_dates(year, month):
  return (date(year, month, 1).strftime("%d-%b-%Y"),
          date(year,
                month,
                calendar.monthrange(year, month)[1]
                ).strftime("%d-%b-%Y"))

[get_dates(year, month)
for year, month in date_range]
 
df2 = pd.read_csv('datetimedata.csv')
# assign the column directly so its dtype really becomes datetime64
df2['timestamp'] = pd.to_datetime(df2['timestamp'])
df2 = df2.loc[(df2['timestamp'] >= sd)
              & (df2['timestamp'] <= ed)]

When I ask for the min and max in my console, I get this:

df2.timestamp.min()
Out[102]: Timestamp('2021-12-01 00:00:00')
 
df2.timestamp.max()
Out[103]: Timestamp('2022-03-31 00:00:00')

However, I want it to work month by month: if I stopped the loop after the first run, the max would be 2021-12-31 00:00:00. Then, once the loop has gone through March and the code finishes running, the max should be 2022-03-31 and the min should be 2022-03-01. I just want it to loop through each consecutive month, showing the min and max date for that month.

Comments (1)

羅雙樹 2025-01-29 06:39:53

I'm not sure why you need date_range at all. Why not simply group df2['timestamp'] by month?

# example setup for df2: random timestamps (not even at any particular frequency)
import numpy as np
import pandas as pd

n = 100
s, e = '2021', '2022-04-01'
np.random.seed(0)
df2 = pd.DataFrame({
    # uniform draws between the two endpoints, expressed in nanoseconds since the epoch
    'timestamp': np.random.uniform(pd.Timestamp(s).value, pd.Timestamp(e).value, n)
})
df2['timestamp'] = df2['timestamp'].apply(pd.Timestamp)

Now:

# string names instead of the builtins min/max, which recent pandas versions warn about
out = df2.groupby(
    pd.Grouper(freq='MS', key='timestamp')
)['timestamp'].agg(['min', 'max', 'size'])

>>> out
                                     min                           max size
timestamp                                                                  
2021-01-01 2021-01-03 03:16:28.560081408 2021-01-30 04:29:26.376463360    7
2021-02-01 2021-02-02 07:42:49.519875328 2021-02-28 15:52:30.613407488   11
2021-03-01 2021-03-01 23:13:57.566841344 2021-03-31 10:40:45.802366208    7
2021-04-01 2021-04-06 00:56:03.035600128 2021-04-26 05:56:39.479043840    4
2021-05-01 2021-05-01 08:56:50.223057152 2021-05-25 22:46:22.658766336    8
2021-06-01 2021-06-13 13:49:34.587356672 2021-06-24 11:10:52.988078848    4
2021-07-01 2021-07-08 11:45:06.840360704 2021-07-29 23:21:16.688887296    8
2021-08-01 2021-08-01 04:06:49.102527232 2021-08-29 15:31:57.085326080    5
2021-09-01 2021-09-05 22:07:27.689974016 2021-09-24 20:43:16.493253888    8
2021-10-01 2021-10-02 06:10:33.840128512 2021-10-31 16:07:16.857132288   11
2021-11-01 2021-11-02 03:21:55.930429184 2021-11-22 22:17:35.048308992    6
2021-12-01 2021-12-03 08:45:29.833306112 2021-12-30 14:48:41.476487936    6
2022-01-01 2022-01-09 13:14:45.851782144 2022-01-31 20:31:57.571878912    5
2022-02-01 2022-02-10 18:09:40.206745088 2022-02-27 19:54:52.119116032    3
2022-03-01 2022-03-06 09:43:44.462566912 2022-03-26 17:02:32.321784320    7

Addendum

If you really want to have an explicit loop that goes through the months, you can do so as well:

for month, g in df2.groupby(pd.Grouper(freq='MS', key='timestamp')):
    break

>>> month
Timestamp('2021-01-01 00:00:00', freq='MS')

>>> g['timestamp'].max()
Timestamp('2021-01-30 04:29:26.376463360')
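
To see every month rather than only the first, the same loop can simply run to completion. Here is a minimal sketch of that; the small df2 below is made-up sample data standing in for datetimedata.csv, and the monthly dict is just an illustrative name:

```python
import pandas as pd

# Hypothetical sample data standing in for datetimedata.csv
df2 = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2021-12-01', '2021-12-15', '2021-12-31',
        '2022-01-01', '2022-01-31',
        '2022-02-01', '2022-02-28',
        '2022-03-01', '2022-03-31',
    ])
})

monthly = {}  # month start -> (min, max) of the timestamps in that month
for month, g in df2.groupby(pd.Grouper(freq='MS', key='timestamp')):
    if g.empty:
        continue  # a month with no rows can still show up as an empty group
    monthly[month] = (g['timestamp'].min(), g['timestamp'].max())
    print(month.strftime('%Y-%m'), monthly[month][0], monthly[month][1])
```

Each iteration only sees the rows of one month, so stopping after the first pass leaves the December min and max, and the last pass holds the March values.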

Edit

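As requested in the comments, to restrict the data to a specific time window:

sd = '2021-12-01'
ed = '2022-03-31'

# DataFrame.truncate() slices on the index, not on the 'timestamp' column,
# so the window is applied here with a mask on the column instead
out = df2[df2['timestamp'].between(sd, ed)].groupby(
    pd.Grouper(freq='MS', key='timestamp')
)['timestamp'].agg(['min', 'max', 'size'])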
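
One detail worth checking with a window like this: a date string such as '2022-03-31' is read as midnight, so rows later on that day fall outside it. The sketch below is one way to include the whole end day and to print the result in the question's "%d-%b-%Y" style. The helper name monthly_min_max and the sample rows are made up for illustration:

```python
import pandas as pd

def monthly_min_max(df, start, end, col='timestamp'):
    """Hypothetical helper: per-month min/max/size of df[col] for rows
    from `start` through the end of the day `end`."""
    start = pd.Timestamp(start)
    # exclusive upper bound: midnight of the day after `end`
    stop = pd.Timestamp(end).normalize() + pd.Timedelta(days=1)
    window = df[(df[col] >= start) & (df[col] < stop)]
    out = window.groupby(pd.Grouper(freq='MS', key=col))[col].agg(['min', 'max', 'size'])
    return out[out['size'] > 0]  # drop months that have no rows

# Made-up sample rows, including some outside the window
df2 = pd.DataFrame({'timestamp': pd.to_datetime([
    '2021-11-30 23:00', '2021-12-01 08:00', '2021-12-31 17:30',
    '2022-03-01 00:00', '2022-03-31 12:00', '2022-04-01 00:00',
])})

out = monthly_min_max(df2, '2021-12-01', '2022-03-31')
for month, row in out.iterrows():
    print(row['min'].strftime('%d-%b-%Y'), row['max'].strftime('%d-%b-%Y'))
```

Whether the end date should be inclusive of the whole day depends on the data; if every timestamp sits exactly at midnight, the plain comparison against ed gives the same result.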