Loop that shows the min and max date for each month in pandas

Posted on 2025-01-22 06:39:53

(Image: sample of the datetimedata. The data goes down to 4/1/2022.)

import pandas as pd    
import calendar
from datetime import date

sd = '2021-12-01'
ed = '2022-03-31'
 
date_range = pd.date_range(sd,ed,
              freq='MS').map(lambda x: (x.year, x.month)).tolist()

def get_dates(year, month):
  return (date(year, month, 1).strftime("%d-%b-%Y"),
          date(year,
                month,
                calendar.monthrange(year, month)[1]
                ).strftime("%d-%b-%Y"))

[get_dates(year, month)
for year, month in date_range]
 
df2 = pd.read_csv('datetimedata.csv')
df2['timestamp'] = pd.to_datetime(df2['timestamp'])
df2 = df2.loc[(df2['timestamp'] >= sd)
                       & (df2['timestamp'] <= ed)]

When I ask for the min and max in my console, it gives the answer below:

df2.timestamp.min()
Out[102]: Timestamp('2021-12-01 00:00:00')
 
df2.timestamp.max()
Out[103]: Timestamp('2022-03-31 00:00:00')

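However, I want it so that if I stop the loop after the first run, the max is 2021-12-31 00:00:00. Then, when the loop reaches March and the code finishes running, the max should be 2022-03-31 and the min should be 2022-03-01. I just want to loop through the months, showing the min and max date for each consecutive month.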



羅雙樹 2025-01-29 06:39:53

I'm not sure why you need date_range at all. Why not simply group df2['timestamp'] by month?

# example setup for df2: random timestamps (not even at any particular frequency)

import numpy as np
import pandas as pd

n = 100
s, e = '2021', '2022-04-01'
np.random.seed(0)
df2 = pd.DataFrame({
    'timestamp': np.random.uniform(pd.Timestamp(s).value, pd.Timestamp(e).value, n)
})
df2['timestamp'] = df2['timestamp'].apply(pd.Timestamp)

Now:

out = df2.groupby(
    pd.Grouper(freq='MS', key='timestamp')
)['timestamp'].agg(['min', 'max', 'size'])

>>> out
                                     min                           max size
timestamp                                                                  
2021-01-01 2021-01-03 03:16:28.560081408 2021-01-30 04:29:26.376463360    7
2021-02-01 2021-02-02 07:42:49.519875328 2021-02-28 15:52:30.613407488   11
2021-03-01 2021-03-01 23:13:57.566841344 2021-03-31 10:40:45.802366208    7
2021-04-01 2021-04-06 00:56:03.035600128 2021-04-26 05:56:39.479043840    4
2021-05-01 2021-05-01 08:56:50.223057152 2021-05-25 22:46:22.658766336    8
2021-06-01 2021-06-13 13:49:34.587356672 2021-06-24 11:10:52.988078848    4
2021-07-01 2021-07-08 11:45:06.840360704 2021-07-29 23:21:16.688887296    8
2021-08-01 2021-08-01 04:06:49.102527232 2021-08-29 15:31:57.085326080    5
2021-09-01 2021-09-05 22:07:27.689974016 2021-09-24 20:43:16.493253888    8
2021-10-01 2021-10-02 06:10:33.840128512 2021-10-31 16:07:16.857132288   11
2021-11-01 2021-11-02 03:21:55.930429184 2021-11-22 22:17:35.048308992    6
2021-12-01 2021-12-03 08:45:29.833306112 2021-12-30 14:48:41.476487936    6
2022-01-01 2022-01-09 13:14:45.851782144 2022-01-31 20:31:57.571878912    5
2022-02-01 2022-02-10 18:09:40.206745088 2022-02-27 19:54:52.119116032    3
2022-03-01 2022-03-06 09:43:44.462566912 2022-03-26 17:02:32.321784320    7
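
One thing to keep in mind with this approach: when the data has a gap, the month-start grouper still emits a row for the empty month, with NaT for min and max and a size of 0. A minimal sketch of that behaviour, using a tiny made-up frame (the dates and the `nonempty` name are only for illustration):

```python
import pandas as pd

# Hypothetical data with no rows at all in January 2022.
df = pd.DataFrame({
    'timestamp': pd.to_datetime(['2021-12-05', '2021-12-20', '2022-02-10'])
})

out = df.groupby(
    pd.Grouper(freq='MS', key='timestamp')
)['timestamp'].agg(['min', 'max', 'size'])
print(out)

# Keep only the months that actually contain data.
nonempty = out[out['size'] > 0]
print(nonempty)
```

Filtering on the size column is one simple way to drop such months if they are not wanted in the report.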

Addendum

If you really want an explicit loop that goes through the months, you can do that as well:

for month, g in df2.groupby(pd.Grouper(freq='MS', key='timestamp')):
    break

>>> month
Timestamp('2021-01-01 00:00:00', freq='MS')

>>> g['timestamp'].max()
Timestamp('2021-01-30 04:29:26.376463360')
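
To report every month rather than stopping at the first one, the loop body can do the work in place of `break`. A minimal sketch, assuming a daily `timestamp` column like the one in the question (the sample frame and the `rows` list are made up for illustration, not taken from the real CSV):

```python
import pandas as pd

# Hypothetical stand-in for datetimedata.csv: one row per day.
df2 = pd.DataFrame({
    'timestamp': pd.date_range('2021-12-01', '2022-03-31', freq='D')
})

rows = []
for month, g in df2.groupby(pd.Grouper(freq='MS', key='timestamp')):
    if g.empty:  # a month with no rows in the data
        continue
    rows.append((month, g['timestamp'].min(), g['timestamp'].max()))

for month, lo, hi in rows:
    print(f"{month:%Y-%m}: min={lo:%Y-%m-%d} max={hi:%Y-%m-%d}")
```

Each iteration sees only the rows of one month, so the min and max are computed per month and not over the whole window.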

Edit

As per the request in the comment, to restrict the data to a specific time window:

sd = '2021-12-01'
ed = '2022-03-31'

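# truncate() slices on the index, not on a column, so filter the
# 'timestamp' column directly (both bounds are inclusive).
out = df2[df2['timestamp'].between(sd, ed)].groupby(
    pd.Grouper(freq='MS', key='timestamp')
)['timestamp'].agg(['min', 'max', 'size'])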
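
One detail worth checking when cutting to a window: a bare date string such as '2022-03-31' is read as midnight at the start of that day, so rows later on the last day fall outside an inclusive upper bound. The sketch below compares an inclusive window with a half-open one on made-up 12-hourly data; the `monthly` helper and the `end_excl` name are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical data: one row every 12 hours, running slightly past the window.
df2 = pd.DataFrame({
    'timestamp': pd.date_range('2021-12-01', '2022-04-01',
                               freq=pd.Timedelta(hours=12))
})

sd, ed = '2021-12-01', '2022-03-31'

# Inclusive window: the upper bound is 2022-03-31 00:00:00.
closed = df2[df2['timestamp'].between(sd, ed)]

# Half-open window: everything before the first instant after the end date.
end_excl = pd.Timestamp(ed) + pd.Timedelta(days=1)
half_open = df2[(df2['timestamp'] >= sd) & (df2['timestamp'] < end_excl)]

def monthly(frame):
    """Min, max and row count of 'timestamp' per calendar month."""
    return frame.groupby(
        pd.Grouper(freq='MS', key='timestamp')
    )['timestamp'].agg(['min', 'max', 'size'])

out_closed = monthly(closed)
out_half_open = monthly(half_open)
print(out_closed)
print(out_half_open)
```

If the timestamps carry only dates, as in the question's console output, the two windows give the same result; the difference only matters once there is a time-of-day component.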

