xarray - Resample data to seasonal frequencies of variable length

Posted on 2025-01-09 17:18:59

I have data on a monthly time series, and I want to resample it in a specific way: into two seasonal groups, one of 5 months (DJFMA) and one of 7 months (MJJASON), and then find the maximum value at each gridpoint within each group. Here is what I have, but it obviously does not do what I want:

my_data.resample(time='2QS-NOV').max(dim='time')

Thank you!


迷爱 2025-01-16 17:18:59

You can use the .dt (datetime) accessor of any datetime coordinate to define your own grouper. Then use groupby instead of resample to aggregate over custom, unequal periods.
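Here is why resample alone falls short. A frequency string such as the question's 2QS-NOV describes equal-length bins. QS-NOV is a quarter-start frequency whose quarters begin in November, February, May and August. The leading 2 steps two quarters at a time, so every bin is six months long (November to April, then May to October) and a 5/7 split cannot be expressed this way. The quick pandas check below lists the bin edges; the start date is arbitrary.

```python
import pandas as pd

# "2QS-NOV": quarter starts anchored on November, stepped two quarters at a time,
# which yields fixed six-month bins.
edges = pd.date_range("2020-11-01", periods=4, freq="2QS-NOV")
print([ts.strftime("%Y-%m") for ts in edges])
# ['2020-11', '2021-05', '2021-11', '2022-05']
```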

As an example, I'll set up a dummy dataset with 4 years of daily data:

In [1]: import pandas as pd, xarray as xr, numpy as np

In [2]: da = xr.DataArray(
   ...:     np.arange(365 * 4 + 1),
   ...:     dims=["time"],
   ...:     coords=[pd.date_range("2020-01-01", freq="D", periods=(365 * 4 + 1))],
   ...: )

In [3]: da
Out[3]:
<xarray.DataArray (time: 1461)>
array([   0,    1,    2, ..., 1458, 1459, 1460])
Coordinates:
  * time     (time) datetime64[ns] 2020-01-01 2020-01-02 ... 2023-12-31

You can access the month as an integer from 1 to 12 using the .dt.month accessor:

In [4]: da.time.dt.month
Out[4]:
<xarray.DataArray 'month' (time: 1461)>
array([ 1,  1,  1, ..., 12, 12, 12])
Coordinates:
  * time     (time) datetime64[ns] 2020-01-01 2020-01-02 ... 2023-12-31

This is itself a DataArray, so you can combine comparisons on it to build whatever condition you need (here, May through November):

In [5]: (da.time.dt.month > 4) & (da.time.dt.month < 12)
Out[5]:
<xarray.DataArray 'month' (time: 1461)>
array([False, False, False, ..., False, False, False])
Coordinates:
  * time     (time) datetime64[ns] 2020-01-01 2020-01-02 ... 2023-12-31
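The same mask can also be written as a membership test on the month numbers. This can read more directly when a season is not a simple range. The sketch below uses DataArray.isin on a four-year daily axis like the one above and checks that both forms agree.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2020-01-01", "2023-12-31", freq="D")
da = xr.DataArray(np.arange(time.size), dims=["time"], coords=[time])

month = da.time.dt.month
# Range comparison, as in In [5]
by_range = (month > 4) & (month < 12)
# Equivalent membership test for May through November
by_isin = month.isin([5, 6, 7, 8, 9, 10, 11])

print(bool((by_range == by_isin).all()))  # True
```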

I'll build on this to make a string of the format YYYY-{monthgroup}, making sure to include December in the next year's group:

In [13]: grouper = (
    ...:     xr.where(da.time.dt.month == 12, (da.time.dt.year + 1), da.time.dt.year)
    ...:     .astype(str)
    ...:     .astype("O")
    ...:     + "-"
    ...:     + xr.where((da.time.dt.month > 4) & (da.time.dt.month < 12), "MJJASON", "DJFMA")
    ...: )
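The labelling rule is simple enough to state as a plain function of (year, month). The helper season_label below is hypothetical, not part of xarray. It mirrors the two xr.where calls and can be handy for checking edge cases such as December.

```python
def season_label(year: int, month: int) -> str:
    """Return a 'YYYY-SEASON' label following the grouper above.

    December is assigned to the following year's DJFMA season, so
    Dec 2020 through Apr 2021 all share the label '2021-DJFMA'.
    """
    label_year = year + 1 if month == 12 else year
    season = "MJJASON" if 4 < month < 12 else "DJFMA"
    return f"{label_year}-{season}"


print(season_label(2020, 12))  # 2021-DJFMA (December joins next year's DJFMA)
print(season_label(2021, 4))   # 2021-DJFMA
print(season_label(2021, 5))   # 2021-MJJASON
```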

We can then pass this grouper to groupby and take the maximum along the time dimension:

In [14]: da.groupby(grouper).max(dim="time").sortby("group")
Out[14]:
<xarray.DataArray (group: 9)>
array([ 120,  334,  485,  699,  850, 1064, 1215, 1429, 1460])
Coordinates:
  * group    (group) object '2020-DJFMA' '2020-MJJASON' ... '2024-DJFMA'

Note that the first and last groups are missing months because the data doesn't align cleanly with the December-to-November seasonal cycle: 2020-DJFMA covers only January to April 2020, and 2024-DJFMA contains only December 2023. You may want to drop these depending on your goals.
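The question's data is monthly and gridded, so here is a sketch of the same idea on a synthetic (time, lat, lon) array. It assumes month-start timestamps; the dimension names, sizes and random values are made up for illustration. The season labels follow the same rule, built here with NumPy rather than xr.where. The sketch also shows one possible way to drop partial groups: count how many time steps fall under each label and keep only complete seasons (5 months for DJFMA, 7 for MJJASON).

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly gridded data: 48 months x 3 latitudes x 2 longitudes.
time = pd.date_range("2020-01-01", periods=48, freq="MS")
rng = np.random.default_rng(0)
data = xr.DataArray(
    rng.random((time.size, 3, 2)),
    dims=["time", "lat", "lon"],
    coords={"time": time, "lat": [10.0, 20.0, 30.0], "lon": [0.0, 5.0]},
)

# Same labelling rule as the grouper above:
# December counts toward the following year's DJFMA season.
month = data.time.dt.month.values
year = data.time.dt.year.values
label_year = np.where(month == 12, year + 1, year)
season = np.where((month > 4) & (month < 12), "MJJASON", "DJFMA")
label = xr.DataArray(
    [f"{y}-{s}" for y, s in zip(label_year, season)],
    dims="time",
    coords={"time": time},
    name="season",
)

# Per-gridpoint maximum within each season label.
seasonal_max = data.groupby(label).max(dim="time")

# Count time steps per label (independent of NaNs in the data) and keep
# only labels holding a full season: 5 months for DJFMA, 7 for MJJASON.
counts = pd.Series(label.values).value_counts()
keep = sorted(
    s for s, n in counts.items() if n == (5 if s.endswith("DJFMA") else 7)
)
complete = seasonal_max.sel(season=keep)

print(complete.season.values)
```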
