Unusual DataFrame groupby

Posted on 2025-02-10 22:24:22

Comments (1)

遮了一弯 2025-02-17 22:24:22

You can duplicate each overlapping row so that it is assigned to two groups:

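import pandas as pd

# Assumption: d is the 'date' column of df parsed as datetimes
d = pd.to_datetime(df['date'])
# 2 for the last row of each year (except the final row), 1 otherwise
dup = d.dt.year.ne(d.shift().dt.year).shift(-1, fill_value=False).add(1)
df1 = df.reindex(df.index.repeat(dup))

# the second copy of a duplicated index label starts a new group
gid = df1.index.duplicated(keep='first').cumsum() + 1
out = dict(list(df1.assign(group=gid).groupby(gid, as_index=False)))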

Output:

>>> out
{1:          date    value  group
 0  2017-03-31  1163.00      1
 1  2017-04-03  1221.15      1
 2  2017-12-27  1318.84      1
 3  2017-12-28  1384.78      1
 4  2017-12-29  1523.26      1,
 2:          date    value  group
 4  2017-12-29  1523.26      2
 5  2018-01-02  1660.36      2
 6  2018-12-31  1710.17      2,
 3:          date    value  group
 6  2018-12-31  1710.17      3
 7  2019-01-02  1881.18      3
 8  2019-01-03  1956.43      3
 9  2019-12-31  2015.12      3,
 4:           date    value  group
 9   2019-12-31  2015.12      4
 10  2020-12-30  2216.64      4
 11  2020-12-31  2349.63      4,
 5:           date    value  group
 11  2020-12-31  2349.63      5
 12  2021-01-20  2373.13      5
 13  2021-12-30  2562.98      5
 14  2021-12-31  2819.28      5,
 6:           date    value  group
 14  2021-12-31  2819.28      6
 15  2022-05-30  2875.66      6
 16  2022-05-31  2904.42      6}
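
To see why this works, the one-line expression for `dup` can be unpacked step by step. The following is a minimal sketch, not the answerer's own code: the sample `df` is rebuilt from the rows printed in the output above, and the intermediate names `year_starts` and `year_ends`, as well as the definition of `d`, are assumptions introduced for illustration.

```python
import pandas as pd

# Sample rows copied from the printed output; the original df is not shown.
df = pd.DataFrame({
    "date": ["2017-03-31", "2017-04-03", "2017-12-27", "2017-12-28",
             "2017-12-29", "2018-01-02", "2018-12-31", "2019-01-02",
             "2019-01-03", "2019-12-31", "2020-12-30", "2020-12-31",
             "2021-01-20", "2021-12-30", "2021-12-31", "2022-05-30",
             "2022-05-31"],
    "value": [1163.00, 1221.15, 1318.84, 1384.78, 1523.26, 1660.36,
              1710.17, 1881.18, 1956.43, 2015.12, 2216.64, 2349.63,
              2373.13, 2562.98, 2819.28, 2875.66, 2904.42],
})

# Assumption: d is the date column parsed as datetimes.
d = pd.to_datetime(df["date"])

# Step 1: True on the first row of each calendar year.
# Row 0 is compared against NaN, so it is also True.
year_starts = d.dt.year.ne(d.shift().dt.year)

# Step 2: move every flag up one row, so True now marks the last row
# of each year. The final row has no successor and is filled with False.
year_ends = year_starts.shift(-1, fill_value=False)

# Step 3: turn the flags into repeat counts (True + 1 == 2, False + 1 == 1).
dup = year_ends.add(1)

# Repeat each index label dup times; reindex then copies the matching rows.
df1 = df.reindex(df.index.repeat(dup))

# The second copy of a repeated label is flagged True, and the running
# count of those flags (plus 1) gives the group number.
gid = df1.index.duplicated(keep="first").cumsum() + 1

print(dup[dup.eq(2)].index.tolist())  # index labels that get duplicated
print(df1.assign(group=gid))
```

With this sample data, labels 4, 6, 9, 11 and 14 (the last rows of 2017 through 2021) are repeated, so `df1` has 22 rows in 6 groups. The very last row (2022-05-31) is not repeated because no later year follows it.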

Update

Are you looking for a single flat DataFrame instead:

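# assumes d is the date column parsed as datetimes, e.g. d = pd.to_datetime(df['date'])
dup = d.dt.year.ne(d.shift().dt.year).shift(-1, fill_value=False).add(1)
df1 = df.reindex(df.index.repeat(dup))

gid = df1.index.duplicated(keep='first').cumsum() + 1
# keep the group label as a column and renumber the rows from 0
df1 = df1.assign(group=gid).reset_index(drop=True)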

Output:

>>> df1
          date    value  group
0   2017-03-31  1163.00      1
1   2017-04-03  1221.15      1
2   2017-12-27  1318.84      1
3   2017-12-28  1384.78      1
4   2017-12-29  1523.26      1
5   2017-12-29  1523.26      2
6   2018-01-02  1660.36      2
7   2018-12-31  1710.17      2
8   2018-12-31  1710.17      3
9   2019-01-02  1881.18      3
10  2019-01-03  1956.43      3
11  2019-12-31  2015.12      3
12  2019-12-31  2015.12      4
13  2020-12-30  2216.64      4
14  2020-12-31  2349.63      4
15  2020-12-31  2349.63      5
16  2021-01-20  2373.13      5
17  2021-12-30  2562.98      5
18  2021-12-31  2819.28      5
19  2021-12-31  2819.28      6
20  2022-05-30  2875.66      6
21  2022-05-31  2904.42      6
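
Once the overlapping rows live in one flat frame, an ordinary `groupby` on the `group` column treats each boundary row as both the end of one period and the start of the next. The sketch below is one possible use, assuming the goal is the change in `value` from the first to the last row of each group; that goal, the `summary` frame, and its column names are illustrative assumptions, and the sample data is rebuilt from the output printed above.

```python
import pandas as pd

# Sample rows copied from the printed output; the original df is not shown.
df = pd.DataFrame({
    "date": ["2017-03-31", "2017-04-03", "2017-12-27", "2017-12-28",
             "2017-12-29", "2018-01-02", "2018-12-31", "2019-01-02",
             "2019-01-03", "2019-12-31", "2020-12-30", "2020-12-31",
             "2021-01-20", "2021-12-30", "2021-12-31", "2022-05-30",
             "2022-05-31"],
    "value": [1163.00, 1221.15, 1318.84, 1384.78, 1523.26, 1660.36,
              1710.17, 1881.18, 1956.43, 2015.12, 2216.64, 2349.63,
              2373.13, 2562.98, 2819.28, 2875.66, 2904.42],
})

# Same steps as in the answer (d is assumed to be the parsed date column).
d = pd.to_datetime(df["date"])
dup = d.dt.year.ne(d.shift().dt.year).shift(-1, fill_value=False).add(1)
df1 = df.reindex(df.index.repeat(dup))
gid = df1.index.duplicated(keep="first").cumsum() + 1
df1 = df1.assign(group=gid).reset_index(drop=True)

# Hypothetical summary: first/last date and value per overlapping group.
summary = df1.groupby("group").agg(
    start=("date", "first"),
    end=("date", "last"),
    first_value=("value", "first"),
    last_value=("value", "last"),
)
# Change over each period, measured from the previous period's closing row.
summary["change"] = summary["last_value"] - summary["first_value"]

print(summary)
```

Because the boundary row is shared, the `last_value` of one group equals the `first_value` of the next, which a plain `groupby` on the calendar year would not give you.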
