
Adding DataFrames with partially overlapping indexes

Posted 2025-01-10 06:14:40

I have two Pandas DataFrames I'd like to add together, with a datetime index, and a set of common columns.

The datetime indices will have 95% common values, but some of the rows in df2 may not be in df1 and vice versa.

I'd like to add the two DataFrames together. When one of the DataFrames does not have an index value that the other has, it should just treat the missing row as 0 (or take the row that has a value, whichever is better).

The result should not drop any indices, i.e. something like an outer join, rather than an inner.

I have tried pd.add, but it appears to give NaN results wherever the two DataFrames do not both have an entry.

pd.concat works where they don't have common indices, but where they do I get duplicates instead of adding together. Do I have to do a second groupby sum step? I thought there'd be a simpler way to do this.

For example:

FRAME 1

Month	Val 1	Val 2
2022-01-01	1	2
2022-02-01	3	4
2022-03-01	5	6

FRAME 2

Month	Val 1	Val 2
2022-03-01	101	102
2022-04-01	103	104
2024-01-01	105	106
2025-01-01	107	108

DESIRED RESULT

Month	Val 1	Val 2
2022-01-01	1	2
2022-02-01	3	4
2022-03-01	106	108
2022-04-01	103	104
2024-01-01	105	106
2025-01-01	107	108
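For reference, the example frames can be rebuilt in code. The sketch below assumes "Month" is a DatetimeIndex and the columns are named "Val 1" and "Val 2" as in the tables. It contrasts plain DataFrame.add (what the question calls pd.add) with its fill_value=0 option, which treats a row missing from one frame as 0 and so should reproduce DESIRED RESULT in a single call.

```python
import pandas as pd

# FRAME 1 and FRAME 2 from the question; "Month" is assumed to be a DatetimeIndex
df1 = pd.DataFrame(
    {"Val 1": [1, 3, 5], "Val 2": [2, 4, 6]},
    index=pd.to_datetime(["2022-01-01", "2022-02-01", "2022-03-01"]).rename("Month"),
)
df2 = pd.DataFrame(
    {"Val 1": [101, 103, 105, 107], "Val 2": [102, 104, 106, 108]},
    index=pd.to_datetime(
        ["2022-03-01", "2022-04-01", "2024-01-01", "2025-01-01"]
    ).rename("Month"),
)

# Plain add aligns both frames on the union of their indexes (an outer join),
# so every row present in only one frame comes out as NaN
plain = df1.add(df2)

# fill_value=0 substitutes 0 for whichever side is missing before adding;
# alignment inserts NaN for the missing rows (upcasting to float), so cast back
result = df1.add(df2, fill_value=0).astype("int64")
print(result)
```

Only rows where both frames are missing would stay NaN with fill_value, which cannot happen after aligning two frames on the union of their indexes.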

我ぃ本無心為│何有愛 2025-01-17 06:14:40

Hope this works for you :)

import pandas as pd
pd.concat([df1, df2]).groupby(["Month"]).sum().reset_index()

Output

    Month   Val1    Val2
0   2022-01-01  1   2
1   2022-02-01  3   4
2   2022-03-01  106 108
3   2022-04-01  103 104
4   2024-01-01  105 106
5   2025-01-01  107 108
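If "Month" is the DatetimeIndex, as the question describes, rather than a column, the same concat-then-sum idea can group on the index level directly. That keeps "Month" as the index and needs no reset_index. This is a sketch assuming the question's column names ("Val 1", "Val 2").

```python
import pandas as pd

# The question's frames, assuming "Month" is the index rather than a column
df1 = pd.DataFrame(
    {"Val 1": [1, 3, 5], "Val 2": [2, 4, 6]},
    index=pd.to_datetime(["2022-01-01", "2022-02-01", "2022-03-01"]).rename("Month"),
)
df2 = pd.DataFrame(
    {"Val 1": [101, 103, 105, 107], "Val 2": [102, 104, 106, 108]},
    index=pd.to_datetime(
        ["2022-03-01", "2022-04-01", "2024-01-01", "2025-01-01"]
    ).rename("Month"),
)

# concat stacks all rows, so 2022-03-01 appears twice; grouping on index
# level 0 and summing collapses each Month into one row (groupby sorts the keys)
out = pd.concat([df1, df2]).groupby(level=0).sum()
print(out)
```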

葬花如无物 2025-01-17 06:14:40

Assuming "Month" is the index (if not, call set_index('Month') first), we can reindex each DataFrame to the union of the two indexes, fill the new rows with 0 using fillna, and then add:

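# union of both indexes = every Month present in either DataFrame
idx = df1.index.union(df2.index)
# reindex adds the missing Months as NaN rows (upcasting to float); fillna(0) zeroes them and astype(int) restores integers
out = df1.reindex(idx).fillna(0).add(df2.reindex(idx).fillna(0)).astype(int)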

Output:

            Val 1  Val 2
Month                   
2022-01-01      1      2
2022-02-01      3      4
2022-03-01    106    108
2022-04-01    103    104
2024-01-01    105    106
2025-01-01    107    108
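A self-contained run of this reindex approach on the question's data is sketched below, again assuming "Month" is a DatetimeIndex. One variation is swapped in here rather than taken from the answer: reindex also accepts fill_value=0, which inserts the missing rows as 0 directly, so the fillna call and the final astype(int) should no longer be needed.

```python
import pandas as pd

# The question's frames, assuming "Month" is already the index
df1 = pd.DataFrame(
    {"Val 1": [1, 3, 5], "Val 2": [2, 4, 6]},
    index=pd.to_datetime(["2022-01-01", "2022-02-01", "2022-03-01"]).rename("Month"),
)
df2 = pd.DataFrame(
    {"Val 1": [101, 103, 105, 107], "Val 2": [102, 104, 106, 108]},
    index=pd.to_datetime(
        ["2022-03-01", "2022-04-01", "2024-01-01", "2025-01-01"]
    ).rename("Month"),
)

# Union of both indexes: every Month present in either frame, in sorted order
idx = df1.index.union(df2.index)

# reindex(..., fill_value=0) adds the missing Months as rows of zeros instead
# of NaN, so the integer dtype is kept and no fillna/astype step is required
out = df1.reindex(idx, fill_value=0).add(df2.reindex(idx, fill_value=0))
print(out)
```

Because both reindexed frames share exactly the same index and columns, the final add has nothing left to align.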
