How to merge 2 dataframes by sequence

Posted on 2025-01-23 10:59:05

I have 2 dataframes

df_My, which looks like this:

Seq    Grp
-----------
1      1
2      1
3      2
4      1
5      2
6      2
7      3
8      3
9      1
.
.
135    3
136    1
137    1
138    2
139    3

The other dataframe is df_Filter:

df_Filter

Seq    Grp    Alg
-------------------
1      1      J
3      2      F
7      3      Z
136    1      M
137    1      R
138    2      T
139    3      Y

I want to merge both dataframes into one, where:

1- If a Seq number in df_My is already in df_Filter, the row should be merged on Seq, taking Alg from the df_Filter row with the same Seq.

2- If a Seq number in df_My is not in df_Filter, the row should be joined with the df_Filter row that has the smallest Seq in the same Grp.

So the end result should be as follows:

Seq    Grp    Alg
------------------
1      1      J
2      1      J
3      2      F
4      1      J
5      2      F
6      2      F
7      3      Z
8      3      Z
9      1      J
.
.
135    3      Z
136    1      M
137    1      R
138    2      T
139    3      Y

I tried this, but it does not give me the expected result:

df_Out = df_My.merge(df_Filter, on='Grp')

Any idea how to achieve this?


怕倦 2025-01-30 10:59:05

As @sammywemmy suggested, you could use merge_asof:

out = pd.merge_asof(df1, df2, on='Seq', by='Grp')
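For reference, here is a self-contained sketch of that call on the sample rows from the question. It assumes df1 is df_My and df2 is df_Filter, and it leaves out the rows elided with dots. Two things to keep in mind: merge_asof expects both frames to be sorted by the `on` key, and by default it matches the nearest earlier (or equal) Seq within the same Grp. On this sample that coincides with "smallest Seq in the Grp", but it is not the same rule in general.

```python
import pandas as pd

# Sample rows from the question (elided rows omitted).
df1 = pd.DataFrame({
    "Seq": [1, 2, 3, 4, 5, 6, 7, 8, 9, 135, 136, 137, 138, 139],
    "Grp": [1, 1, 2, 1, 2, 2, 3, 3, 1, 3, 1, 1, 2, 3],
})
df2 = pd.DataFrame({
    "Seq": [1, 3, 7, 136, 137, 138, 139],
    "Grp": [1, 2, 3, 1, 1, 2, 3],
    "Alg": ["J", "F", "Z", "M", "R", "T", "Y"],
})

# Both inputs must be sorted by the `on` key; sorting is a no-op here
# but makes the requirement explicit.
out = pd.merge_asof(
    df1.sort_values("Seq"),
    df2.sort_values("Seq"),
    on="Seq",   # match on the nearest earlier-or-equal Seq ...
    by="Grp",   # ... but only among rows with the same Grp
)
print(out)
```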

Another option is to left-merge on Seq, then use groupby + idxmin to build a mapping from Grp to Alg for the missing values, and fill them in:

df_merged = df1.merge(df2, on='Seq', how='left', suffixes=('','_')).drop(columns=['Grp_'])
no_nan = df_merged.dropna()
mapping = df_merged['Grp'].map(no_nan.loc[no_nan.groupby('Grp')['Seq'].idxmin(), ['Grp','Alg']].set_index('Grp')['Alg'])
df_merged['Alg'] = df_merged['Alg'].fillna(mapping)

Output:

    Seq  Grp Alg
0     1    1   J
1     2    1   J
2     3    2   F
3     4    1   J
4     5    2   F
5     6    2   F
6     7    3   Z
7     8    3   Z
8     9    1   J
9   135    3   Z
10  136    1   M
11  137    1   R
12  138    2   T
13  139    3   Y
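
If the second rule has to hold literally (always fall back to the smallest Seq of the Grp, even when a later Seq of that Grp precedes the unmatched row), the fallback mapping can also be built directly from the filter frame. The following is only a sketch under assumptions: the helper name `fill_from_group_min` and the small test frames are made up for illustration, and Seq is assumed to be unique in the filter frame.

```python
import pandas as pd

def fill_from_group_min(df_my, df_filter):
    """Exact match on Seq; otherwise Alg of the smallest Seq in the same Grp."""
    # Rule 1: exact matches on Seq (unmatched rows get NaN in Alg).
    merged = df_my.merge(df_filter[["Seq", "Alg"]], on="Seq", how="left")
    # Rule 2: per Grp, the Alg that belongs to the smallest Seq in df_filter.
    fallback = (
        df_filter.sort_values("Seq")
        .drop_duplicates("Grp")      # keeps the first (smallest Seq) row per Grp
        .set_index("Grp")["Alg"]
    )
    merged["Alg"] = merged["Alg"].fillna(merged["Grp"].map(fallback))
    return merged

# Hypothetical data: Seq 140 is not in the filter and comes after Seq 136.
df_my = pd.DataFrame({"Seq": [1, 2, 3, 136, 140], "Grp": [1, 1, 2, 1, 1]})
df_filter = pd.DataFrame({"Seq": [1, 3, 136], "Grp": [1, 2, 1], "Alg": ["J", "F", "M"]})

result = fill_from_group_min(df_my, df_filter)
print(result)
```

Here Seq 140 gets J (the Alg of Seq 1, the smallest Seq in Grp 1), whereas a backward merge_asof would pick M, because 136 is the nearest earlier Seq in that Grp.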