How to merge 2 data frames with different keys

Posted on 2025-02-03 07:30:51

Can you please help with my case below? I want to merge 2 datasets to get the expected results shown further down.

Df 1: A data frame that summarises the feature values for each ID:

| id | feature 1 | feature 2 |
| --- | --- | --- |
| 1 | 1.2 | 3.4 |
| 2 | 2.3 | 1.2 |
| 3 | 3.5 | 6 |

Df 2: A mapping table that determines the corresponding segment for each feature, based on that feature's min/max thresholds:

| Feature | Min | Max | segment |
| --- | --- | --- | --- |
| Feature 1 | 0 | 1 | 1 |
| Feature 1 | 1 | 2 | 2 |
| Feature 1 | 2 | inf | 3 |
| Feature 2 | 0 | 4 | 1 |
| Feature 2 | 4 | 5 | 2 |
| Feature 2 | 5 | inf | 3 |

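Expected results: I want to merge df1 with the mapping table in df2 to get the corresponding segment for each feature:

| id | feature 1 | feature 2 | feature 1 segment | feature 2 segment |
| --- | --- | --- | --- | --- |
| 1 | 1.2 | 3.4 | 2 | 1 |
| 2 | 2.3 | 1.2 | 3 | 1 |
| 3 | 3.5 | 6 | 3 | 3 |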

Thanks a lot for your help.

Comments (1)

久随 2025-02-10 07:30:51
import pandas as pd
import numpy as np

df1 = pd.DataFrame({'id': [1, 2, 3], 'feature 1': [1.2, 2.3, 3.5], 'feature 2': [3.4, 1.2, 6]})

df2 = pd.DataFrame({'Feature': ['feature 1', 'feature 1', 'feature 1', 'feature 2', 'feature 2', 'feature 2'],
                    'Min': [0, 1, 2, 0, 4, 5], 'Max': [1, 2, np.inf, 4, 5, np.inf], 'segment': [1, 2, 3, 1, 2, 3]})


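# Index df1 by 'id' so the per-id results align when assigned back as columns.
df1 = df1.set_index('id')

def func_data(x, q):
    # Keep only the threshold rows that belong to feature q.
    df = df2[df2['Feature'] == q]
    # Positions of the rows whose [Min, Max] range contains the value.
    ttt = np.where((x.values[0] >= df['Min']) & (x.values[0] <= df['Max']))
    # Look the segment up in df2, not in df1.index: indexing df1.index only
    # gave the right answer because the ids happen to equal the segment labels.
    segment = df['segment'].to_numpy()[ttt][0]

    return segment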


df1['feature_1_segment'] = df1.groupby(['id'])['feature 1'].apply(func_data, 'feature 1')
df1['feature_2_segment'] = df1.groupby(['id'])['feature 2'].apply(func_data, 'feature 2')
df1 = df1.reset_index()

print(df1)

Output

   id  feature 1  feature 2  feature_1_segment  feature_2_segment
0   1        1.2        3.4                  2                  1
1   2        2.3        1.2                  3                  1
2   3        3.5        6.0                  3                  3

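Here, the 'id' column is first set as the index. The 'func_data' function determines which range a value falls into: it filters df2 down to the rows of the requested feature, and np.where from numpy tests the value against each Min/Max pair. Both bounds are inclusive, so a value that sits exactly on a threshold matches two ranges and the first one is used. The function is applied once per 'id' through groupby, and the index is reset at the end so that 'id' becomes a regular column again.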
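Since the question asks for a merge, the same lookup can also be sketched without a per-id function: reshape df1 to long format, merge it with df2 on the feature name, and keep the row whose range contains the value. This is only a sketch under assumptions that are not in the original post: the ranges are treated as half-open (Min, Max], so a value of exactly 0 would match no row, and the names long_df, merged, hit, segments and result are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3],
                    'feature 1': [1.2, 2.3, 3.5],
                    'feature 2': [3.4, 1.2, 6]})

df2 = pd.DataFrame({'Feature': ['feature 1', 'feature 1', 'feature 1',
                                'feature 2', 'feature 2', 'feature 2'],
                    'Min': [0, 1, 2, 0, 4, 5],
                    'Max': [1, 2, np.inf, 4, 5, np.inf],
                    'segment': [1, 2, 3, 1, 2, 3]})

# Long format: one row per (id, feature) pair.
long_df = df1.melt(id_vars='id', var_name='Feature', value_name='value')

# Pair every value with all threshold rows of its feature.
merged = long_df.merge(df2, on='Feature')

# Assumption: ranges are half-open (Min, Max], so each value matches
# at most one row and the pivot below has no duplicate entries.
hit = merged[(merged['value'] > merged['Min']) & (merged['value'] <= merged['Max'])]

# Back to wide format: one segment column per feature.
segments = (hit.pivot(index='id', columns='Feature', values='segment')
               .add_suffix(' segment'))

result = df1.merge(segments, left_on='id', right_index=True)
print(result)
```

For the sample data this gives the segments 2, 3, 3 for feature 1 and 1, 1, 3 for feature 2, matching the expected results in the question. The intermediate merge has one row per value and threshold pair, so for large tables a binning function such as pd.cut per feature may be a lighter option.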
