How do I loop through each item of multiple lists under a specific pairing condition?

Posted on 2025-02-09 07:52:00

I have 2 dataframes as below:

df1:

df1 = pd.DataFrame({'feature1':['a1','a1','a1','b1','b1','b1'], 'value': [1,2,3,4,5,6]})
df1

df2:

df2 = pd.DataFrame({'feature1':['c1','c1','c1','c2','c2','c2'], 'value2': [1,2,3,1,2,3]})
df2

My goal is to yield this result:

  • a1 is paired only with c1; b1 is paired only with c2
| feature1 | value | feature2 | value2 |
| -------- | ----- | -------- | ----- |
| a1       | 1     | c1       | 1     |
| a1       | 1     | c1       | 2     |
| a1       | 1     | c1       | 3     |
| a1       | 2     | c1       | 1     |
| a1       | 2     | c1       | 2     |
| a1       | 2     | c1       | 3     |
| a1       | 3     | c1       | 1     |
| a1       | 3     | c1       | 2     |
| a1       | 3     | c1       | 3     |
| b1       | 4     | c2       | 1     |
| b1       | 4     | c2       | 2     |
| b1       | 4     | c2       | 3     |
| b1       | 5     | c2       | 1     |
| b1       | 5     | c2       | 2     |
| b1       | 5     | c2       | 3     |
| b1       | 6     | c2       | 1     |
| b1       | 6     | c2       | 2     |
| b1       | 6     | c2       | 3     |

What I have done is as below:

  1. Convert the value and value2 columns into two lists:
list1 = df1[df1.columns[1]].values.tolist()
list1

output: [1, 2, 3, 4, 5, 6]
list2 = df2[df2.columns[1]].values.tolist()
list2

output: [1, 2, 3, 1, 2, 3]
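  2. Do a nested iteration using a list comprehension:
lim1, lim2 = [], []

# Pair every value in list1 with every value in list2 (6 x 6 = 36 pairs).
for x, y in [(x, y) for x in list1 for y in list2]:
    lim1.append(x)
    lim2.append(y)

# Build the DataFrame once, after the loop has collected all pairs.
df_limit = pd.DataFrame({
    "value": lim1,
    "value2": lim2,
})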

The result pairs every value with every value2 across the entire columns, instead of pairing only within the matching groups as I need:


value   value2
0   1   1
1   1   2
2   1   3
3   1   1
4   1   2
5   1   3
6   2   1
7   2   2
8   2   3
9   2   1
10  2   2
11  2   3
12  3   1
13  3   2
14  3   3
15  3   1
16  3   2
17  3   3
18  4   1
19  4   2
20  4   3
21  4   1
22  4   2
23  4   3
24  5   1
25  5   2
26  5   3
27  5   1
28  5   2
29  5   3
30  6   1
31  6   2
32  6   3
33  6   1
34  6   2
35  6   3

I am trying to figure out whether using df.groupby() on the features together with a list comprehension would help, but so far I have been unable to make progress.

The real-life case is much more complicated than this, as there are more than 100 combinations, so I am looking for a more scalable way to do this.


Comments (1)

黑白记忆 2025-02-16 07:52:00

Loops are basically never the answer when it comes to pandas.

Filtering after cross joining everything:

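import pandas as pd

# Both frames use the column names 'feature' and 'value'; the suffixes tell them apart.
df1 = pd.DataFrame({'feature':['a1','a1','a1','b1','b1','b1'], 'value': [1,2,3,4,5,6]})
df2 = pd.DataFrame({'feature':['c1','c1','c1','c2','c2','c2'], 'value': [1,2,3,1,2,3]})
# Cross join every row of df1 with every row of df2 (how='cross' needs pandas 1.2 or later).
df = df1.merge(df2, how='cross', suffixes=['1', '2'])
# Keep only the a1/c1 and b1/c2 combinations.
mask = (df.feature1.eq('a1') & df.feature2.eq('c1')) | (df.feature1.eq('b1') & df.feature2.eq('c2'))
out = df[mask].reset_index(drop=True)
print(out)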

Output:

   feature1  value1 feature2  value2
0        a1       1       c1       1
1        a1       1       c1       2
2        a1       1       c1       3
3        a1       2       c1       1
4        a1       2       c1       2
5        a1       2       c1       3
6        a1       3       c1       1
7        a1       3       c1       2
8        a1       3       c1       3
9        b1       4       c2       1
10       b1       4       c2       2
11       b1       4       c2       3
12       b1       5       c2       1
13       b1       5       c2       2
14       b1       5       c2       3
15       b1       6       c2       1
16       b1       6       c2       2
17       b1       6       c2       3
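
With 100+ combinations, writing one boolean term per pair gets unwieldy. A minimal sketch of the same filter-after-cross-join idea, driven by a list of allowed pairs, might look like the following. The name `allowed_pairs` is hypothetical and not part of the original question; it simply stands in for however the real pairing rules are stored.

```python
import pandas as pd

df1 = pd.DataFrame({'feature': ['a1', 'a1', 'a1', 'b1', 'b1', 'b1'],
                    'value': [1, 2, 3, 4, 5, 6]})
df2 = pd.DataFrame({'feature': ['c1', 'c1', 'c1', 'c2', 'c2', 'c2'],
                    'value': [1, 2, 3, 1, 2, 3]})

# Hypothetical list of (feature1, feature2) combinations to keep.
allowed_pairs = [('a1', 'c1'), ('b1', 'c2')]

# Full cross join, as in the answer above.
crossed = df1.merge(df2, how='cross', suffixes=('1', '2'))

# Treat each row's (feature1, feature2) as a tuple and test membership.
keys = pd.MultiIndex.from_frame(crossed[['feature1', 'feature2']])
out = crossed[keys.isin(allowed_pairs)].reset_index(drop=True)
print(out)
```

This still materializes the full cross product before filtering, so it is probably only suitable when both frames are small.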

Filtering before cross joining:

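# Select each group first, so only matching rows are ever cross joined.
a1_c1 = [df1[df1.feature.eq('a1')], df2[df2.feature.eq('c1')]]
b1_c2 = [df1[df1.feature.eq('b1')], df2[df2.feature.eq('c2')]]
dfs = []
for pair in [a1_c1, b1_c2]:
    # Cross join within one pair of groups only.
    temp_df = pd.merge(*pair, how='cross', suffixes=['1','2'])
    dfs.append(temp_df)
# Stack the per-pair results into one frame with a fresh index.
df = pd.concat(dfs, ignore_index=True)
print(df)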

Output:

   feature1  value1 feature2  value2
0        a1       1       c1       1
1        a1       1       c1       2
2        a1       1       c1       3
3        a1       2       c1       1
4        a1       2       c1       2
5        a1       2       c1       3
6        a1       3       c1       1
7        a1       3       c1       2
8        a1       3       c1       3
9        b1       4       c2       1
10       b1       4       c2       2
11       b1       4       c2       3
12       b1       5       c2       1
13       b1       5       c2       2
14       b1       5       c2       3
15       b1       6       c2       1
16       b1       6       c2       2
17       b1       6       c2       3
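
As an alternative to the answer's two approaches (not something the answer itself proposes), the pairing rules can be stored in a small lookup table and applied with two ordinary joins, which avoids both the full cross product and the per-pair loop. This is only a sketch under assumptions: the `mapping` frame is hypothetical, and df2's label column is named `feature2` here to match the desired output in the question.

```python
import pandas as pd

df1 = pd.DataFrame({'feature1': ['a1', 'a1', 'a1', 'b1', 'b1', 'b1'],
                    'value': [1, 2, 3, 4, 5, 6]})
# Assumption: df2's label column is called 'feature2', as in the desired output.
df2 = pd.DataFrame({'feature2': ['c1', 'c1', 'c1', 'c2', 'c2', 'c2'],
                    'value2': [1, 2, 3, 1, 2, 3]})

# Hypothetical lookup table: one row per allowed (feature1, feature2) pair.
mapping = pd.DataFrame({'feature1': ['a1', 'b1'],
                        'feature2': ['c1', 'c2']})

out = (
    df1.merge(mapping, on='feature1')   # attach the partner label to each df1 row
       .merge(df2, on='feature2')       # expand against the matching df2 rows only
       [['feature1', 'value', 'feature2', 'value2']]
       # Sort explicitly rather than relying on the join's row order.
       .sort_values(['feature1', 'value', 'value2'])
       .reset_index(drop=True)
)
print(out)
```

Adding a new combination then only means adding a row to `mapping`, which may scale better to the 100+ pairs mentioned in the question.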