How to intersect pairs of rows in a DataFrame and remove non-intersecting elements in pandas
I have this dataframe:
import pandas as pd
data = {'small group': [['a1', 'a2'], ['a2', 'a3'],['a3','a4'], ['d1', 'd2'], ['d2', 'd3'], ['d3', 'd4'], ['c1', 'c2'], ['c2', 'c3'], ['f1', 'f2']],
'all_groups': [[['a1', 'a2'], ['a2', 'a3'],['a3','a4']], [['a1', 'a2'], ['a2', 'a3'],['a3','a4']],
[['a1', 'a2'], ['a2', 'a3'],['a3','a4']],
[['d1', 'd2'], ['d2', 'd3'], ['d3', 'd4']], [['d1', 'd2'], ['d2', 'd3'],
['d3', 'd4']], [['d1', 'd2'], ['d2', 'd3'], ['d3', 'd4']], [['c1', 'c2'], ['c2', 'c3']],
[['c1', 'c2'], ['c2', 'c3']], [['f1', 'f2']]],
'name':[['Alina','Kate'],['Alina','Kate','Diana'],['Kate','Diana'],['Mike','Bob'],['Ian','Lili'],['George','Cloud','Ian','Petro'],['Jone','Petro','Lili'],['Marinet','Yu','Chloe','Ian'],['Nate','Rose']],
'day': [28, 28,28, 18, 18, 18, 20, 20, 3],
}
df = pd.DataFrame(data)
Output:
+----+---------------+--------------------------------------------+-------------------------------------+-------+
| | small group | all_groups | name | day |
|----+---------------+--------------------------------------------+-------------------------------------+-------|
| 0 | ['a1', 'a2'] | [['a1', 'a2'], ['a2', 'a3'], ['a3', 'a4']] | ['Alina', 'Kate'] | 28 |
| 1 | ['a2', 'a3'] | [['a1', 'a2'], ['a2', 'a3'], ['a3', 'a4']] | ['Alina', 'Kate', 'Diana'] | 28 |
| 2 | ['a3', 'a4'] | [['a1', 'a2'], ['a2', 'a3'], ['a3', 'a4']] | ['Kate', 'Diana'] | 28 |
| 3 | ['d1', 'd2'] | [['d1', 'd2'], ['d2', 'd3'], ['d3', 'd4']] | ['Mike', 'Bob'] | 18 |
| 4 | ['d2', 'd3'] | [['d1', 'd2'], ['d2', 'd3'], ['d3', 'd4']] | ['Ian', 'Lili'] | 18 |
| 5 | ['d3', 'd4'] | [['d1', 'd2'], ['d2', 'd3'], ['d3', 'd4']] | ['George', 'Cloud', 'Ian', 'Petro'] | 18 |
| 6 | ['c1', 'c2'] | [['c1', 'c2'], ['c2', 'c3']] | ['Jone', 'Petro', 'Lili'] | 20 |
| 7 | ['c2', 'c3'] | [['c1', 'c2'], ['c2', 'c3']] | ['Marinet', 'Yu', 'Chloe', 'Ian'] | 20 |
| 8 | ['f1', 'f2'] | [['f1', 'f2']] | ['Nate', 'Rose'] | 3 |
+----+---------------+--------------------------------------------+-------------------------------------+-------+
I need to intersect the name column pairwise when I group by. I need to intersect all possible combinations of rows, not just consecutive ones. I understand how to do this across all the lists at once, but I don't understand how to output the result in pairs. I would like to use a function called intersect together with an aggregation, but maybe there is another way.
df_new = (df.groupby(['day','all_groups'])
.agg({'name': intersect})).reset_index()
+----+-------+------------------------------+-------------------+--------------------------------------------+
| | day | pair | name_intersect | all_groups |
|----+-------+------------------------------+-------------------+--------------------------------------------|
| 0 | 28 | [['a1', 'a2'], ['a2', 'a3']] | ['Alina', 'Kate'] | [['a1', 'a2'], ['a2', 'a3'], ['a3', 'a4']] |
| 1 | 28 | [['a1', 'a2'], ['a3', 'a4']] | ['Kate'] | [['a1', 'a2'], ['a2', 'a3'], ['a3', 'a4']] |
| 2 | 28 | [['a2', 'a3'], ['a3', 'a4']] | ['Kate', 'Diana'] | [['a1', 'a2'], ['a2', 'a3'], ['a3', 'a4']] |
| 3 | 18 | [['d1', 'd2'], ['d2', 'd3']] | [] | [['d1', 'd2'], ['d2', 'd3'], ['d3', 'd4']] |
| 4 | 18 | [['d1', 'd2'], ['d3', 'd4']] | [] | [['d1', 'd2'], ['d2', 'd3'], ['d3', 'd4']] |
| 5 | 18 | [['d2', 'd3'], ['d3', 'd4']] | ['Ian'] | [['d1', 'd2'], ['d2', 'd3'], ['d3', 'd4']] |
| 6 | 20 | [['c1', 'c2'], ['c2', 'c3']] | [] | [['c1', 'c2'], ['c2', 'c3']] |
| 7 | 3 | ['f1', 'f2'] | [] | [['f1', 'f2']] |
+----+-------+------------------------------+-------------------+--------------------------------------------+
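One way to produce a pairwise table like the one above is to enumerate every unordered pair of rows within a day using itertools.combinations. This is only a minimal sketch, not the intersect function from the question. The helper name pairwise_intersections is hypothetical. It groups on day alone, assuming that day identifies the group, because list-valued columns such as all_groups are unhashable and cannot be used directly as groupby keys. A day with a single row, such as day 3, produces no pair at all in this sketch.

```python
from itertools import combinations

import pandas as pd


def pairwise_intersections(df, key="day", group_col="small group", name_col="name"):
    """Return one row per unordered pair of rows that share the same key value."""
    records = []
    for key_value, block in df.groupby(key, sort=False):
        rows = list(zip(block[group_col], block[name_col]))
        # combinations(rows, 2) yields every pair, not only consecutive rows
        for (g1, n1), (g2, n2) in combinations(rows, 2):
            shared = set(n2)
            records.append({
                key: key_value,
                "pair": [g1, g2],
                # keep the order in which names appear in the first list
                "name_intersect": [n for n in n1 if n in shared],
            })
    return pd.DataFrame(records, columns=[key, "pair", "name_intersect"])


df = pd.DataFrame({
    "small group": [["a1", "a2"], ["a2", "a3"], ["a3", "a4"],
                    ["d1", "d2"], ["d2", "d3"], ["d3", "d4"],
                    ["c1", "c2"], ["c2", "c3"], ["f1", "f2"]],
    "name": [["Alina", "Kate"], ["Alina", "Kate", "Diana"], ["Kate", "Diana"],
             ["Mike", "Bob"], ["Ian", "Lili"], ["George", "Cloud", "Ian", "Petro"],
             ["Jone", "Petro", "Lili"], ["Marinet", "Yu", "Chloe", "Ian"],
             ["Nate", "Rose"]],
    "day": [28, 28, 28, 18, 18, 18, 20, 20, 3],
})

pairs = pairwise_intersections(df)
print(pairs)
```

Building the rows in a plain Python loop avoids forcing a multi-row result through agg, which expects one value per group.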
Then I'd like to remove the rows where the name_intersect column is empty. I also want to handle cases such as the pair ['d1', 'd2'] in the list [['d1', 'd2'], ['d2', 'd3'], ['d3', 'd4']]: this pair has no intersection with any of the others, so I want to remove it from the list. So, I'd like to get an output like this:
+----+-------+------------------------------+-------------------+--------------------------------------------+
| | day | pair | name_intersect | all_groups |
|----+-------+------------------------------+-------------------+--------------------------------------------|
| 0 | 28 | [['a1', 'a2'], ['a2', 'a3']] | ['Alina', 'Kate'] | [['a1', 'a2'], ['a2', 'a3'], ['a3', 'a4']] |
| 1 | 28 | [['a1', 'a2'], ['a3', 'a4']] | ['Kate'] | [['a1', 'a2'], ['a2', 'a3'], ['a3', 'a4']] |
| 2 | 28 | [['a2', 'a3'], ['a3', 'a4']] | ['Kate', 'Diana'] | [['a1', 'a2'], ['a2', 'a3'], ['a3', 'a4']] |
| 3 | 18 | [['d2', 'd3'], ['d3', 'd4']] | ['Ian'] | [['d2', 'd3'], ['d3', 'd4']] |
+----+-------+------------------------------+-------------------+--------------------------------------------+
Comments (1)
If you want to shift the name_intersect column by -1, do that first. Then delete the rows containing an empty list in name_intersect to get df_final.
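The filtering step can be sketched as follows. This is an illustration under assumptions rather than the code originally posted: the small pairs frame is hypothetical sample data shaped like the day 18 rows of the intermediate table, and only the names name_intersect and df_final come from the text.

```python
import pandas as pd

# Hypothetical sample shaped like the day 18 rows of the intermediate table
pairs = pd.DataFrame({
    "day": [18, 18, 18],
    "pair": [[["d1", "d2"], ["d2", "d3"]],
             [["d1", "d2"], ["d3", "d4"]],
             [["d2", "d3"], ["d3", "d4"]]],
    "name_intersect": [[], [], ["Ian"]],
})

# Keep only the rows whose intersection list is non-empty
mask = pairs["name_intersect"].map(len) > 0
df_final = pairs[mask].reset_index(drop=True)
print(df_final)
```

Mapping len over the column works because each cell holds a Python list; comparing the column to [] directly would not give a usable element-wise result.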
Entire code:
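A minimal end-to-end sketch of one way to reach the desired output, not necessarily the code originally posted here. It assumes that day identifies each group and that all_groups is simply the list of small groups for that day, so all_groups is rebuilt from the pairs that survive instead of being read from the input. It intersects all combinations directly instead of shifting the column.

```python
from itertools import combinations

import pandas as pd

data = {
    "small group": [["a1", "a2"], ["a2", "a3"], ["a3", "a4"],
                    ["d1", "d2"], ["d2", "d3"], ["d3", "d4"],
                    ["c1", "c2"], ["c2", "c3"], ["f1", "f2"]],
    "name": [["Alina", "Kate"], ["Alina", "Kate", "Diana"], ["Kate", "Diana"],
             ["Mike", "Bob"], ["Ian", "Lili"], ["George", "Cloud", "Ian", "Petro"],
             ["Jone", "Petro", "Lili"], ["Marinet", "Yu", "Chloe", "Ian"],
             ["Nate", "Rose"]],
    "day": [28, 28, 28, 18, 18, 18, 20, 20, 3],
}
df = pd.DataFrame(data)

records = []
for day, block in df.groupby("day", sort=False):
    rows = list(zip(block["small group"], block["name"]))

    # Collect only the pairs whose name lists actually overlap
    found = []
    for (g1, n1), (g2, n2) in combinations(rows, 2):
        other = set(n2)
        shared = [n for n in n1 if n in other]
        if shared:
            found.append(([g1, g2], shared))

    # Groups that take part in at least one non-empty intersection,
    # kept in their original order; ['d1', 'd2'] drops out for day 18
    used = [g for g, _ in rows if any(g in pair for pair, _ in found)]

    for pair, shared in found:
        records.append({
            "day": day,
            "pair": pair,
            "name_intersect": shared,
            "all_groups": used,
        })

df_final = pd.DataFrame(records, columns=["day", "pair", "name_intersect", "all_groups"])
print(df_final)
```

Days 20 and 3 disappear entirely because none of their pairs share a name, which matches the four-row table the question asks for.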