Python: how to fill an empty DataFrame with lists
I'm trying to write code that stores, in a matrix, the elements shared between pairs of lists.
Example:
Data frame with all the lists:
ID | elements of the ID |
---|---|
G1 | P1,P2,P3,P4 |
G2 | P3,P5 |
G3 | P1,P3,P5 |
G4 | P6 |
I start with an empty matrix that has G1, G2, G3, G4 as both column and row names, with every cell filled with NaN. The result I would like to obtain is the following:
X | G1 | G2 | G3 | G4 |
---|---|---|---|---|
G1 | P1,P2,P3,P4 | P3 | P1,P3 | None |
G2 | P3 | P3,P5 | P3,P5 | None |
G3 | P1,P3 | P3,P5 | P1,P3,P5 | None |
G4 | None | None | None | P6 |
This is my code:
import sys
import pandas as pd

def intersection(lst1, lst2):
    return [value for value in lst1 if value in lst2]

data = pd.read_csv(sys.argv[1], sep="\t")
p_mat = pd.read_csv(sys.argv[2], sep="\t", index_col=0)
c_mat = pd.read_csv(sys.argv[3], sep="\t", index_col=0)

# I need this since the elements of the second column, once imported, are seen as a single string instead of being lists
for i in range(0, len(data)):
    data['MP term list'][i] = data['MP term list'][i].split(",")

for i in p_mat:
    for j in p_mat.columns:
        r = intersection(data[data['MGI id'] == i]['MP term list'].values.tolist()[0], data[data['MGI id'] == j]['MP term list'].values.tolist()[0])
        if len(r) != 0:
            p_mat.at[i, j] = r
        else:
            p_mat.at[i, j] = None
        del(r)
For now I can fill only the first cell correctly; at the first non-empty result that I try to store in a cell, I get this error:
ValueError: Must have equal len keys and value when setting with an iterable
How can I fix it?
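A likely cause is that `p_mat`, read from a file containing only NaN, has float columns, so pandas tries to spread the list across cells instead of storing it as one object. Below is a minimal sketch of one possible fix, assuming the column names `MGI id` and `MP term list` from the code above and using the sample values from the example table in place of the input files: the matrix is given `object` dtype before lists are assigned with `.at`.

```python
import pandas as pd

# Sample data taken from the example table; stands in for the TSV read via sys.argv.
data = pd.DataFrame({
    "MGI id": ["G1", "G2", "G3", "G4"],
    "MP term list": ["P1,P2,P3,P4", "P3,P5", "P1,P3,P5", "P6"],
})

# Split every string into a list in one step, avoiding chained assignment.
terms = data.set_index("MGI id")["MP term list"].str.split(",")


def intersection(lst1, lst2):
    return [value for value in lst1 if value in lst2]


# object dtype lets a single cell hold a whole list.
# For a matrix loaded with read_csv, p_mat = p_mat.astype(object) plays the same role.
p_mat = pd.DataFrame(index=terms.index, columns=terms.index, dtype=object)

for i in p_mat.index:
    for j in p_mat.columns:
        r = intersection(terms.loc[i], terms.loc[j])
        p_mat.at[i, j] = r if r else None

print(p_mat)
```

Iterating over `p_mat.index` for the rows (rather than over `p_mat` itself, which yields column labels) keeps the loop correct even if the matrix is not square.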
Comments (2)
Try with a cross merge, set intersection and pivot:
Input df:
Output :
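The following is a minimal sketch of the cross merge, set intersection and pivot approach, not the answerer's original code. It assumes the input frame uses the question's column names `MGI id` and `MP term list` with the sample values from the question's table; the names `row`, `col`, `row_terms`, `col_terms` and `common` are illustrative. `how="cross"` requires pandas 1.2 or later.

```python
import pandas as pd

# Assumed input, rebuilt from the question's example table.
df = pd.DataFrame({
    "MGI id": ["G1", "G2", "G3", "G4"],
    "MP term list": ["P1,P2,P3,P4", "P3,P5", "P1,P3,P5", "P6"],
})

# Two renamed copies so the cross merge produces distinct column names.
left = df.rename(columns={"MGI id": "row", "MP term list": "row_terms"})
right = df.rename(columns={"MGI id": "col", "MP term list": "col_terms"})

# Cross merge: every ID paired with every ID (4 x 4 = 16 rows here).
pairs = left.merge(right, how="cross")

# Set intersection per pair; an empty intersection becomes None.
pairs["common"] = [
    ",".join(sorted(set(a.split(",")) & set(b.split(",")))) or None
    for a, b in zip(pairs["row_terms"], pairs["col_terms"])
]

# Pivot the long table back into an ID-by-ID matrix.
result = pairs.pivot(index="row", columns="col", values="common")
print(result)
```

Storing the intersection as a comma-joined string keeps every cell a scalar, which sidesteps the list-in-a-cell assignment that raises the error in the question; if lists are needed, the joined strings can be split again afterwards.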