Python: how to fill an empty DataFrame with lists

Posted on 2025-01-24 07:52:09

I'm trying to write code that stores the common elements between several lists in a matrix.
Example:

Data frame with all the lists:

ID  elements of the ID
G1  P1,P2,P3,P4
G2  P3,P5
G3  P1,P3,P5
G4  P6

I start with an empty matrix that has G1, G2, G3, G4 as both column and row names and whose cells are filled with NaN. The result I would like to obtain is the following:

X   G1           G2     G3        G4
G1  P1,P2,P3,P4  P3     P1,P3     None
G2  P3           P3,P5  P3,P5     None
G3  P1,P3        P3,P5  P1,P3,P5  None
G4  None         None   None      P6

This is my code:

import sys
import pandas as pd

def intersection(lst1, lst2):
    return [value for value in lst1 if value in lst2]

data = pd.read_csv(sys.argv[1], sep="\t")
p_mat = pd.read_csv(sys.argv[2], sep="\t", index_col=0)
c_mat = pd.read_csv(sys.argv[3], sep="\t", index_col=0)

#I need this since the elements of the second column once imported are seen as a single string instead of being lists
for i in range(0,len(data)):
    data['MP term list'][i] = data['MP term list'][i].split(",")


for i in p_mat:
    for j in p_mat.columns:
        r = intersection(data[data['MGI id'] == i]['MP term list'].values.tolist()[0],data[data['MGI id'] == j]['MP term list'].values.tolist()[0])
        if len(r)!=0:
            p_mat.at[i,j] = r
        else:
            p_mat.at[i, j] = None
        del(r) 

So far only the first cell is filled correctly. At the first non-empty result that I try to store in a cell, I get this error:

ValueError: Must have equal len keys and value when setting with an iterable

How can I fix it?
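
A likely cause, offered as a hedged reading of the traceback rather than a confirmed diagnosis: a matrix read from a file and filled with NaN has a float dtype, so a Python list cannot be stored in a single cell and pandas falls back to treating the list as several values to assign. The minimal sketch below assumes the column names from the question (`MGI id`, `MP term list`) and replaces the three input files with a small inline table. It builds the matrix with `dtype=object` and then assigns one cell at a time with `.at`; the string splitting is done in one step with `.str.split` instead of a row-by-row loop.

```python
import pandas as pd

# Toy stand-in for the tab-separated input file (assumption: same column
# names as in the question, one comma-separated string per ID).
data = pd.DataFrame({
    "MGI id": ["G1", "G2", "G3", "G4"],
    "MP term list": ["P1,P2,P3,P4", "P3,P5", "P1,P3,P5", "P6"],
})

# Split every string into a list in one vectorised step and index by ID.
terms = data.set_index("MGI id")["MP term list"].str.split(",")

# Empty square matrix; dtype=object lets a single cell hold a Python list.
# For a matrix read from a file, p_mat = p_mat.astype(object) should have
# the same effect.
p_mat = pd.DataFrame(index=terms.index, columns=terms.index, dtype=object)

for i in p_mat.index:
    for j in p_mat.columns:
        # Same order-preserving intersection as in the question.
        common = [t for t in terms[i] if t in terms[j]]
        # .at addresses exactly one cell, so the list is stored as one value.
        p_mat.at[i, j] = common if common else None

print(p_mat)
```

Iterating over `p_mat.index` for the rows, rather than over `p_mat` itself (which yields column labels), keeps the loop correct even if the matrix is not square.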


Comments (2)

青柠芒果 2025-01-31 07:52:09

Try a cross merge, a set intersection, and a pivot:

# Split each comma-separated string into a set of elements
df["elements"] = df["elements of the ID"].str.split(",").map(set)

# Pair every ID with every ID (Cartesian product); how="cross" needs pandas >= 1.2
cross = df[["ID", "elements"]].merge(df[["ID", "elements"]], how="cross")
cross["intersection"] = (cross.apply(lambda row: row["elements_x"].intersection(row["elements_y"]), axis=1)
                              .map(",".join)
                              .replace("", None)
                        )

# Recent pandas versions require keyword arguments for pivot
output = cross.pivot(index="ID_x", columns="ID_y", values="intersection").rename_axis(None, axis=1).rename_axis(None)

>>> output
             G1     G2        G3    G4
G1  P2,P1,P3,P4     P3     P1,P3  None
G2           P3  P3,P5     P3,P5  None
G3        P1,P3  P3,P5  P1,P3,P5  None
G4         None   None      None    P6

The order of the elements within a cell can vary between runs, because sets are unordered.

Input df:
df = pd.DataFrame({"ID": [f"G{i+1}" for i in range(4)],
                   "elements of the ID": ["P1,P2,P3,P4", "P3,P5", "P1,P3,P5", "P6"]})
夜夜流光相皎洁 2025-01-31 07:52:09
import pandas as pd

ID = ["G1", "G2", "G3", "G4"]
Elements = [["P1", "P2", "P3", "P4"],
            ["P3", "P5"],
            ["P1", "P3", "P5"],
            ["P6"]]

df = pd.DataFrame(zip(ID, Elements), columns=["ID", "Elements"])

# Empty result frame: one column per ID, plus an "ID" column naming each row
df1 = pd.DataFrame(columns=ID)
df1["ID"] = ID

for i in ID:
    for j in ID:
        if i == j:
            # Diagonal: an ID shares all of its elements with itself
            df1.loc[df1.ID == i, j] = df.loc[df.ID == i, "Elements"]
        else:
            # Object dtype lets a cell hold a list
            df1 = df1.astype("object")
            # Off-diagonal: set intersection of the two element lists
            df1.loc[df1.ID == i, j] = df1.loc[df1.ID == i, j].apply(
                lambda x: list(set(list(df.loc[df.ID == i, "Elements"])[0]) & set(list(df.loc[df.ID == j, "Elements"])[0])))

Output:

df1
Out[38]: 
                 G1        G2            G3    G4  ID
0  [P1, P2, P3, P4]      [P3]      [P1, P3]    []  G1
1              [P3]  [P3, P5]      [P5, P3]    []  G2
2          [P1, P3]  [P5, P3]  [P1, P3, P5]    []  G3
3                []        []            []  [P6]  G4
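
Both answers evaluate every (row, column) pair of IDs. As a compact variant, and only as a sketch, the same table can be built from a nested dict comprehension with the IDs as the index. Sorting each intersection makes the element order within a cell deterministic, which the set-based outputs above do not guarantee. The helper name `joined_intersection` is made up for this illustration, and the input frame is the one from the first answer.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["G1", "G2", "G3", "G4"],
    "elements of the ID": ["P1,P2,P3,P4", "P3,P5", "P1,P3,P5", "P6"],
})

# One set of elements per ID, looked up by label.
sets = df.set_index("ID")["elements of the ID"].str.split(",").map(set)


def joined_intersection(a, b):
    """Return the sorted common elements as 'P1,P3', or None if there are none."""
    common = sorted(a & b)
    return ",".join(common) if common else None


# In a dict of dicts, outer keys become columns and inner keys become rows.
output = pd.DataFrame({
    col: {row: joined_intersection(sets[row], sets[col]) for row in sets.index}
    for col in sets.index
})

print(output)
```

Cells without common elements come out as missing values; depending on the pandas version they may display as None or NaN, so `pd.isna` is the safer way to test for them.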