How to create a pivot table with string data types in the correct order from a pandas DataFrame

Posted on 2025-02-12 05:03:04

I have a DataFrame that looks like the one below. Here is the data in table format, which you can copy/paste:

SourceName     SourceType   Edge       TargetName      TargetType
cardiac myosin     DISEASE  induce     myocarditis     DISEASE
cardiac myosin     DISEASE  induce     heart disease   DISEASE
nitric             CHEMICAL inhibit    chrysin         CHEMICAL
peptide magainin   CHEMICAL exhibited  tumor           DISEASE

Here is the same data in dictionary format, which you can copy/paste:

{'id': [1, 2, 3, 4],
 'SourceName': ['cardiac myosin',
  'cardiac myosin',
  'nitric',
  'peptide magainin'],
 'SourceType': ['DISEASE', 'DISEASE', 'CHEMICAL', 'CHEMICAL'],
 'Edge': ['induce', 'induce', 'inhibit', 'exhibited'],
 'TargetName': ['myocarditis',
  'heart disease',
  'chrysin',
  'tumor'],
 'TargetType': ['DISEASE', 'DISEASE', 'CHEMICAL', 'DISEASE']}
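
To reproduce the examples, the dictionary above can be loaded into a DataFrame. This is a minimal sketch; the variable name `data` is an assumption of mine, while `df` matches the name used in the code that follows:

```python
import pandas as pd

# Sample data copied from the dictionary in the question.
data = {
    "id": [1, 2, 3, 4],
    "SourceName": ["cardiac myosin", "cardiac myosin", "nitric", "peptide magainin"],
    "SourceType": ["DISEASE", "DISEASE", "CHEMICAL", "CHEMICAL"],
    "Edge": ["induce", "induce", "inhibit", "exhibited"],
    "TargetName": ["myocarditis", "heart disease", "chrysin", "tumor"],
    "TargetType": ["DISEASE", "DISEASE", "CHEMICAL", "DISEASE"],
}

# Each key becomes a column; each list position becomes a row.
df = pd.DataFrame(data)
print(df)
```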

I tried the code below, but some SourceName values end up under the wrong type. For example, 'peptide magainin' should be a CHEMICAL, but it appears under DISEASE, which is incorrect.

df1 = df.groupby(["id","SourceType","TargetType"])['SourceName', 'Edge', 'TargetName'].aggregate(lambda x: x).unstack().reset_index()
df1.columns=df1.columns.tolist()

Below is a sample of the incorrect output. Can someone help me with this? Thanks.

[image: sample of the incorrect output]

Expected output:

[image: expected output]
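
A likely reason for the misplacement: `unstack()` called without arguments pivots the innermost index level, which here is `TargetType`, so every value column (including `SourceName`) is split by target type rather than source type. The sketch below illustrates this under two assumptions of mine: `first()` stands in for the `lambda x: x` aggregation, and the columns are selected with a list, since recent pandas versions reject tuple-style selection.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "SourceName": ["cardiac myosin", "cardiac myosin", "nitric", "peptide magainin"],
    "SourceType": ["DISEASE", "DISEASE", "CHEMICAL", "CHEMICAL"],
    "Edge": ["induce", "induce", "inhibit", "exhibited"],
    "TargetName": ["myocarditis", "heart disease", "chrysin", "tumor"],
    "TargetType": ["DISEASE", "DISEASE", "CHEMICAL", "DISEASE"],
})

# Same grouping as in the question; first() is a stand-in aggregation
# (each group holds exactly one row in this sample).
wide = (
    df.groupby(["id", "SourceType", "TargetType"])[["SourceName", "Edge", "TargetName"]]
      .first()
      .unstack()  # pivots the innermost level: TargetType
)

# The second column level is TargetType, not SourceType.
print(wide.columns.names)

# Row id=4 has SourceType CHEMICAL, yet its SourceName sits in the
# DISEASE column because that row's TargetType is DISEASE.
print(wide.loc[(4, "CHEMICAL"), ("SourceName", "DISEASE")])
```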

笑看君怀她人 2025-02-19 05:03:04

I don't fully understand what you are trying to achieve with the new structure, but it can be done by grouping once by "SourceType" and once by "TargetType", then merging the resulting DataFrames:

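import pandas as pd

# df is the DataFrame built from the dictionary in the question.
source_parts = []
target_parts = []

# One column per SourceType value, holding the SourceName of the matching rows.
for s, sub_df in df.groupby('SourceType'):
    source_sub_df = sub_df[['id', 'SourceName']]
    source_sub_df.columns = ['id', f'SourceType_{s}']
    source_parts.append(source_sub_df)

# One column per TargetType value, holding the TargetName of the matching rows.
for t, sub_df in df.groupby('TargetType'):
    target_sub_df = sub_df[['id', 'Edge', 'TargetName']]
    target_sub_df.columns = ['id', 'Edge', f'TargetType_{t}']
    target_parts.append(target_sub_df)

# Concatenate once, which avoids concatenating onto an empty DataFrame.
source_df = pd.concat(source_parts)
target_df = pd.concat(target_parts)

# Join the source and target halves back together on id.
df_out = source_df.merge(target_df, on='id').sort_values('id').reset_index(drop=True)

print(df_out)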

Output:

   id SourceType_CHEMICAL SourceType_DISEASE       Edge TargetType_CHEMICAL TargetType_DISEASE
0   1                 NaN     cardiac myosin     induce                 NaN        myocarditis
1   2                 NaN     cardiac myosin     induce                 NaN      heart disease
2   3              nitric                NaN    inhibit             chrysin                NaN
3   4    peptide magainin                NaN  exhibited                 NaN              tumor
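
As an alternative sketch, not part of the answer above, the same wide layout can probably be built without explicit loops by calling `DataFrame.pivot` once for the source side and once for the target side. The names `source_wide` and `target_wide` are my own, and the approach assumes `id` is unique per row, as it is in the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "SourceName": ["cardiac myosin", "cardiac myosin", "nitric", "peptide magainin"],
    "SourceType": ["DISEASE", "DISEASE", "CHEMICAL", "CHEMICAL"],
    "Edge": ["induce", "induce", "inhibit", "exhibited"],
    "TargetName": ["myocarditis", "heart disease", "chrysin", "tumor"],
    "TargetType": ["DISEASE", "DISEASE", "CHEMICAL", "DISEASE"],
})

# Spread SourceName over one column per SourceType value.
source_wide = (
    df.pivot(index="id", columns="SourceType", values="SourceName")
      .add_prefix("SourceType_")
)

# Spread TargetName over one column per TargetType value.
target_wide = (
    df.pivot(index="id", columns="TargetType", values="TargetName")
      .add_prefix("TargetType_")
)

# Put Edge between the two halves; all three pieces share the id index.
df_out = pd.concat(
    [source_wide, df.set_index("id")["Edge"], target_wide],
    axis=1,
).reset_index()
df_out.columns.name = None  # drop any leftover column-index label

print(df_out)
```

Pivoting each side on its own type column keeps every name under the type it actually belongs to, which is the property the original attempt lost.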