How to create a pivot table of string data types in the correct order from a pandas DataFrame

Posted on 2025-02-12 05:03:04

I have a DataFrame that looks like this. Here is the data in table format, which you can copy and paste:

SourceName     SourceType   Edge       TargetName      TargetType
cardiac myosin     DISEASE  induce     myocarditis     DISEASE
cardiac myosin     DISEASE  induce     heart disease   DISEASE
nitric             CHEMICAL inhibit    chrysin         CHEMICAL
peptide magainin   CHEMICAL exhibited  tumor           DISEASE

Here is the same data in dictionary format, which you can copy and paste:

{'id': [1, 2, 3, 4],
 'SourceName': ['cardiac myosin',
  'cardiac myosin',
  'nitric',
  'peptide magainin'],
 'SourceType': ['DISEASE', 'DISEASE', 'CHEMICAL', 'CHEMICAL'],
 'Edge': ['induce', 'induce', 'inhibit', 'exhibited'],
 'TargetName': ['myocarditis',
  'heart disease',
  'chrysin',
  'tumor'],
 'TargetType': ['DISEASE', 'DISEASE', 'CHEMICAL', 'DISEASE']}
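
For anyone reproducing this, the dictionary above can be loaded into a DataFrame as in the short sketch below. The variable names `data` and `df` are only illustrative; `df` is the name the code later in this post assumes.

```python
import pandas as pd

# Same data as the dictionary above; `data` and `df` are illustrative names.
data = {
    'id': [1, 2, 3, 4],
    'SourceName': ['cardiac myosin', 'cardiac myosin', 'nitric', 'peptide magainin'],
    'SourceType': ['DISEASE', 'DISEASE', 'CHEMICAL', 'CHEMICAL'],
    'Edge': ['induce', 'induce', 'inhibit', 'exhibited'],
    'TargetName': ['myocarditis', 'heart disease', 'chrysin', 'tumor'],
    'TargetType': ['DISEASE', 'DISEASE', 'CHEMICAL', 'DISEASE'],
}

# Each key becomes a column; each list position becomes a row.
df = pd.DataFrame(data)
print(df)
```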

I tried the code below, but some SourceName values end up under the wrong type. For example, 'peptide magainin' should be a CHEMICAL, but it appears under DISEASE, which is incorrect.

df1 = df.groupby(["id","SourceType","TargetType"])['SourceName', 'Edge', 'TargetName'].aggregate(lambda x: x).unstack().reset_index()
df1.columns=df1.columns.tolist()

The sample output is incorrect. Can someone help me with this? Thanks.

Expected output:

笑看君怀她人 2025-02-19 05:03:04

I don't understand exactly what you are trying to achieve with the new structure, but it can be done by grouping once by "SourceType" and once by "TargetType", then merging the resulting DataFrames:

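import pandas as pd

# df is assumed to be built from the dictionary in the question (it includes 'id').
source_df = pd.DataFrame()
target_df = pd.DataFrame()

# One column per SourceType value, holding the matching SourceName
for s, sub_df in df.groupby('SourceType'):
    source_sub_df = sub_df[['id', 'SourceName']]
    source_sub_df.columns = ['id', f'SourceType_{s}']
    source_df = pd.concat([source_df, source_sub_df])

# One column per TargetType value, holding the matching TargetName; Edge is kept as-is
for t, sub_df in df.groupby('TargetType'):
    target_sub_df = sub_df[['id', 'Edge', 'TargetName']]
    target_sub_df.columns = ['id', 'Edge', f'TargetType_{t}']
    target_df = pd.concat([target_df, target_sub_df])

# Join the two halves back together on id
df_out = source_df.merge(target_df, on='id').sort_values('id').reset_index(drop=True)

print(df_out)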

Output:

   id SourceType_CHEMICAL SourceType_DISEASE       Edge TargetType_CHEMICAL TargetType_DISEASE
0   1                 NaN     cardiac myosin     induce                 NaN        myocarditis
1   2                 NaN     cardiac myosin     induce                 NaN      heart disease
2   3              nitric                NaN    inhibit             chrysin                NaN
3   4    peptide magainin                NaN  exhibited                 NaN              tumor
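
As a possible alternative, and not part of the answer above, the same layout can probably be reached without explicit loops by pivoting twice on `id`. This is only a sketch: it assumes `id` is unique per row, as in the sample data, and the names `src`, `tgt`, `edge` and `out` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
data = {
    'id': [1, 2, 3, 4],
    'SourceName': ['cardiac myosin', 'cardiac myosin', 'nitric', 'peptide magainin'],
    'SourceType': ['DISEASE', 'DISEASE', 'CHEMICAL', 'CHEMICAL'],
    'Edge': ['induce', 'induce', 'inhibit', 'exhibited'],
    'TargetName': ['myocarditis', 'heart disease', 'chrysin', 'tumor'],
    'TargetType': ['DISEASE', 'DISEASE', 'CHEMICAL', 'DISEASE'],
}
df = pd.DataFrame(data)

# Spread SourceName across one column per SourceType value (NaN where the type does not apply).
src = df.pivot(index='id', columns='SourceType', values='SourceName').add_prefix('SourceType_')

# Do the same for TargetName and TargetType.
tgt = df.pivot(index='id', columns='TargetType', values='TargetName').add_prefix('TargetType_')

# Edge needs no reshaping; index it by id so it aligns with the pivots.
edge = df.set_index('id')['Edge']

# Align the three pieces on id and put them side by side.
out = pd.concat([src, edge, tgt], axis=1)
out.columns.name = None   # drop any leftover 'SourceType'/'TargetType' axis label
out.index.name = 'id'
out = out.reset_index()

print(out)
```

Because `pivot` raises an error on duplicate `(id, type)` pairs, this variant fails loudly if `id` turns out not to be unique, whereas the loop-based version would not.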
