How to create a pivot table of string columns, in the correct order, from a pandas DataFrame
I have a DataFrame that looks like this.
Here is the same data in table format, which you can copy/paste:
SourceName        SourceType  Edge       TargetName     TargetType
cardiac myosin    DISEASE     induce     myocarditis    DISEASE
cardiac myosin    DISEASE     induce     heart disease  DISEASE
nitric            CHEMICAL    inhibit    chrysin        CHEMICAL
peptide magainin  CHEMICAL    exhibited  tumor          DISEASE
Here is the same data as a dictionary, which you can copy/paste:
{'id': [1, 2, 3, 4],
'SourceName': ['cardiac myosin',
'cardiac myosin',
'nitric',
'peptide magainin'],
'SourceType': ['DISEASE', 'DISEASE', 'CHEMICAL', 'CHEMICAL'],
'Edge': ['induce', 'induce', 'inhibit', 'exhibited'],
'TargetName': ['myocarditis',
'heart disease',
'chrysin',
'tumor'],
'TargetType': ['DISEASE', 'DISEASE', 'CHEMICAL', 'DISEASE']}
I tried the code below, but some SourceName values end up under the wrong type. For example, 'peptide magainin' should be a CHEMICAL, but it appears under DISEASE.
df1 = df.groupby(["id", "SourceType", "TargetType"])[['SourceName', 'Edge', 'TargetName']].aggregate(lambda x: x).unstack().reset_index()
df1.columns = df1.columns.tolist()
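The misplacement comes from `unstack()`: called with no argument, it pivots the innermost index level, which here is `TargetType`. Every resulting column, including the `SourceName` ones, is therefore labeled by the target's type rather than the source's. A minimal sketch reproducing this on the sample data; it swaps `aggregate(lambda x: x)` for `.first()` (each `id` group holds exactly one row, so the two should agree) and skips `reset_index()` to keep the lookup simple:

```python
import pandas as pd

# Sample data from the question
data = {
    "id": [1, 2, 3, 4],
    "SourceName": ["cardiac myosin", "cardiac myosin", "nitric", "peptide magainin"],
    "SourceType": ["DISEASE", "DISEASE", "CHEMICAL", "CHEMICAL"],
    "Edge": ["induce", "induce", "inhibit", "exhibited"],
    "TargetName": ["myocarditis", "heart disease", "chrysin", "tumor"],
    "TargetType": ["DISEASE", "DISEASE", "CHEMICAL", "DISEASE"],
}
df = pd.DataFrame(data)

# Same grouping as the question; unstack() moves the LAST index level
# (TargetType) into the columns, so SourceName is split by TargetType.
df1 = (
    df.groupby(["id", "SourceType", "TargetType"])[["SourceName", "Edge", "TargetName"]]
    .first()
    .unstack()
)

# Row (4, "CHEMICAL") has its SourceName stored under the ("SourceName", "DISEASE") column
print(df1.loc[(4, "CHEMICAL"), ("SourceName", "DISEASE")])
```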
Sample output (incorrect). Can someone help me with this? Thanks.
Expected output:
I don't understand exactly what you're trying to achieve with the new structure, but it can be done by grouping once by "SourceType" and once by "TargetType", then merging the resulting DataFrames:
Output:
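A minimal sketch of that approach, assuming the goal is one row per `id` with each name placed in a CHEMICAL or DISEASE column according to its own type. The exact expected layout is an assumption, and the `Source_`/`Target_` column prefixes are hypothetical names chosen for illustration:

```python
import pandas as pd

data = {
    "id": [1, 2, 3, 4],
    "SourceName": ["cardiac myosin", "cardiac myosin", "nitric", "peptide magainin"],
    "SourceType": ["DISEASE", "DISEASE", "CHEMICAL", "CHEMICAL"],
    "Edge": ["induce", "induce", "inhibit", "exhibited"],
    "TargetName": ["myocarditis", "heart disease", "chrysin", "tumor"],
    "TargetType": ["DISEASE", "DISEASE", "CHEMICAL", "DISEASE"],
}
df = pd.DataFrame(data)

# Group once by SourceType: one column per source type, one row per id
src = (
    df.groupby(["id", "SourceType"])["SourceName"].first()
    .unstack()
    .add_prefix("Source_")      # hypothetical column naming
    .rename_axis(columns=None)
    .reset_index()
)

# Group once by TargetType in the same way
tgt = (
    df.groupby(["id", "TargetType"])["TargetName"].first()
    .unstack()
    .add_prefix("Target_")      # hypothetical column naming
    .rename_axis(columns=None)
    .reset_index()
)

# Edge depends on neither type, so merge it in unchanged, then merge the targets
out = src.merge(df[["id", "Edge"]], on="id").merge(tgt, on="id")
print(out)
```

Because `SourceName` is pivoted by `SourceType` and `TargetName` by `TargetType` independently, 'peptide magainin' lands under `Source_CHEMICAL` while its target 'tumor' lands under `Target_DISEASE`. If an `id` could ever have more than one row, `.first()` would silently keep only the first value, so a different aggregation would be needed.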