PubMed: extracting article details into a DataFrame
Here is the code.
import pandas as pd
from pymed import PubMed
import numpy as np

pubmed = PubMed(tool="PubMedSearcher", email="[email protected]")

## PUT YOUR SEARCH TERM HERE ##
search_term = 'Charlie Brown'
results = pubmed.query(search_term, max_results=100000)

articleList = []
articleInfo = []

for article in results:
    # Each result is either a PubMedBookArticle or a PubMedArticle.
    # Convert it to a dictionary with the object's toDict() method.
    articleDict = article.toDict()
    articleList.append(articleDict)

# Build a list of dict records holding the article details fetched from the PubMed API
for article in articleList:
    # article['pubmed_id'] sometimes holds several IDs separated by newlines;
    # the first one is the article's own PubMed ID.
    pubmedId = article['pubmed_id'].partition('\n')[0]
    # Append this article's record to the list
    articleInfo.append({'pubmed_id': pubmedId,
                        'publication_date': article['publication_date'],
                        'authors': article['authors']})

df = pd.json_normalize(articleInfo)
Running this code produces a DataFrame with three columns: pubmed_id, publication_date, and authors.
Is there a way to unnest the authors column and keep the other two columns? Thanks so much in advance.
Comments (1)
If you want to unnest the authors column, you first have to decide on a strategy. For example, you can format each author as "lastname, firstname" and join the authors of each article with a semicolon (;).
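The join strategy described above could be sketched roughly as follows. This is only an illustration, not the answerer's original code: the sample records are made up, the helper names format_author and join_authors are hypothetical, and it assumes each entry in authors is a dict with 'lastname' and 'firstname' keys, either of which may be missing or None. The "; " separator is also just one choice.

```python
# Hypothetical sample shaped like the articleInfo records built in the question.
# Assumption: each author is a dict with 'lastname' and 'firstname' keys,
# and either value may be None.
article_info = [
    {
        "pubmed_id": "11111111",
        "publication_date": "2020-01-15",
        "authors": [
            {"lastname": "Brown", "firstname": "Charlie"},
            {"lastname": "Pelt", "firstname": "Lucy"},
        ],
    },
    {
        "pubmed_id": "22222222",
        "publication_date": "2019-06-01",
        "authors": [{"lastname": "Consortium", "firstname": None}],
    },
    {"pubmed_id": "33333333", "publication_date": "2018-03-09", "authors": []},
]


def format_author(author):
    """Return 'lastname, firstname', skipping whichever part is missing."""
    parts = [author.get("lastname"), author.get("firstname")]
    return ", ".join(p for p in parts if p)


def join_authors(authors, sep="; "):
    """Collapse a list of author dicts into one delimited string."""
    return sep.join(name for name in map(format_author, authors or []) if name)


# Replace the nested authors list with a flat string; other keys are kept as-is.
flat_records = [
    {**record, "authors": join_authors(record["authors"])}
    for record in article_info
]

for record in flat_records:
    print(record["pubmed_id"], record["publication_date"], record["authors"], sep=" | ")
```

Passing flat_records to pd.json_normalize, as the question does with articleInfo, should then give the same three columns with authors as a plain string. If one row per author is preferred instead, DataFrame.explode on the authors column is another strategy worth considering.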