Can anyone share a simple example of using Mathematica and Google Scholar to extract academic research information?
How can I use Mathematica and Google Scholar to find the number of papers a person published in 2011?
Google Scholar is not well suited to this goal, as it doesn't have a formal API AFAIK. It also doesn't provide results in a structured (e.g. XML) format. So we have to resort to a quick (and very, very fragile!) text pattern matching hack like:
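The code block that originally followed is missing from this copy. As a rough stand-in, here is a minimal sketch of what such a hack might look like. The name `scholarHitCount` is made up for illustration, and the "About N results" wording is an assumption about what the results page says; it will break whenever Scholar changes its layout. Scholar may also refuse automated requests, in which case the function just returns `Missing["NotFound"]`.

```mathematica
(* scholarHitCount is a hypothetical name; this is NOT the original answer's code. *)
scholarHitCount[author_String] :=
 Module[{query, page, hits},
  (* Build author:Initial+author:Surname, as the answer describes *)
  query = StringDrop[
    StringJoin @@ ("author:" <> # <> "+" & /@ StringSplit[author]), -1];
  (* Fetch the rendered text of the results page; yields $Failed on error *)
  page = Quiet@Import["http://scholar.google.com/scholar?q=" <> query, "Plaintext"];
  (* ASSUMPTION: the page contains text such as "About 1,230 results" *)
  hits = If[StringQ[page],
    StringCases[page,
     "About " ~~ (n : (DigitCharacter | ",") ..) ~~ " results" :> n, 1],
    {}];
  (* Return the count as a string, or Missing if nothing matched *)
  If[hits === {}, Missing["NotFound"], First[hits]]]
```

It would be called as `scholarHitCount["A Einstein"]` and returns the hit count as a string when the pattern matches.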
Add `ToExpression` if you don't like the string result. If you want to restrict the publication years, you can add `&as_ylo=2011&as_yhi=2011&` to the search string and change the start and end years appropriately.
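To make these two tweaks concrete, here is a small sketch. Both helper names are hypothetical: `scholarUrl` appends the year parameters mentioned above to the search string, and `hitsToNumber` strips thousands separators before calling `ToExpression`, since a string like "1,230" would otherwise not parse as a single number.

```mathematica
(* Hypothetical helper: search URL restricted to a range of publication years *)
scholarUrl[author_String, from_Integer, to_Integer] :=
 "http://scholar.google.com/scholar?q=" <>
  StringDrop[StringJoin @@ ("author:" <> # <> "+" & /@ StringSplit[author]), -1] <>
  "&as_ylo=" <> ToString[from] <> "&as_yhi=" <> ToString[to]

(* Hypothetical helper: turn a hit-count string into an integer *)
hitsToNumber[s_String] := ToExpression[StringReplace[s, "," -> ""]]

scholarUrl["A Einstein", 2011, 2011]
(* "http://scholar.google.com/scholar?q=author:A+author:Einstein&as_ylo=2011&as_yhi=2011" *)

hitsToNumber["1,230"]
(* 1230 *)
```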
Please note that authors with popular names will generate lots of spurious hits as there is no way to uniquely identify a single author. Additionally, Scholar returns a diversity of hits, including citations, books, reprints and more. So, really, this ain't very useful for counting.
A bit of explanation:
Scholar splits the initials and names of authors and co-authors over several `author:` fields combined with a `+`. The `StringDrop[StringJoin @@ ("author:" <> # <> "+" & /@ StringSplit[author]), -1]` part of the code takes care of that; the `StringDrop` removes the trailing `+`.
The `StringCases` part contains a large text pattern which basically searches for the text that Scholar places at the top of each results page and which contains the number of hits. This number is then isolated and returned.
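Both parts can be tried offline. In the sketch below, `sampleText` is made-up text standing in for the top of a results page (the exact wording Scholar uses is an assumption), and the pattern is a simplified stand-in for the larger one described above.

```mathematica
(* Part 1: build the author: query from a name *)
author = "A Einstein";
withPlus = StringJoin @@ ("author:" <> # <> "+" & /@ StringSplit[author])
(* "author:A+author:Einstein+" *)
StringDrop[withPlus, -1]
(* "author:A+author:Einstein" *)

(* Part 2: isolate the hit count from the page text *)
sampleText = "Scholar   About 1,230 results (0.05 sec)";
StringCases[sampleText,
 "About " ~~ (n : (DigitCharacter | ",") ..) ~~ " results" :> n]
(* {"1,230"} *)
```

Naming the matched digits with `n :` and returning only `n` on the right-hand side of `:>` is what isolates the number from the surrounding text.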