Can anyone share a simple example of using Mathematica and Google Scholar to extract academic research information?

Posted on 2024-11-09 13:09:28

How can I use Mathematica and Google Scholar to find the number of papers a person published in 2011?

Comments (1)

爱,才寂寞 2024-11-16 13:09:28

Google Scholar is not well suited to this goal: as far as I know it has no formal API, and it doesn't provide results in a structured (e.g. XML) format. So we have to resort to a quick (and very, very fragile!) text pattern matching hack like:

 searchGoogleScholarAuthor[author_String] :=
  First[
   StringCases[
    (* fetch the first results page *)
    Import["http://scholar.google.com/scholar?start=0&num=1&q=" <>
      (* one author: field per name part; StringDrop removes the trailing "+" *)
      StringDrop[
       StringJoin @@ ("author:" <> # <> "+" & /@ StringSplit[author]), -1] <>
      "&hl=en&as_sdt=1,5"],
    (* capture the hit count that follows "of about" in the page header *)
    ___ ~~ "Results" ~~ ___ ~~ "of about" ~~ Shortest[___] ~~
      p : Longest[(DigitCharacter | ",") ..] ~~ ___ ~~ "." ~~ ___ ~~
      "(" ~~ ___ :> p]]

In[191]:= searchGoogleScholarAuthor["A Einstein"]

Out[191]= "6,400"

In[190]:= searchGoogleScholarAuthor["Einstein"]

Out[190]= "9,400"

In[192]:= searchGoogleScholarAuthor["Wizard"]

Out[192]= "197"

In[193]:= searchGoogleScholarAuthor["Vries"]

Out[193]= "70,700"

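Add ToExpression if you don't like the string result. Results such as "6,400" contain a thousands separator, so strip the commas before converting, e.g. ToExpression[StringReplace[result, "," -> ""]]. If you want to restrict the publication years, you can add &as_ylo=2011&as_yhi=2011& to the search string and change the start and end years appropriately.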
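A minimal sketch of those two tweaks, kept free of network access so it can be tried without hitting Scholar. The helper names restrictToYear and hitCountToInteger and the variable baseURL are made up for illustration; only the as_ylo / as_yhi parameters and the URL shape come from the answer above, and whether Scholar still honours them is an assumption.

```mathematica
(* Hypothetical helper: append Scholar's year-range parameters to a search URL. *)
restrictToYear[url_String, year_Integer] :=
 url <> "&as_ylo=" <> ToString[year] <> "&as_yhi=" <> ToString[year]

(* Hypothetical helper: convert a count string such as "6,400" to an integer
   by removing the thousands separators first. *)
hitCountToInteger[count_String] :=
 ToExpression[StringReplace[count, "," -> ""]]

(* The same URL that searchGoogleScholarAuthor builds for "A Einstein". *)
baseURL = "http://scholar.google.com/scholar?start=0&num=1&q=author:A+author:Einstein&hl=en&as_sdt=1,5";

restrictToYear[baseURL, 2011]
hitCountToInteger["6,400"]
```

The last call evaluates to the integer 6400; the one before it returns the base URL with &as_ylo=2011&as_yhi=2011 appended.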

Please note that authors with common names will generate lots of spurious hits, as there is no way to uniquely identify a single author. Additionally, Scholar returns a variety of hits, including citations, books, reprints and more. So, really, this ain't very useful for counting.

A bit of explanation:

Scholar splits the initials and names of authors and co-authors over several author: fields combined with a +. The StringDrop[StringJoin @@ ("author:" <> # <> "+" & /@ StringSplit[author]), -1] part of the code takes care of that. The StringDrop removes the last +.
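To make that concrete, here is the same construction taken apart step by step for one example name. The variable names (name, parts, fields, joined, query) are only for illustration, and the StringRiffle line at the end is an alternative I'm suggesting, not what the code above uses.

```mathematica
name = "A Einstein";

(* Split the name on whitespace into its parts. *)
parts = StringSplit[name];                        (* {"A", "Einstein"} *)

(* Wrap each part as an author: field followed by "+". *)
fields = Map["author:" <> # <> "+" &, parts];     (* {"author:A+", "author:Einstein+"} *)

(* Join the pieces, then drop the trailing "+". *)
joined = StringJoin @@ fields;                    (* "author:A+author:Einstein+" *)
query = StringDrop[joined, -1]                    (* "author:A+author:Einstein" *)

(* Alternative sketch: StringRiffle inserts the separator only between parts,
   so no trailing "+" has to be removed. *)
StringRiffle[Map["author:" <> # &, parts], "+"]
```

Both forms give the same query string; StringRiffle needs Mathematica 10 or later.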

The StringCases part contains a large text pattern which basically searches for the text that Scholar places at the top of each results page and which contains the number of hits. This number is then isolated and returned.
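The idea is easier to see on a fixed string than on a live page. The sample header below is an assumption about what that line looked like (it is merely consistent with the "Results ... of about ... ( ..." pattern above, not copied from Scholar), and the pattern is a simplified stand-in for the full one.

```mathematica
(* Assumed sample of the header text; the real page text may differ. *)
sampleHeader = "Results 1 - 1 of about 6,400. (0.05 sec)";

(* Simplified pattern: take the run of digits and commas right after "of about ". *)
hitCount = First[
  StringCases[sampleHeader,
   "of about " ~~ p : ((DigitCharacter | ",") ..) :> p]]
```

Here hitCount is the string "6,400". If Scholar changes the wording of that header, StringCases returns an empty list and First fails, which is exactly why this approach is so fragile.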
