How to get Infobox data from Wikipedia?

Posted 2024-09-11 00:23:38


If I have the URL of a page, how can I get the infobox information shown on the right using the MediaWiki web services?


9 Answers

明月夜 2024-09-18 00:23:38

Use the MediaWiki API through this Python library: https://github.com/siznax/wptools

Usage:

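import wptools
# get_parse() fetches the page via the MediaWiki parse API and stores the results in .data
so = wptools.page('Stack Overflow').get_parse()
infobox = so.data['infobox']  # dict of infobox parameters -> raw wikitext values
print(infobox)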

Output:

{'alexa': '{{Increase}} 34 ( {{as of|2019|12|15|lc|=|y}} )',
 'author': '[[Jeff Atwood]] and [[Joel Spolsky]]',
 'caption': 'Screenshot of Stack Overflow in February 2017',
 'commercial': 'Yes',
 'content_license': '[[Creative Commons license|CC-BY-SA]] 4.0',
 'current_status': 'Online',
 'language': 'English, Spanish, Russian, Portuguese, and Japanese',
 'launch_date': '{{start date and age|2008|9|15}}',
 'logo': 'Stack Overflow logo.svg',
 'name': 'Stack Overflow',
 'owner': '[[Stack Exchange]], Inc.',
 'programming_language': '[[C Sharp (programming language)|C#]]',
 'registration': 'Optional',
 'screenshot': 'File:Stack Overflow homepage, Feb 2017.png',
 'type': '[[Knowledge market]]',
 'url': '{{URL|https://stackoverflow.com}}'}
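The values above are still raw wikitext (note the [[...]] links and {{...}} templates). As a minimal sketch using only the standard library, wiki links can be reduced to their display text; the function name strip_wikilinks is hypothetical, and templates such as {{start date and age|...}} are deliberately left untouched. A full wikitext parser such as mwparserfromhell (its strip_code() method) is the more robust option.

```python
import re

# Matches [[target]] or [[target|label]]; group 2 is the optional display label.
WIKILINK = re.compile(r"\[\[([^\[\]|]*)(?:\|([^\[\]]*))?\]\]")


def strip_wikilinks(value: str) -> str:
    """Replace [[target|label]] with label and [[target]] with target; leave {{templates}} as-is."""
    return WIKILINK.sub(
        lambda m: m.group(2) if m.group(2) is not None else m.group(1), value
    )
```

Applied to the whole result, e.g. `{k: strip_wikilinks(v) for k, v in infobox.items()}`, 'author' becomes 'Jeff Atwood and Joel Spolsky'.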
淡忘如思 2024-09-18 00:23:38

If you just want to parse the infobox, or you want some already-digested data, take a look at the DBpedia project: http://dbpedia.org

The DBpedia project scans the infoboxes in Wikipedia to build an RDF database from them: https://github.com/dbpedia/extraction-framework/
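A rough sketch of querying DBpedia's public SPARQL endpoint for one article's infobox-derived values. It assumes the raw infobox properties live in the http://dbpedia.org/property/ namespace (dbp:) and that the resource URI is the article title with spaces replaced by underscores; the helper name infobox_query_url is hypothetical, and DBpedia's data reflects its last extraction run rather than the live page.

```python
from urllib.parse import urlencode

SPARQL_ENDPOINT = "https://dbpedia.org/sparql"


def infobox_query_url(title: str) -> str:
    """Build a SPARQL URL listing the dbp: (raw infobox) properties of a DBpedia resource."""
    # Assumption: DBpedia resource URIs mirror Wikipedia titles with underscores.
    resource = "http://dbpedia.org/resource/" + title.replace(" ", "_")
    query = (
        "SELECT ?property ?value WHERE { "
        f"<{resource}> ?property ?value . "
        'FILTER(STRSTARTS(STR(?property), "http://dbpedia.org/property/")) }'
    )
    params = {"query": query, "format": "application/sparql-results+json"}
    return SPARQL_ENDPOINT + "?" + urlencode(params)
```

Fetching that URL (with urllib.request and json.load) should return standard SPARQL JSON results, with one row per property/value pair under results.bindings.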

纵山崖 2024-09-18 00:23:38

There is no trivial way to do that. You can try fetching the page content using action=raw, e.g. http://en.wikipedia.org/w/index.php?action=raw&title=Douglas_Jardine
Then find the start of the infobox by searching for {{Infobox, and find its end by locating the matching }}, taking into account that the infobox itself can contain nested {{...}} template calls and {{{...}}} parameter pairs.
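The brace matching described above can be sketched with a small stack: "{{{" and "{{" push an opener, and a closer is only accepted if it matches the innermost opener, so nested templates and {{{...}}} parameters do not end the infobox early. This is a minimal sketch; fetch_raw and find_infobox are hypothetical names, the User-Agent string is a placeholder, and odd cases such as four consecutive braces are not handled.

```python
import re
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def fetch_raw(title: str) -> str:
    """Download a page's wikitext through index.php?action=raw."""
    url = "https://en.wikipedia.org/w/index.php?" + urlencode({"action": "raw", "title": title})
    # Wikimedia asks clients for a descriptive User-Agent; this value is a placeholder.
    req = Request(url, headers={"User-Agent": "infobox-sketch/0.1 (you@example.org)"})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")


def find_infobox(wikitext: str) -> Optional[str]:
    """Return the first {{Infobox ...}} call in wikitext, or None if absent or unbalanced."""
    match = re.search(r"\{\{\s*Infobox", wikitext, re.IGNORECASE)
    if match is None:
        return None
    start = i = match.start()
    stack = []  # 3 = open "{{{" parameter, 2 = open "{{" template
    while i < len(wikitext):
        if wikitext.startswith("{{{", i):
            stack.append(3)
            i += 3
        elif wikitext.startswith("{{", i):
            stack.append(2)
            i += 2
        elif stack and stack[-1] == 3 and wikitext.startswith("}}}", i):
            stack.pop()
            i += 3
        elif stack and stack[-1] == 2 and wikitext.startswith("}}", i):
            stack.pop()
            i += 2
        else:
            i += 1
            continue
        if not stack:
            # The closer of the outermost {{Infobox has just been consumed.
            return wikitext[start:i]
    return None
```

Usage: `print(find_infobox(fetch_raw("Douglas_Jardine")))`.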

相对绾红妆 2024-09-18 00:23:38

Most Wikipedia articles are linked to a Wikidata item, and these items typically hold much of the data shown in the article's infobox. So you only need to fetch the data associated with your Wikipedia page from the Wikidata API.

An example of how to get the data for the Wikipedia page Donald Trump from its Wikidata item:

https://www.wikidata.org/w/api.php?action=wbgetentities&sites=enwiki&props=claims&titles=Donald_Trump

The response includes date and place of birth, image, religion, mother, father, children, height, signature, official website, and so on: essentially all the main facts about Donald Trump that appear in the Wikipedia infobox.
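A minimal sketch of building that request and flattening the claims in the JSON response into property ID → values. The function names are hypothetical, and the sample in the test is a hand-made fragment in the shape of a wbgetentities response, not real API output. Statements with snaktype "somevalue" or "novalue" carry no datavalue and are skipped.

```python
from urllib.parse import urlencode


def wbgetentities_url(title: str, site: str = "enwiki") -> str:
    """API URL for the claims of the Wikidata item linked to a Wikipedia title."""
    params = {
        "action": "wbgetentities",
        "sites": site,
        "titles": title,
        "props": "claims",
        "format": "json",
    }
    return "https://www.wikidata.org/w/api.php?" + urlencode(params)


def simple_claims(response: dict) -> dict:
    """Map each property ID (e.g. P569) to the datavalue 'value' of its statements' main snaks."""
    result = {}
    for entity in response.get("entities", {}).values():
        for pid, statements in entity.get("claims", {}).items():
            result[pid] = [
                st["mainsnak"]["datavalue"]["value"]
                for st in statements
                if st["mainsnak"].get("snaktype") == "value"
            ]
    return result
```

Item-valued properties such as P19 (place of birth) yield an object containing another item ID, so turning them into readable names needs a second wbgetentities call with ids=... and props=labels.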

自在安然 2024-09-18 00:23:38

Tomxu: what you're talking about is a template, which is simply a page you can include on another page. For the infobox, start by looking at Template:Infobox, which gives detailed instructions.

You can also press Edit (or View source) and copy the contents to your own wiki. Bear in mind that templates tend to form a hierarchy, so you might need to copy other templates that Infobox uses (if you want to use them). Templates are invoked with double curly braces, so, e.g., the Infobox template is called like this: {{Infobox}}.

I mentioned a hierarchy: you'll actually find multiple templates that all use Template:Infobox. To find them, just type Template:Infobox into Wikipedia's search field and you'll find multiple examples, e.g. Template:Infobox writer.
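Instead of the search field, the templates built on Template:Infobox can also be listed through the API's list=embeddedin module, restricted to the Template namespace (namespace 10). A small sketch of building that URL; templates_using_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def templates_using_url(template: str = "Template:Infobox", limit: int = 50) -> str:
    """API URL listing Template-namespace (ns 10) pages that transclude the given template."""
    params = {
        "action": "query",
        "list": "embeddedin",
        "eititle": template,
        "einamespace": 10,
        "eilimit": limit,
        "format": "json",
    }
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)
```

The titles come back under query.embeddedin; further pages of results are requested with the value returned in the response's continue block.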

Update: if you mean Navboxes, then see this information.

羅雙樹 2024-09-18 00:23:38

In our project we use YQL queries like this one to fetch data from Wiktionary:

http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Fen.wiktionary.org%2Fwiki%2Flife%22%20and%20xpath%3D'%2F%2Fdiv%5B%40id%3D%22bodyContent%22%5D'&format=xml&diagnostics=false&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=recwiki

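I don't fully understand how it works, but it worked for us. The output can be filtered using jQuery or something similar. Note that Yahoo retired the public YQL service in early 2019, so this endpoint is no longer available.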

小…楫夜泊 2024-09-18 00:23:38

What about using edit mode? You could go straight to the right textarea (most of the time it has id="wpTextBox1") and parse its contents.
The URL I used to find that out was (note section=0):

https://de.wikipedia.org/w/index.php?title=Pelephone&action=edit&section=0

Greetings
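The same section-0 wikitext can be requested from the API with action=parse&prop=wikitext&section=0, which avoids scraping the edit form's HTML. A sketch under the assumption that formatversion=2 is used, in which case the wikitext comes back as a plain string under parse.wikitext; the helper names and User-Agent value are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def section0_url(title: str, lang: str = "de") -> str:
    """API URL returning only the wikitext of section 0 (the part before the first heading)."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "wikitext",
        "section": 0,
        "format": "json",
        "formatversion": 2,
    }
    return f"https://{lang}.wikipedia.org/w/api.php?" + urlencode(params)


def wikitext_from_response(response: dict) -> str:
    """With formatversion=2 the wikitext is a plain string under parse.wikitext."""
    return response["parse"]["wikitext"]


def fetch_section0(title: str, lang: str = "de") -> str:
    # Placeholder User-Agent; Wikimedia asks API clients to identify themselves.
    req = Request(section0_url(title, lang),
                  headers={"User-Agent": "infobox-sketch/0.1 (you@example.org)"})
    with urlopen(req) as resp:
        return wikitext_from_response(json.load(resp))
```

Usage: `fetch_section0("Pelephone")` should return the same text as the wpTextBox1 textarea in the edit URL above.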

拒绝两难 2024-09-18 00:23:38

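It is also possible with pandas:

import pandas as pd
page = 'https://pt.wikipedia.org/wiki/Python'
# read_html returns a list of DataFrames, one per <table> matching attrs;
# index_col=0 uses the first column (the field labels) as the index
infoboxes = pd.read_html(page, index_col=0, attrs={"class": "infobox"})
print(infoboxes)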
夜未央樱花落 2024-09-18 00:23:38

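Using the MediaWiki API, you can get the infobox shown on the right of a Wikipedia page with the link below. As you can see, the output format is JSON (this can be changed), and by replacing the word "hydrogen" with the title you want, you will get a page containing that infobox. This only works where a dedicated template page such as Template:Infobox hydrogen exists (as it does for the chemical elements); most articles call a generic infobox template with their own parameters, so there is no per-article template page to request.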

https://en.wikipedia.org/w/api.php?action=parse&page=Template:Infobox%20hydrogen&format=json
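For an arbitrary article, one alternative is action=parse&page=<article>&prop=text&format=json&formatversion=2, which returns the rendered HTML as a string under parse.text, and then pulling the rows out of the infobox table. This stdlib-only sketch assumes the infobox is rendered as a <table> whose class list contains "infobox" and that label/value rows use <th>/<td>; that holds for many English Wikipedia infoboxes but is not guaranteed, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class InfoboxParser(HTMLParser):
    """Collect header/value text pairs from the first <table> whose class list contains 'infobox'."""

    def __init__(self):
        super().__init__()
        self.rows = {}
        self._depth = 0      # table nesting depth inside the infobox (0 = outside)
        self._done = False   # set once the first infobox table has closed
        self._cell = None    # "th" or "td" while inside a top-level cell
        self._th, self._td = [], []

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if self._depth or "infobox" in classes:
                self._depth += 1
        elif self._depth == 1 and tag == "tr":
            self._th, self._td = [], []
        elif self._depth == 1 and tag in ("th", "td"):
            self._cell = tag
        elif self._cell and tag == "br":
            self._add_text(" ")  # keep line-broken values readable

    def handle_endtag(self, tag):
        if self._done or not self._depth:
            return
        if tag == "table":
            self._depth -= 1
            self._done = self._depth == 0
        elif self._depth == 1 and tag in ("th", "td"):
            self._cell = None
        elif self._depth == 1 and tag == "tr":
            key = " ".join("".join(self._th).split())
            value = " ".join("".join(self._td).split())
            if key and value:  # skip title rows and image-only rows
                self.rows[key] = value

    def handle_data(self, data):
        self._add_text(data)

    def _add_text(self, text):
        # Text inside nested tables is appended to the enclosing top-level cell.
        if self._cell == "th":
            self._th.append(text)
        elif self._cell == "td":
            self._td.append(text)


def parse_infobox(html: str) -> dict:
    parser = InfoboxParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```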
