How to get the Infobox from a Wikipedia article via the MediaWiki API?

Posted on 2024-12-08 00:00:02

Wikipedia articles may have Infobox templates. With the following call I can get the first section of an article, which includes the Infobox.

http://en.wikipedia.org/w/api.php?action=parse&pageid=568801&section=0&prop=wikitext

I want a query which will return only Infobox data. Is this possible?

调妓 2024-12-15 00:00:02

You can do it with a URL call to the Wikipedia API like this:

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xmlfm&titles=Scary%20Monsters%20and%20Nice%20Sprites&rvsection=0

Replace the value of titles= with your page title, and change format=xmlfm to format=json if you want the response in JSON format.
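
For illustration, the same URL can be assembled in Python instead of being edited by hand. This is a minimal sketch using only the standard library; the helper name build_revisions_url is made up for this example, and the parameters are exactly the ones from the URL above.

```python
from urllib.parse import quote, urlencode

API_URL = "http://en.wikipedia.org/w/api.php"


def build_revisions_url(title, fmt="json"):
    """Build the query URL for section 0 of the page called `title`.

    Hypothetical helper for illustration; `fmt` can be "json" or "xmlfm".
    """
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "format": fmt,
        "titles": title,
        "rvsection": 0,
    }
    # quote_via=quote encodes spaces as %20 (the default would use '+').
    return API_URL + "?" + urlencode(params, quote_via=quote)


print(build_revisions_url("Scary Monsters and Nice Sprites"))
```

Passing the title as plain text and letting urlencode handle the escaping avoids hand-written %20 sequences.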

长不大的小祸害 2024-12-15 00:00:02

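Instead of parsing infoboxes yourself, which is fairly complicated, take a look at DBpedia, which has extracted Wikipedia infoboxes as database objects.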
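
As a rough sketch of where to look: DBpedia resource names generally mirror Wikipedia page titles with spaces replaced by underscores. The helper below (dbpedia_urls, a name invented here) only builds the URLs and does not fetch anything. The /resource/ and /data/<name>.json patterns are stated as assumptions about DBpedia's public endpoints, and titles with unusual characters may need different escaping.

```python
from urllib.parse import quote


def dbpedia_urls(title):
    """Map a Wikipedia page title to candidate DBpedia URLs.

    Hypothetical helper; the URL patterns are assumptions, not guarantees.
    """
    # Spaces become underscores; parentheses and commas are left as-is.
    name = quote(title.replace(" ", "_"), safe="(),")
    return {
        "resource": "http://dbpedia.org/resource/" + name,
        "json": "http://dbpedia.org/data/" + name + ".json",
    }


print(dbpedia_urls("Scary Monsters and Nice Sprites")["resource"])
```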

北渚 2024-12-15 00:00:02

Building on garry's answer, you can have Wikipedia parse the info box into HTML for you via the rvparse parameter like so:

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=Scary%20Monsters%20and%20Nice%20Sprites&rvsection=0&rvparse

Note that neither method returns just the infobox. From the HTML content, however, you can extract the table with class infobox (using, e.g., Beautiful Soup).

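In Python, you can do something like the following:

import requests

url = "http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=Scary%20Monsters%20and%20Nice%20Sprites&rvsection=0&rvparse"
resp = requests.get(url).json()
# Pages are keyed by page ID, so take the first (and only) entry
page_one = next(iter(resp['query']['pages'].values()))
revisions = page_one.get('revisions', [])
# In the default JSON format the revision content sits under the '*' key
html = revisions[0]['*']
# Now parse the HTML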
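
The parsing step itself could look roughly like this. The answer suggests Beautiful Soup; to keep the example dependency-free, this sketch swaps in the standard library's html.parser instead. It collects header/value pairs from the first table whose class list contains infobox. The names InfoboxParser and extract_infobox, and the sample HTML, are made up for illustration; real infobox markup is messier and may need extra handling.

```python
from html.parser import HTMLParser


class InfoboxParser(HTMLParser):
    """Collect (label, value) pairs from the first table with class "infobox"."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # table nesting depth inside the infobox (0 = outside)
        self.done = False   # True once the first infobox has been closed
        self.cell = None    # "th" or "td" while inside a top-level cell
        self.label = []     # text fragments of the current row's <th>
        self.value = []     # text fragments of the current row's <td>
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if self.depth or "infobox" in classes:
                self.depth += 1
        elif self.depth == 1 and tag == "tr":
            self.label, self.value = [], []
        elif self.depth == 1 and tag in ("th", "td"):
            self.cell = tag

    def handle_endtag(self, tag):
        if self.done or not self.depth:
            return
        if tag == "table":
            self.depth -= 1
            if self.depth == 0:
                self.done, self.cell = True, None
        elif self.depth == 1 and tag in ("th", "td"):
            self.cell = None
        elif self.depth == 1 and tag == "tr":
            # Collapse runs of whitespace and keep rows that have both parts.
            label = " ".join("".join(self.label).split())
            value = " ".join("".join(self.value).split())
            if label and value:
                self.rows.append((label, value))

    def handle_data(self, data):
        if self.cell == "th":
            self.label.append(data)
        elif self.cell == "td":
            self.value.append(data)


def extract_infobox(html):
    """Return a list of (label, value) tuples, or [] if no infobox is found."""
    parser = InfoboxParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample markup standing in for the HTML returned by the API.
sample = """
<div><p>Intro text</p>
<table class="infobox vevent">
  <tr><th colspan="2">Example title</th></tr>
  <tr><th>Field A</th><td>value <a href="#">one</a></td></tr>
  <tr><th>Field B</th><td>value two</td></tr>
</table>
<table class="wikitable"><tr><th>Other</th><td>ignored</td></tr></table>
</div>
"""

for label, value in extract_infobox(sample):
    print(label, "=", value)
```

Rows that have only a header cell (such as the title row) are skipped, and text inside nested tables is folded into the enclosing cell's value.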
