Get all Wikipedia infobox templates and all pages using them
A Wikipedia page such as Wikipedia: Stack Overflow often has an infobox, usually at the top right of the page.
DBPedia lists all these attributes as RDF triples; see DBPedia: Stack Overflow for an example. There you can see the property dbpprop:wikiPageUsesTemplate with the value dbpedia:Template:Infobox_website, which is interesting.
1) I want to know which Wikipedia pages use this template. How can I list all pages that use the Infobox_website template? Preferably with a SPARQL query, but I am open to other easy solutions.
2) The next thing is a list of all infobox templates. Wikipedia: Category Infobox Templates shows the hierarchy of the desired Wikipedia categories, which looks like what I am seeking. But I want all of these in a machine-readable format, on one page. Maybe DBPedia is the right thing here too? At DBPedia: Category Infobox Templates and DBPedia: INFOBOX I find very little information, but these look very promising. How can I use SPARQL to find all infobox types so that I can repeat step 1 for each of them?
You can use this for testing the SPARQL queries: http://dbpedia.org/snorql/
Update 1
I seem to have solved problem number 1: SPARQL: list all pages with Infobox_website
Update 2
Also, this seems to be the query for problem number 2: SPARQL: list all Infoboxes
OK, since I seem to have found a solution (most probably not the best one), I want to share it.
1) This SPARQL query can be used to find all pages that include a specific Infobox type:
Link at SNORQL: http://dbpedia.org/snorql/?query=SELECT%20%2a%20WHERE%20%7B%20%20?page%20dbpedia2%3awikiPageUsesTemplate%20%3Chttp://dbpedia.org/resource/Template%3aInfobox_website%3E%20.%20%20?page%20dbpedia2%3aname%20?name%20.%7D
2) This SPARQL query can be used to find all Infobox types:
Link at SNORQL
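The first SNORQL link carries its SPARQL query percent-encoded in the URL, so the query text can be recovered from it. The following is a minimal Python sketch that decodes it and rebuilds the same query for any template name. The function names are hypothetical, and it assumes the `dbpedia2:` prefix is predefined by SNORQL (commonly `http://dbpedia.org/property/`).

```python
from urllib.parse import unquote

# The SNORQL link for query 1 (stray space in the scraped URL removed).
SNORQL_LINK = (
    "http://dbpedia.org/snorql/?query=SELECT%20%2a%20WHERE%20%7B%20%20"
    "?page%20dbpedia2%3awikiPageUsesTemplate%20"
    "%3Chttp://dbpedia.org/resource/Template%3aInfobox_website%3E%20.%20%20"
    "?page%20dbpedia2%3aname%20?name%20.%7D"
)


def query_from_link(link):
    """Return the SPARQL text stored after '?query=' in a SNORQL link."""
    encoded = link.split("?query=", 1)[1]
    # Decode percent-escapes, then collapse runs of whitespace.
    return " ".join(unquote(encoded).split())


def pages_using_template(template):
    """Build the same query shape for an arbitrary template name.

    Assumes the dbpedia2: prefix is predefined by the query front end.
    """
    return (
        "SELECT * WHERE { ?page dbpedia2:wikiPageUsesTemplate "
        f"<http://dbpedia.org/resource/Template:{template}> . "
        "?page dbpedia2:name ?name .}"
    )


if __name__ == "__main__":
    print(query_from_link(SNORQL_LINK))
```

Note that the second triple pattern requires a `dbpedia2:name` value, so pages without that property would not be matched by this query.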
The previous answers seem to have stopped working. However, only a small change is required to get them working at the new DBPedia query endpoint at http://live.dbpedia.org/sparql.
To get a list of all pages and the templates they use, this query works:
See results (limited to 100)
If you're looking for a specific template:
See results
And for my use case I'm interested in the Wikipedia URL rather than the DBPedia page, so I'm using this query:
See results
I'm also using curl to pull the results into a script. I'm not sure whether this gives the full result set, though, because it returns 1698 results whereas wmflabs.org seems to suggest there should be 4439.
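One way to check whether a result set is being cut off is to page through it with LIMIT and OFFSET. Below is a minimal Python sketch of that idea, using only the standard library in place of curl. The query text, the full property IRI, the `format` request parameter and all function names are assumptions for illustration, not the queries from this answer. Paging only rules out a server-side row cap; it cannot explain a count difference caused by the dataset itself.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "http://live.dbpedia.org/sparql"  # endpoint named in the answer

# Assumed query shape: the property IRI is a guess at what the
# wikiPageUsesTemplate prefix expands to on this endpoint.
QUERY = """
SELECT ?page WHERE {
  ?page <http://dbpedia.org/property/wikiPageUsesTemplate>
        <http://dbpedia.org/resource/Template:Infobox_website> .
}
ORDER BY ?page
"""


def request_url(query, limit, offset, endpoint=ENDPOINT):
    """Build a GET URL for one page of results in SPARQL JSON format."""
    paged = f"{query.strip()} LIMIT {limit} OFFSET {offset}"
    params = {"query": paged, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


def values(result_json, var):
    """Extract the values bound to `var` from a SPARQL JSON result."""
    data = json.loads(result_json)
    return [b[var]["value"] for b in data["results"]["bindings"] if var in b]


def fetch_all(query, var="page", page_size=1000):
    """Fetch pages of results until a short page signals the end."""
    collected, offset = [], 0
    while True:
        with urlopen(request_url(query, page_size, offset)) as resp:
            batch = values(resp.read().decode("utf-8"), var)
        collected.extend(batch)
        if len(batch) < page_size:
            return collected
        offset += page_size
```

Calling `len(fetch_all(QUERY))` would then give a count to compare against the figure from the single curl request.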
For the second part of your question, only a small change is needed from the previous query to get a list of all templates:
See results
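As an illustration of how listing templates differs from listing pages, here is a minimal Python sketch that builds a query selecting the distinct template values instead of the pages. The property IRI, the `Infobox` name filter and the helper names are assumptions; this is not necessarily the query the answer used.

```python
TEMPLATE_PREFIX = "http://dbpedia.org/resource/Template:"

# Assumed expansion of the wikiPageUsesTemplate property.
USES_TEMPLATE = "<http://dbpedia.org/property/wikiPageUsesTemplate>"


def all_infobox_templates_query():
    """Build a query for every distinct template whose name starts with 'Infobox'."""
    return (
        "SELECT DISTINCT ?template WHERE { "
        f"?page {USES_TEMPLATE} ?template . "
        f'FILTER(STRSTARTS(STR(?template), "{TEMPLATE_PREFIX}Infobox")) '
        "}"
    )


def template_name(iri):
    """Turn a template IRI into a bare name such as 'Infobox_website'."""
    if not iri.startswith(TEMPLATE_PREFIX):
        raise ValueError(f"not a template IRI: {iri}")
    return iri[len(TEMPLATE_PREFIX):]


if __name__ == "__main__":
    print(all_infobox_templates_query())
```

Each name returned by `template_name` can then be substituted into the per-template query to repeat the first step for every infobox type.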
You can also use the MediaWiki API's embeddedin query to return a list of all pages that include a given template. You'll want to use a library for accessing the API, though. Which language would you prefer? For Ruby, I'd suggest MediaWiki::Gateway.
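For readers without a client library at hand, the embeddedin query can also be sketched with Python's standard library instead of the Ruby gem. The example below assumes the English Wikipedia endpoint, restricts results to the article namespace, and follows the API's `continue` object to page through results. The function names and the User-Agent string are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"  # assumed: English Wikipedia


def embeddedin_params(template, cont=None):
    """Build request parameters for one batch of list=embeddedin results."""
    params = {
        "action": "query",
        "list": "embeddedin",
        "eititle": f"Template:{template}",
        "einamespace": "0",  # articles only
        "eilimit": "500",
        "format": "json",
    }
    # The API returns a 'continue' object to merge into the next request.
    params.update(cont or {})
    return params


def parse_batch(body):
    """Return (page titles, continuation dict or None) from a JSON response."""
    data = json.loads(body)
    titles = [p["title"] for p in data.get("query", {}).get("embeddedin", [])]
    return titles, data.get("continue")


def pages_embedding(template):
    """Yield the titles of all pages that transclude the given template."""
    cont = None
    while True:
        url = API + "?" + urlencode(embeddedin_params(template, cont))
        req = Request(url, headers={"User-Agent": "infobox-list-sketch/0.1"})
        with urlopen(req) as resp:
            titles, cont = parse_batch(resp.read().decode("utf-8"))
        yield from titles
        if not cont:
            return
```

For example, `list(pages_embedding("Infobox website"))` would collect every article title that uses that template, one batch of up to 500 at a time.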