I wish to determine whether a given Wikipedia page belongs to a certain Wikipedia Portal using the MediaWiki API. So far, I have been experimenting with the page properties of the API but I cannot seem to find a way to derive what Portal a given page belongs to.
As an example, at the very bottom of the Wikipedia page for Cake, I can press Show on the Cakes section, and a bunch of links to different cake pages show up. There I can also see that all of these belong to the Food portal. That is the information I want to extract for a given page using the MediaWiki API.
Comments (2)
As far as I know, there is actually no formal definition of "belonging to a portal" in Wikipedia. Unlike categories, which are part of the MediaWiki software, portals are custom Wikipedia pages that aim to make it easier to explore a topic.
Instead of a formal definition, though, you can use a heuristic and determine the connection between a page and a portal based on one of them linking to the other. There are API endpoints for both directions:
(Note: 100 is the ID of the Portal namespace.)
Which portal pages are linked from the page "Cake" or "Pizza"
https://en.wikipedia.org/w/api.php?action=query&format=json&prop=links&titles=Cake%7CPizza&plnamespace=100
Which portal pages link to the page "Cake" or "Pizza"
https://en.wikipedia.org/w/api.php?action=query&format=json&prop=linkshere&titles=Cake%7CPizza&lhnamespace=100
(though as you can see, many unrelated portals link to "Cake" and none link to "Pizza")
A combined query for both directions
https://en.wikipedia.org/w/api.php?action=query&format=json&prop=links%7Clinkshere&titles=Cake%7CPizza&plnamespace=100&lhnamespace=100
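The combined query can also be issued from a script. Below is a minimal Python sketch, assuming the default JSON layout (pages keyed by page ID) and using only the standard library. The helper names (`build_portal_query`, `extract_portals`, `fetch_portals`) and the User-Agent string are made up for illustration, and `pllimit`/`lhlimit` are added on the assumption that you want as many results per request as the API allows. Continuation of long result sets is not handled.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
PORTAL_NS = 100  # ID of the Portal namespace on English Wikipedia


def build_portal_query(titles):
    """Build the parameters for the combined links/linkshere query."""
    return {
        "action": "query",
        "format": "json",
        "prop": "links|linkshere",
        "titles": "|".join(titles),
        "plnamespace": PORTAL_NS,  # only outgoing links into Portal:
        "lhnamespace": PORTAL_NS,  # only incoming links from Portal:
        "pllimit": "max",
        "lhlimit": "max",
    }


def extract_portals(response):
    """Map each page title to the portals it links to and is linked from."""
    result = {}
    for page in response.get("query", {}).get("pages", {}).values():
        result[page["title"]] = {
            "links_to": [l["title"] for l in page.get("links", [])],
            "linked_from": [l["title"] for l in page.get("linkshere", [])],
        }
    return result


def fetch_portals(titles):
    """Hypothetical helper: run the query over HTTP and parse the result."""
    url = API_URL + "?" + urllib.parse.urlencode(build_portal_query(titles))
    request = urllib.request.Request(
        url, headers={"User-Agent": "portal-lookup-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as resp:
        return extract_portals(json.load(resp))
```

Calling `fetch_portals(["Cake", "Pizza"])` should return one entry per title. Since many unrelated portals may link to a page, the `links_to` list is probably the more useful signal of the two.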
So through some more investigation I found the answer:
I ended up using the Revisions property in the API. This allows me to give a series of page titles that I want to investigate and have the wikitext of each page returned to me in JSON format. Then I can just search for lines containing Portal and figure out what portal (if any) the page belongs to. If anyone is in a similar situation, here is an example query to the API:
https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Bread|Bubble_tea|Pizza&format=json&redirects&rvprop=content&rvslots=main
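The search step could look roughly like the following Python sketch. It assumes the default JSON layout, in which the content requested with `rvprop=content` and `rvslots=main` (the page's wikitext) sits under `revisions[0]["slots"]["main"]["*"]`. The function names and the User-Agent string are invented for illustration, and the match is a plain case-insensitive substring test rather than a real template parser.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def extract_wikitext(response):
    """Map each page title to the wikitext of its returned revision."""
    texts = {}
    for page in response.get("query", {}).get("pages", {}).values():
        revisions = page.get("revisions")
        if not revisions:
            continue  # missing or invalid page: nothing to search
        texts[page["title"]] = revisions[0]["slots"]["main"]["*"]
    return texts


def portal_lines(wikitext):
    """Return the lines that mention a portal (case-insensitive)."""
    return [
        line.strip()
        for line in wikitext.splitlines()
        if "portal" in line.lower()
    ]


def fetch_portal_lines(titles):
    """Hypothetical helper: run the Revisions query and search each page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": "|".join(titles),
        "format": "json",
        "redirects": 1,
        "rvprop": "content",
        "rvslots": "main",
    }
    request = urllib.request.Request(
        API_URL + "?" + urllib.parse.urlencode(params),
        headers={"User-Agent": "portal-lookup-sketch/0.1"},
    )
    with urllib.request.urlopen(request) as resp:
        texts = extract_wikitext(json.load(resp))
    return {title: portal_lines(text) for title, text in texts.items()}
```

One caveat: a portal link that only appears inside a transcluded template, such as a navigation box at the bottom of the page, will likely not show up in the page's own wikitext, so this approach may miss it.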