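Accessing the main image of a Wikipedia page via the API
Is there any way to access the thumbnail picture of any Wikipedia page using an API? I mean the image in the box at the top right of the page. Is there an API for that?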
You can get the thumbnail of any Wikipedia page using prop=pageimages; the response gives you the thumbnail's full URL.
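A minimal Python sketch of such a request. The endpoint, the example title ("India") and the thumbnail size are assumptions for illustration, and the sample response is hand-written to show the shape of the default JSON layout (pages keyed by page ID), not captured output.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"  # assumed endpoint


def thumbnail_query_url(title, size=100):
    """Build a prop=pageimages query for one title (size is an assumed default)."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "pageimages",
        "pithumbsize": size,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def thumbnail_from_response(data):
    """Return the first thumbnail URL found, or None if the page has none."""
    pages = data.get("query", {}).get("pages", {})
    for page in pages.values():
        thumb = page.get("thumbnail")
        if thumb:
            return thumb["source"]
    return None


def fetch_thumbnail(title, size=100):
    """Perform the request (needs network access; not called in this sketch)."""
    req = Request(thumbnail_query_url(title, size),
                  headers={"User-Agent": "pageimage-sketch/0.1"})
    with urlopen(req) as resp:
        return thumbnail_from_response(json.load(resp))


# Hand-written sample in the shape of a default-format response.
sample = {"query": {"pages": {"1": {
    "title": "India",
    "thumbnail": {"source": "https://upload.example/thumb.jpg", "width": 100, "height": 75},
}}}}
print(thumbnail_query_url("India"))
print(thumbnail_from_response(sample))
```

Keeping URL building and response parsing separate from the network call makes both easy to check without hitting the API.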
http://en.wikipedia.org/w/api.php
Look at prop=images. It returns an array of the image filenames used in the parsed page. You then have the option of making another API call to find the full image URL, e.g.:
action=query&titles=Image:INSERT_EXAMPLE_FILE_NAME_HERE.jpg&prop=imageinfo&iiprop=url
or of calculating the URL from the filename's hash.
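The hash-based alternative can be sketched in Python as follows. This assumes the commonly described layout of Wikimedia's upload server (spaces replaced by underscores, then the first one and the first two hex digits of the filename's MD5 used as directory names) and assumes the file is hosted on Commons. Files uploaded locally to a wiki live under a different base path, so treat the result as a guess to verify, not a guaranteed URL. The base URL and the example filename are illustrative.

```python
import hashlib
from urllib.parse import quote

COMMONS_BASE = "https://upload.wikimedia.org/wikipedia/commons"  # assumption: file is on Commons


def hashed_upload_url(filename, base=COMMONS_BASE):
    """Guess the upload URL of a file from the MD5 of its name."""
    name = filename.replace(" ", "_")
    # Assumption: the wiki stores file names with an upper-case first letter
    name = name[:1].upper() + name[1:]
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # Directory names are the first one and the first two hex digits
    return f"{base}/{digest[0]}/{digest[:2]}/{quote(name)}"


print(hashed_upload_url("Example file name.jpg"))
```

The imageinfo call is the safer route when an extra request is acceptable, since it does not depend on guessing where the file is hosted.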
Unfortunately, while the array of images returned by prop=images is in the order they are found on the page, the first one cannot be guaranteed to be the infobox image, because a page sometimes includes an image before the infobox (most of the time icons for metadata about the page, e.g. "this article is locked"). Searching the array for the first image whose name includes the page title is probably the best guess for the infobox image.
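The title-matching guess described here might look like the following Python sketch. The helper names and the sample file list are hypothetical, and the normalization only keeps ASCII letters and digits, so titles in other scripts would need a different comparison.

```python
import re


def _normalize(text):
    """Lower-case and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def guess_infobox_image(page_title, filenames):
    """Return the first filename containing the page title, else None."""
    needle = _normalize(page_title)
    for name in filenames:
        # Strip a leading "File:" / "Image:" namespace before comparing
        bare = name.split(":", 1)[-1]
        if needle and needle in _normalize(bare):
            return name
    return None


# Hypothetical prop=images result for a page titled "Saturn"
files = [
    "File:Padlock-silver.svg",
    "File:Saturn during Equinox.jpg",
    "File:Saturn rings.png",
]
print(guess_infobox_image("Saturn", files))
```

This remains a heuristic: it returns nothing when no filename mentions the title, and it can pick a non-infobox image that happens to share the name.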
This is a good way to get the main image of a Wikipedia page:
http://en.wikipedia.org/w/api.php?action=query&prop=pageimages&format=json&piprop=original&titles=India
Check out the MediaWiki API example for getting the main picture of a Wikipedia page: https://www.mediawiki.org/wiki/API:Page_info_in_search_results.
As others have mentioned, you would use prop=pageimages in your API query. If you also want the image description, use prop=pageimages|pageterms instead.
You can get the original image using piprop=original, or a thumbnail with a specified width/height. For a thumbnail with width/height=600, use piprop=thumbnail&pithumbsize=600. If you omit either, the image returned in the API response defaults to a thumbnail with a width/height of 50px.
If you are requesting results in JSON format, you should always use formatversion=2 in your API query (i.e., format=json&formatversion=2), because it makes retrieving the image from the response easier.
Original size image:
Thumbnail size (600px width/height) image:
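The two requests this answer describes (original size and a 600px thumbnail) could be built as in the following Python sketch. The endpoint and the example title are assumptions, and the sample response is hand-written to show the formatversion=2 shape, where pages is a list rather than an object keyed by page ID.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"  # assumed endpoint


def pageimage_url(title, thumb_size=None):
    """Query URL for the original image, or a thumbnail if thumb_size is given."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": 2,
        "prop": "pageimages",
        "titles": title,
    }
    if thumb_size is None:
        params["piprop"] = "original"
    else:
        params["piprop"] = "thumbnail"
        params["pithumbsize"] = thumb_size
    return API + "?" + urlencode(params)


def image_source(data):
    """With formatversion=2, pages is a list, so the first page is pages[0]."""
    pages = data.get("query", {}).get("pages", [])
    if not pages:
        return None
    page = pages[0]
    image = page.get("original") or page.get("thumbnail")
    return image["source"] if image else None


# Hand-written sample in the formatversion=2 shape
sample = {"query": {"pages": [
    {"title": "India", "original": {"source": "https://upload.example/full.jpg"}},
]}}
print(pageimage_url("India"))        # original size
print(pageimage_url("India", 600))   # 600px thumbnail
print(image_source(sample))
```

The parser shows why formatversion=2 is convenient: there is no need to discover the page ID before reading the result.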
Way 1: You can try a query like this:
In the response, you can see the Image tag.
Way 2: Use the query http://en.wikipedia.org/w/index.php?action=render&title=italy
You then get the raw HTML, and you can extract the image using something like PHP Simple HTML DOM Parser: http://simplehtmldom.sourceforge.net
I have no time to write it for you; I'm just giving you some advice. Thanks.
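Way 2 can be sketched without PHP. The example below swaps in Python's standard-library html.parser in place of PHP Simple HTML DOM Parser and collects every image source in document order. The HTML fragment is hand-written as a stand-in for the action=render output, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <img ... /> tags
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_sources(html):
    """Return all image URLs found in an HTML string."""
    parser = ImageCollector()
    parser.feed(html)
    parser.close()
    return parser.sources


# Hand-written fragment standing in for the action=render output
sample_html = """
<table class="infobox"><tr><td>
<a href="/wiki/File:Flag.svg"><img src="//upload.example/flag.svg" width="125"></a>
</td></tr></table>
<p>Text <img src="//upload.example/map.png" alt="map"/></p>
"""
print(image_sources(sample_html))
```

Picking the main image from this list still needs a rule of its own, since the first image on a page is not always the infobox image.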
I'm sorry for not specifically answering your question about the main image. But here's some code to get a list of all images:
I got this for http://en.wikipedia.org/wiki/Saturn_%28mythology%29:
And for the second URL (http://en.wikipedia.org/wiki/Hans-Ulrich_Rudel):
Note that the URL changed a bit on the 6th element of the second array. It's what @JosephJaber was warning about in his comment above.
Hope this helps someone.
I have written some code that gets the main image (full URL) by Wikipedia article title. It's not perfect, but overall I'm very pleased with the results.
The challenge was that when queried for a specific title, Wikipedia returns multiple image filenames (without paths). Furthermore, the secondary search (I used the code varatis posted in this thread - thanks!) returns the URLs of all images found for the filename that was searched, regardless of the original article title. After all this, we may still end up with a generic image irrelevant to the search, so we filter those out. The code iterates over filenames and URLs until it finds (hopefully the best) match... a bit complicated, but it works :)
Note on the generic filter: I've been compiling a list of generic image strings for the isGeneric() function, but the list just keeps growing. I am considering maintaining it as a public list - if there is any interest, let me know.
Pre:
Main function - get image URL from title:
== The following functions are called by the main function above ==
Get JSON object (filenames) by title:
Get JSON object (URLs) by filename:
Filter out generic images:
Comments welcome.
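The author's code is not preserved in this thread. The following Python sketch only illustrates the filtering idea described above. The marker strings are hypothetical examples, not the author's list, and is_generic() is an illustrative stand-in for the isGeneric() function mentioned in the post.

```python
# Hypothetical markers; the author's real list is not shown in the thread.
GENERIC_MARKERS = ("commons-logo", "padlock", "question_book", "edit-clear")


def is_generic(filename):
    """True if the filename looks like a maintenance icon rather than content."""
    lowered = filename.lower().replace(" ", "_")
    return any(marker in lowered for marker in GENERIC_MARKERS)


def first_specific_image(filenames):
    """Return the first filename that passes the generic filter, else None."""
    for name in filenames:
        if not is_generic(name):
            return name
    return None


# Hypothetical filename list for one article
candidates = [
    "File:Padlock-silver.svg",
    "File:Commons-logo.svg",
    "File:Saturn statue.jpg",
]
print(first_specific_image(candidates))
```

A substring list like this needs ongoing maintenance, which matches the author's remark that the list keeps growing.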
Let's take the page http://en.wikipedia.org/wiki/index.html?curid=57570 as an example.
To get the main picture, check out
Result page data, e.g.:
From this result we get the main picture's file name as
Now that we have the image file name, we have to make another API call to get the full image path from the file name, as follows
E.g.:
This returns an array of image data containing the URL:
http://upload.wikimedia.org/wikipedia/commons/3/35/Sachin_at_Castrol_Golden_Spanner_Awards_%28crop%29.jpg
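The two calls described in this answer might be sketched in Python as follows. The original queries are not preserved, so the parameters here (prop=pageimages with piprop=name, then prop=imageinfo with iiprop=url) are an assumption about what they looked like. The endpoint is also assumed, and both sample responses are hand-written from the file name and URL quoted above.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"  # assumed endpoint


def page_image_name_url(page_id):
    """Step 1: ask for the page image's file name by page ID."""
    return API + "?" + urlencode({
        "action": "query", "pageids": page_id,
        "prop": "pageimages", "piprop": "name", "format": "json",
    })


def image_info_url(filename):
    """Step 2: ask for the full URL of that file."""
    return API + "?" + urlencode({
        "action": "query", "titles": "File:" + filename,
        "prop": "imageinfo", "iiprop": "url", "format": "json",
    })


def page_image_name(data):
    """Read the file name out of a step 1 response."""
    for page in data.get("query", {}).get("pages", {}).values():
        if "pageimage" in page:
            return page["pageimage"]
    return None


def image_url(data):
    """Read the first image URL out of a step 2 response."""
    for page in data.get("query", {}).get("pages", {}).values():
        for info in page.get("imageinfo", []):
            return info["url"]
    return None


# Hand-written samples shaped like the two responses
step1 = {"query": {"pages": {"57570": {
    "pageid": 57570,
    "pageimage": "Sachin_at_Castrol_Golden_Spanner_Awards_(crop).jpg",
}}}}
step2 = {"query": {"pages": {"-1": {"imageinfo": [{
    "url": "http://upload.wikimedia.org/wikipedia/commons/3/35/"
           "Sachin_at_Castrol_Golden_Spanner_Awards_%28crop%29.jpg",
}]}}}}
name = page_image_name(step1)
print(image_info_url(name))
print(image_url(step2))
```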
There is a way to reliably get the main image for a Wikipedia page: the extension called PageImages.
https://www.mediawiki.org/wiki/Extension:PageImages
Just add the prop pageimages to your API query:
This reliably filters out annoying default images and saves you from having to filter them yourself! The extension is installed on all the main Wikipedia pages...
Like Anuraj mentioned, the pageimages parameter is it. Look at the following URL, which brings back some nifty stuff:
Here are some interesting parameters:
description you can use. (exsentences is the number of sentences you want to include in the excerpt)
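The URL from this answer is not preserved. As a rough Python sketch of the idea, the query below combines pageimages with a short text extract. It assumes prop=extracts with explaintext and exsentences (from the TextExtracts extension) is what the answer refers to. The endpoint, the defaults and the sample response are illustrative.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"  # assumed endpoint


def summary_query_url(title, sentences=2, thumb_size=200):
    """Query URL for a thumbnail plus a short plain-text extract."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": 2,
        "prop": "pageimages|extracts",
        "titles": title,
        "pithumbsize": thumb_size,
        "explaintext": 1,          # plain text instead of HTML
        "exsentences": sentences,  # number of sentences in the excerpt
    }
    return API + "?" + urlencode(params)


def summary_from_response(data):
    """Pick title, extract and thumbnail URL out of the first page."""
    pages = data.get("query", {}).get("pages", [])
    if not pages:
        return None
    page = pages[0]
    return {
        "title": page.get("title"),
        "extract": page.get("extract"),
        "thumbnail": (page.get("thumbnail") or {}).get("source"),
    }


# Hand-written sample in the formatversion=2 shape
sample = {"query": {"pages": [{
    "title": "India",
    "extract": "First sentence. Second sentence.",
    "thumbnail": {"source": "https://upload.example/thumb.jpg"},
}]}}
print(summary_query_url("India"))
print(summary_from_response(sample))
```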
See this related question on an API for Wikipedia. However, I don't know whether it is possible to retrieve the thumbnail picture through an API.
You can also consider just parsing the web page to find the image URL, and retrieving the image that way.
Here is my list of XPaths that I have found to work for 95 percent of articles. The main ones are 1, 2, 3 and 4. A lot of articles are not formatted correctly, and these would be edge cases:
You can use a DOM parsing library to fetch the image using the XPath.
I used an ObjC wrapper around libxml2.2 called Hpple to pull out the image URL. Hope this helps.
You can also use a CocoaPod called SDWebImage.
Code sample (remember to also add import SDWebImage):

I think not, but you can capture the image by using an HTML parser to find the image link in the page.