Accessing the main image of a Wikipedia page via the API

Posted on 2024-12-19 07:34:13

Is there any way to access the thumbnail picture of any Wikipedia page using an API? I mean the image in the box at the top right of the page. Is there an API for that?

允世 2024-12-26 07:34:13

You can get the thumbnail of any Wikipedia page using prop=pageimages. For example:

http://en.wikipedia.org/w/api.php?action=query&titles=Al-Farabi&prop=pageimages&format=json&pithumbsize=100

The response contains the full URL of the thumbnail.
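
As a rough illustration of the request above, the following Python sketch builds the same query and pulls the thumbnail URL out of a decoded response. The helper names (`build_thumbnail_query`, `extract_thumbnail_url`), the page ID, and the sample response are illustrative assumptions, not part of the API; the only thing taken from the API is that the thumbnail URL is reported under `thumbnail.source`.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_thumbnail_query(title, size=100):
    """Return a pageimages query URL for one page title (hypothetical helper)."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "pageimages",
        "format": "json",
        "pithumbsize": size,
    }
    return API_URL + "?" + urlencode(params)


def extract_thumbnail_url(response):
    """Return the first thumbnail URL in a decoded JSON response, or None."""
    pages = response.get("query", {}).get("pages", {})
    # With the default JSON format, "pages" is a dict keyed by page ID.
    for page in pages.values():
        thumbnail = page.get("thumbnail")
        if thumbnail:
            return thumbnail.get("source")
    return None


# Made-up response (page ID and image URL are placeholders) in the expected shape.
sample = {"query": {"pages": {"12345": {
    "title": "Al-Farabi",
    "thumbnail": {
        "source": "https://upload.wikimedia.org/example/100px-Al-Farabi.jpg",
        "width": 100,
        "height": 141,
    },
}}}}

print(build_thumbnail_query("Al-Farabi"))
print(extract_thumbnail_url(sample))
```

Fetching the URL and decoding the JSON is left out on purpose, so the sketch runs without network access.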

海之角 2024-12-26 07:34:13

http://en.wikipedia.org/w/api.php

Look at prop=images.

It returns an array of the image filenames used in the parsed page. You then have two options. You can make another API call to find the full image URL, e.g.:
action=query&titles=Image:INSERT_EXAMPLE_FILE_NAME_HERE.jpg&prop=imageinfo&iiprop=url

Or you can calculate the URL from the filename's hash.
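
The hash-based route can be sketched as follows. This assumes the usual Wikimedia upload layout, in which the directory is derived from the MD5 hash of the filename with spaces replaced by underscores. The function name and the `commons` default are illustrative assumptions; a file hosted locally on a wiki rather than on Commons lives under a different prefix, so an imageinfo query remains the safer option.

```python
import hashlib
from urllib.parse import quote


def upload_url_from_filename(filename, repo="commons"):
    """Guess the upload.wikimedia.org URL for a file name (hypothetical helper)."""
    # Strip an optional namespace prefix and normalise spaces to underscores.
    for prefix in ("File:", "Image:"):
        if filename.startswith(prefix):
            filename = filename[len(prefix):]
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # Directory layout: /<first hex digit>/<first two hex digits>/<name>
    return "https://upload.wikimedia.org/wikipedia/{}/{}/{}/{}".format(
        repo, digest[0], digest[:2], quote(name)
    )


print(upload_url_from_filename("File:Some example.jpg"))
```

The guess is only right when `repo` matches where the file is actually stored, which the filename alone does not reveal.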

Unfortunately, the first image in the array returned by prop=images cannot be guaranteed to be the infobox image. The array should not be relied on to follow page order (the sample output elsewhere in this thread is sorted alphabetically by filename), and a page may also include images before the infobox, most often icons for metadata about the page, e.g. "this article is locked".

Searching the array for the first image whose filename includes the page title is probably the best guess for the infobox image.
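
A minimal sketch of that title-matching guess is shown below. The function name is hypothetical, and the file titles are adapted from the sample output posted elsewhere in this thread; the comparison ignores case, spaces, and punctuation, which is one possible way to match, not the only one.

```python
def guess_infobox_image(page_title, image_titles):
    """Return the first image title that contains the page title, or None."""
    def normalise(text):
        # Keep only letters and digits, lower-cased, so "Hans-Ulrich Rudel"
        # and "HansUlrichRudel" compare equal.
        return "".join(ch for ch in text.lower() if ch.isalnum())

    needle = normalise(page_title)
    for image_title in image_titles:
        if needle in normalise(image_title):
            return image_title
    return None


images = [
    "File:BmRKEL.jpg",
    "File:Bundeswehr Kreuz Black.svg",
    "File:HansUlrichRudel.jpeg",
]
print(guess_infobox_image("Hans-Ulrich Rudel", images))
```

Matching on the whole title is strict: a disambiguated title such as "Saturn (mythology)" would need the parenthetical stripped before it could match any filename.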

听,心雨的声音 2024-12-26 07:34:13

Check out the MediaWiki API example for getting the main picture of a Wikipedia page: https://www.mediawiki.org/wiki/API:Page_info_in_search_results.

As others have mentioned, you would use prop=pageimages in your API query.

If you also want the image description, use prop=pageimages|pageterms in your API query instead.

You can get the original image using piprop=original, or you can get a thumbnail with a specified width/height. For a thumbnail with width/height=600, use piprop=thumbnail&pithumbsize=600. If you omit either parameter, the image returned by the API defaults to a thumbnail with a width/height of 50px.

If you are requesting results in JSON format, you should always use formatversion=2 in your API query (i.e., format=json&formatversion=2), because it makes retrieving the image from the response easier.

Original Size Image:

https://en.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&prop=pageimages|pageterms&piprop=original&titles=Albert Einstein

Thumbnail Size (600px width/height) Image:

https://en.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&prop=pageimages|pageterms&piprop=thumbnail&pithumbsize=600&titles=Albert Einstein
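
To show why formatversion=2 is easier to work with, here is a small Python sketch that reads the image out of a decoded response for either of the two queries above. The function name and the sample data (including the image URL) are made up for illustration; the sketch assumes the image is reported under an `original` or `thumbnail` key with a `source` field.

```python
def extract_page_image(response):
    """Return (kind, url) for the first page carrying an image, else None."""
    # With formatversion=2, "pages" is a plain list, so no page ID is needed.
    for page in response.get("query", {}).get("pages", []):
        for kind in ("original", "thumbnail"):
            image = page.get(kind)
            if image and "source" in image:
                return kind, image["source"]
    return None


# Made-up response in the shape expected for piprop=thumbnail&pithumbsize=600.
sample = {"query": {"pages": [{
    "title": "Albert Einstein",
    "thumbnail": {
        "source": "https://upload.wikimedia.org/example/600px-Einstein.jpg",
        "width": 600,
        "height": 800,
    },
}]}}

print(extract_page_image(sample))
```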

只等公子 2024-12-26 07:34:13

Way 1: You can try a query like this:

http://en.wikipedia.org/w/api.php?action=opensearch&limit=5&format=xml&search=italy&namespace=0

In the response, you can see the Image tag.

<Item>
<Text xml:space="preserve">Italy national rugby union team</Text>
<Description xml:space="preserve">
The Italy national rugby union team represent the nation of Italy in the sport of rugby union.
</Description>
<Url xml:space="preserve">
http://en.wikipedia.org/wiki/Italy_national_rugby_union_team
</Url>
<Image source="http://upload.wikimedia.org/wikipedia/en/thumb/4/46/Italy_rugby.png/43px-Italy_rugby.png" width="43" height="50"/>
</Item>

Way 2: Use the query http://en.wikipedia.org/w/index.php?action=render&title=italy

You then get the raw HTML, from which you can extract the image with something like PHP Simple HTML DOM Parser:
http://simplehtmldom.sourceforge.net

I don't have time to write the code for you, so this is just some advice. Thanks.
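
For Way 1, the Image tag can be read with a standard XML parser. The sketch below uses Python's built-in `xml.etree.ElementTree` on the `<Item>` fragment quoted above; the function name is hypothetical, and the tag matching deliberately ignores any XML namespace, since a full response may declare one.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Item>
<Text xml:space="preserve">Italy national rugby union team</Text>
<Url xml:space="preserve">
http://en.wikipedia.org/wiki/Italy_national_rugby_union_team
</Url>
<Image source="http://upload.wikimedia.org/wikipedia/en/thumb/4/46/Italy_rugby.png/43px-Italy_rugby.png" width="43" height="50"/>
</Item>"""


def image_sources(xml_text):
    """Return the source attribute of every Image element in the XML."""
    root = ET.fromstring(xml_text)
    sources = []
    for element in root.iter():
        # Drop any "{namespace}" prefix so the match works with or without one.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "Image" and element.get("source"):
            sources.append(element.get("source"))
    return sources


print(image_sources(SAMPLE))
```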

哎呦我呸! 2024-12-26 07:34:13

Sorry for not specifically answering your question about the main image, but here is some code to get a list of all images:

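// Fetch a URL with cURL and return the response body as a string.
function makeCall($url) {
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    // Follow redirects, e.g. from http:// to https://.
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
    return curl_exec($curl);
}

// Return the full URLs of all images used on the given Wikipedia article.
function wikipediaImageUrls($url) {
    $imageUrls = array();
    // The page title is the last path component of the article URL.
    $pathComponents = explode('/', parse_url($url, PHP_URL_PATH));
    $pageTitle = array_pop($pathComponents);
    $imagesQuery = "http://en.wikipedia.org/w/api.php?action=query&titles={$pageTitle}&prop=images&format=json";
    $jsonResponse = makeCall($imagesQuery);
    $response = json_decode($jsonResponse, true);
    // The result is keyed by page ID, which we do not know in advance.
    $imagesKey = key($response['query']['pages']);
    $page = $response['query']['pages'][$imagesKey];
    // Pages without images have no 'images' key.
    $images = isset($page['images']) ? $page['images'] : array();
    foreach ($images as $imageArray) {
        // Skip two generic icons that appear on many pages.
        if ($imageArray['title'] != 'File:Commons-logo.svg' && $imageArray['title'] != 'File:P vip.svg') {
            $title = str_replace('File:', '', $imageArray['title']);
            // Encode the title so non-ASCII characters survive in the query string.
            $title = rawurlencode(str_replace(' ', '_', $title));
            $imageUrlQuery = "http://en.wikipedia.org/w/api.php?action=query&titles=Image:{$title}&prop=imageinfo&iiprop=url&format=json";
            $jsonUrlQuery = makeCall($imageUrlQuery);
            $urlResponse = json_decode($jsonUrlQuery, true);
            $imageKey = key($urlResponse['query']['pages']);
            // Missing files carry no 'imageinfo' entry.
            if (isset($urlResponse['query']['pages'][$imageKey]['imageinfo'][0]['url'])) {
                $imageUrls[] = $urlResponse['query']['pages'][$imageKey]['imageinfo'][0]['url'];
            }
        }
    }
    return $imageUrls;
}
print_r(wikipediaImageUrls('http://en.wikipedia.org/wiki/Saturn_%28mythology%29'));
print_r(wikipediaImageUrls('http://en.wikipedia.org/wiki/Hans-Ulrich_Rudel'));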

I got this for http://en.wikipedia.org/wiki/Saturn_%28mythology%29:

Array
(
    [0] => http://upload.wikimedia.org/wikipedia/commons/1/10/Arch_of_SeptimiusSeverus.jpg
    [1] => http://upload.wikimedia.org/wikipedia/commons/8/81/Ivan_Akimov_Saturn_.jpg
    [2] => http://upload.wikimedia.org/wikipedia/commons/d/d7/Lucius_Appuleius_Saturninus.jpg
    [3] => http://upload.wikimedia.org/wikipedia/commons/2/2c/Polidoro_da_Caravaggio_-_Saturnus-thumb.jpg
    [4] => http://upload.wikimedia.org/wikipedia/commons/b/bd/Porta_Maggiore_Alatri.jpg
    [5] => http://upload.wikimedia.org/wikipedia/commons/6/6a/She-wolf_suckles_Romulus_and_Remus.jpg
    [6] => http://upload.wikimedia.org/wikipedia/commons/4/45/Throne_of_Saturn_Louvre_Ma1662.jpg
)

And for the second URL (http://en.wikipedia.org/wiki/Hans-Ulrich_Rudel):

Array
(
    [0] => http://upload.wikimedia.org/wikipedia/commons/e/e9/BmRKEL.jpg
    [1] => http://upload.wikimedia.org/wikipedia/commons/3/3f/BmRKELS.jpg
    [2] => http://upload.wikimedia.org/wikipedia/commons/2/2c/Bundesarchiv_Bild_101I-655-5976-04%2C_Russland%2C_Sturzkampfbomber_Junkers_Ju_87_G.jpg
    [3] => http://upload.wikimedia.org/wikipedia/commons/6/62/Bundeswehr_Kreuz_Black.svg
    [4] => http://upload.wikimedia.org/wikipedia/commons/9/99/Flag_of_German_Reich_%281935%E2%80%931945%29.svg
    [5] => http://upload.wikimedia.org/wikipedia/en/6/64/HansUlrichRudel.jpeg
    [6] => http://upload.wikimedia.org/wikipedia/commons/8/82/Heinkel_He_111_during_the_Battle_of_Britain.jpg
    [7] => http://upload.wikimedia.org/wikipedia/commons/6/66/Regulation_WW_II_Underwing_Balkenkreuz.png
)

Note that the URL of the 6th element of the second array differs slightly: it is under /wikipedia/en/ rather than /wikipedia/commons/. This is what @JosephJaber warned about in his comment above.

Hope this helps someone.

装纯掩盖桑 2024-12-26 07:34:13

I have written some code that gets the main image (full URL) for a Wikipedia article title. It's not perfect, but overall I'm very pleased with the results.

The challenge was that, when queried for a specific title, Wikipedia returns multiple image filenames (without paths). Furthermore, the secondary search (I used the code varatis posted in this thread, thanks!) returns the URLs of all images found for the searched image filename, regardless of the original article title. After all this, we may still end up with a generic image irrelevant to the search, so we filter those out. The code iterates over filenames and URLs until it finds (hopefully the best) match. It's a bit complicated, but it works :)

Note on the generic filter: I've been compiling a list of generic image strings for the isGeneric() function, but the list just keeps growing. I am considering maintaining it as a public list; let me know if there is any interest.

Prerequisite:

protected static $baseurl = "http://en.wikipedia.org/w/api.php";

Main function - get image URL from title:

public static function getImageURL($title)
{
    $images = self::getImageFilenameObj($title); // returns JSON object
    if (!$images) return '';

    foreach ($images as $image)
    {
        // get object of image URL for given filename
        $imgjson = self::getFileURLObj($image->title);

        // lookup failed
        if (!$imgjson) continue;

        // return first image match
        foreach ($imgjson as $img)
        {
            // no image found
            if (!isset($img->imageinfo[0]->url)) continue;

            // get URL for image
            $url = $img->imageinfo[0]->url;

            // filter generic images
            if (self::isGeneric($url)) continue;

            // match found
            return $url;
        }
    }
    // match not found
    return '';
}

== The following functions are called by the main function above ==

Get JSON object (filenames) by title:

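public static function getImageFilenameObj($title)
{
    // Note: retrieveInfo() is not shown in this answer; it is assumed to
    // fetch the given URL and return the response body as a string.
    try     // see if page has images
    {
        // get image file name
        $json = json_decode(
            self::retrieveInfo(
                self::$baseurl . '?action=query&titles=' .
                urlencode($title) . '&prop=images&format=json'
            ));

        // malformed response or no pages
        if (!isset($json->query->pages)) return NULL;

        /** The foreach is only to get around
         *  the fact that we don't have the id.
         */
        foreach ($json->query->pages as $page)
        {
            // pages without images have no "images" property
            return isset($page->images) ? $page->images : NULL;
        }
        return NULL;
    }
    catch(Exception $e) // request failed
    {
        return NULL;
    }
}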

Get JSON object (URLs) by filename:

public static function getFileURLObj($filename)
{
    try                     // resolve URL from filename
    {
        $json = json_decode(
            self::retrieveInfo(
                self::$baseurl . '?action=query&titles=' .
                urlencode($filename) . '&prop=imageinfo&iiprop=url&format=json'
            ));

        // malformed response or no pages
        return isset($json->query->pages) ? $json->query->pages : NULL;
    }
    catch(Exception $e)     // request failed
    {
        return NULL;
    }
}

Filter out generic images:

public static function isGeneric($url)
{
    $generic_strings = array(
        '_gray.svg',
        'icon',
        'Commons-logo.svg',
        'Ambox',
        'Text_document_with_red_question_mark.svg',
        'Question_book-new.svg',
        'Canadese_kano',
        'Wiki_letter_',
        'Edit-clear.svg',
        'WPanthroponymy',
        'Compass_rose_pale',
        'Us-actor.svg',
        'voting_box',
        'Crystal_',
        'transportation_inv',
        'arrow.svg',
        'Quill_and_ink-US.svg',
        'Decrease2.svg',
        'Rating-',
        'template',
        'Nuvola_apps_',
        'Mergefrom.svg',
        'Portal-',
        'Translation_to_',
        '/School.svg',
        'arrow',
        'Symbol_',
        'stub',
        'Unbalanced_scales.svg',
        '-logo.',
        'P_vip.svg',
        'Books-aj.svg_aj_ashton_01.svg',
        'Film',
        '/Gnome-',
        'cap.svg',
        'Missing',
        'silhouette',
        'Star_empty.svg',
        'Music_film_clapperboard.svg',
        'IPA_Unicode',
        'symbol',
        '_highlighting_',
        'pictogram',
        'Red_pog.svg',
        '_medal_with_cup',
        '_balloon',
        'Feature',
        'Aiga_'
    );

    foreach ($generic_strings as $str)
    {
        if (stripos($url, $str) !== false) return true;
    }

    return false;
}

Comments welcome.

表情可笑 2024-12-26 07:34:13

Let's take the page http://en.wikipedia.org/wiki/index.html?curid=57570 as an example
of getting the main picture.

Check out

prop=pageprops

action=query&pageids=57570&prop=pageprops&format=json

Example of the resulting page data:

{ "query": { "pages": { "57570": {
    "pageid": 57570,
    "ns": 0,
    "title": "Sachin Tendulkar",
    "pageprops": {
        "defaultsort": "Tendulkar,Sachin",
        "page_image": "Sachin_at_Castrol_Golden_Spanner_Awards_(crop).jpg",
        "wikibase_item": "Q9488"
    }
} } } }

From this result we get the main picture's file name as

(wikiId).pageprops.page_image = Sachin_at_Castrol_Golden_Spanner_Awards_(crop).jpg

Now that we have the image file name, we have to make another API call to get the full image path from the file name, as follows:

action=query&titles=Image:INSERT_EXAMPLE_FILE_NAME_HERE.jpg&prop=imageinfo&iiprop=url

E.g.

action=query&titles=Image:Sachin_at_Castrol_Golden_Spanner_Awards_(crop).jpg&prop=imageinfo&iiprop=url

This returns an array of image data that contains the URL:
http://upload.wikimedia.org/wikipedia/commons/3/35/Sachin_at_Castrol_Golden_Spanner_Awards_%28crop%29.jpg
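
The two steps above might be wired together as in the following Python sketch. The helper names are hypothetical, and the sample response simply mirrors the page data quoted in this answer. The sketch also checks a `page_image_free` key, on the assumption that newer versions of the PageImages extension report the file name under that name; treat that as an assumption to verify against a live response.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def page_image_filename(response):
    """Return the main image file name from a pageprops response, or None."""
    for page in response.get("query", {}).get("pages", {}).values():
        props = page.get("pageprops", {})
        # "page_image_free" is an assumed alternative key (see the note above).
        name = props.get("page_image") or props.get("page_image_free")
        if name:
            return name
    return None


def imageinfo_query(filename):
    """Build the second query, which resolves a file name to its full URL."""
    params = {
        "action": "query",
        "titles": "File:" + filename,  # "Image:" is an older alias of "File:"
        "prop": "imageinfo",
        "iiprop": "url",
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


sample = {"query": {"pages": {"57570": {
    "pageid": 57570,
    "title": "Sachin Tendulkar",
    "pageprops": {
        "page_image": "Sachin_at_Castrol_Golden_Spanner_Awards_(crop).jpg",
    },
}}}}

filename = page_image_filename(sample)
print(filename)
print(imageinfo_query(filename))
```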

捶死心动 2024-12-26 07:34:13

There is a way to reliably get the main image for a Wikipedia page: the extension called PageImages.

The PageImages extension collects information about images used on a page.

Its aim is to return the single most appropriate thumbnail associated
with an article, attempting to return only meaningful images, e.g. not
those from maintenance templates, stubs or flag icons. Currently it
uses the first non-meaningless image used in the page.

https://www.mediawiki.org/wiki/Extension:PageImages

Just add the prop pageimages to your API query:

/w/api.php?action=query&prop=pageimages&titles=Somepage&format=xml

This reliably filters out annoying default images and saves you from having to filter them yourself! The extension is installed on all the main Wikipedia sites...

拒绝两难 2024-12-26 07:34:13

As Anuraj mentioned, the pageimages parameter is the answer. Look at the following URL, which returns some nifty stuff:

https://en.wikipedia.org/w/api.php?action=query&prop=info|extracts|pageimages|images&inprop=url&exsentences=1&titles=india

Here are some interesting parameters:

  • The two parameters extracts and exsentences give you a short
    description you can use (exsentences is the number of sentences you want to include in the excerpt).
  • The info and inprop=url parameters give you the URL of the page.
  • The prop parameter takes multiple values separated by a bar symbol (|).
  • And if you add format=json to the query, it is even better.

星光不落少年眉 2024-12-26 07:34:13

See this related question on an API for Wikipedia. However, I do not know whether it is possible to retrieve the thumbnail picture through an API.

You can also consider just parsing the web page to find the image URL, and retrieve the image that way.

叫嚣ゝ 2024-12-26 07:34:13

Here is my list of XPaths that I have found to work for 95 percent of articles. The main ones are 1, 2, 3 and 4. Many articles are not formatted consistently, and the remaining XPaths cover those edge cases:

You can use a DOM parsing library to fetch the image using the XPath.

static NSString   *kWikipediaImageXPath2    =   @"//*[@id=\"mw-content-text\"]/div[1]/div/table/tr[2]/td/a/img";
static NSString   *kWikipediaImageXPath3    =   @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[1]/td/a/img";
static NSString   *kWikipediaImageXPath1    =   @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[2]/td/a/img";
static NSString   *kWikipediaImageXPath4    =   @"//*[@id=\"mw-content-text\"]/div[2]/table/tr[2]/td/a/img";
static NSString   *kWikipediaImageXPath5    =   @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[2]/td/p/a/img";
static NSString   *kWikipediaImageXPath6    =   @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[2]/td/div/div/a/img";
static NSString   *kWikipediaImageXPath7    =   @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[1]/td/div/div/a/img";

I used an ObjC wrapper around libxml2.2 called Hpple to pull out the image URL. Hope this helps.

枉心 2024-12-26 07:34:13

You can also use the CocoaPod called SDWebImage.

Code sample (remember to also add import SDWebImage):

// Requires Alamofire (AF), SwiftyJSON (JSON) and SDWebImage.
// wikipediaURL is not defined in this answer; it is assumed to hold the
// API endpoint, e.g. "https://en.wikipedia.org/w/api.php".
func requestInfo(flowerName: String) {

    let parameters: [String: String] = [
        "format": "json",
        "action": "query",
        "prop": "extracts|pageimages", // pageimages adds the image path to the result
        "exintro": "",
        "explaintext": "",
        "titles": flowerName,
        "indexpageids": "",
        "redirects": "1",
        "pithumbsize": "500" // thumbnail size in px
    ]

    AF.request(wikipediaURL, method: .get, parameters: parameters).responseJSON { (response) in
        switch response.result {
        case .success(let value):
            print("Got the wikipedia info.")
            print(response)

            let flowerJSON: JSON = JSON(value)
            // indexpageids makes the API list the page IDs, so the first one can be used as a key
            let pageid = flowerJSON["query"]["pageids"][0].stringValue

            let flowerDescription = flowerJSON["query"]["pages"][pageid]["extract"].stringValue

            // URL of the thumbnail image
            let flowerImageURL = flowerJSON["query"]["pages"][pageid]["thumbnail"]["source"].stringValue

            self.wikiInfoLabel.text = flowerDescription
            // update the image view with the Wikipedia image
            self.imageView.sd_setImage(with: URL(string: flowerImageURL))

        case .failure(let error):
            print(error)
        }
    }
}

多情癖 2024-12-26 07:34:13

I think not, but you can capture the image by running the page's HTML through a link parser.
