Retrieving image license and author information from Wikimedia Commons

Posted on 2024-12-05 10:56:12, 435 words, 4 views, 0 comments

I am trying to use the Wikimedia API for Wikimedia Commons at:

http://commons.wikimedia.org/w/api.php

It seems that the Commons API is very immature, and the section of its documentation that mentions retrieving license and author information is empty.

Is there any way to use the API to retrieve the paragraph that contains the licensing information? (For example, the paragraph under the "Licensing" heading on this page.) Of course I could download the whole page and try to parse it, but then what are APIs for?

Comments (6)

手心的温暖 2024-12-12 10:56:12

Late answer but you can request the "extmetadata" data with the following query:

http://en.wikipedia.org/w/api.php?action=query&prop=imageinfo&iiprop=extmetadata&titles=File%3aBrad_Pitt_at_Incirlik2.jpg&format=json

Look at the UsageTerms, Artist, Credit, etc. entries under imageinfo.extmetadata.
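
A minimal sketch of building that query and pulling a few fields out of the parsed JSON. It assumes the response shape of prop=imageinfo&iiprop=extmetadata, where each extmetadata entry is an object whose text sits under "value" and where Artist and Credit may contain HTML. The Commons endpoint (instead of en.wikipedia.org) and the helper names extmetadata_url, strip_tags and license_summary are illustrative choices, and the regex tag stripping is deliberately crude.

```python
import html
import re
from urllib.parse import urlencode

# Assumed endpoint; the en.wikipedia.org URL from the answer can be swapped in
API = "https://commons.wikimedia.org/w/api.php"


def extmetadata_url(file_title):
    """Build an imageinfo/extmetadata query URL for a single file title."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "titles": file_title,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def strip_tags(value):
    """Crudely drop HTML tags and unescape entities (Artist/Credit often hold HTML)."""
    return html.unescape(re.sub(r"<[^>]+>", "", value)).strip()


def license_summary(response):
    """Pick license and author fields out of a parsed JSON response."""
    # Pages are keyed by page id; only one title was requested
    page = next(iter(response["query"]["pages"].values()))
    meta = page["imageinfo"][0]["extmetadata"]
    fields = ("LicenseShortName", "UsageTerms", "LicenseUrl", "Artist", "Credit")
    # Each entry is an object whose text lives under "value"; absent fields are skipped
    return {name: strip_tags(meta[name]["value"]) for name in fields if name in meta}
```

Fetching the URL (with requests or urllib) and passing the decoded JSON to license_summary gives a small dict such as {"LicenseShortName": ..., "Artist": ...}.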

猫腻 2024-12-12 10:56:12

You could try using Magnus Manske's Commons API tool on the Wikimedia Toolserver. It's not an official service, and the documentation seems to be rather sparse (that is to say, almost nonexistent), but the XML output seems pretty self-explanatory.

I can't seem to find the source for Magnus's script anywhere, but I assume it extracts the licensing information from the categories the file belongs to. If you wanted, you could do that yourself: just fetch the list of categories and, if necessary, walk up the category tree until you find a license category you recognize. Alas, the tree-walking part requires either multiple API requests or a database of Commons categories (either live access on the Toolserver, or a reconstructed copy from the database dumps).
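
The category-walking idea can be sketched as a breadth-first search up the category graph. This is not Magnus's script; it is a hypothetical helper that takes two callables: one returning a page's parent categories (for example backed by one prop=categories request per title, as in commons_parent_categories below), and one deciding whether a category is a license category you recognize. The max_depth cap and all names are assumptions, and the request helper ignores API continuation (clcontinue) for brevity.

```python
from collections import deque


def find_license_category(file_title, get_parent_categories, is_license_category, max_depth=5):
    """Breadth-first walk up the category tree until a recognized license category is found."""
    seen = {file_title}
    queue = deque([(file_title, 0)])
    while queue:
        title, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not look further up from this node
        for parent in get_parent_categories(title):
            if parent in seen:
                continue  # category graphs can contain cycles
            if is_license_category(parent):
                return parent
            seen.add(parent)
            queue.append((parent, depth + 1))
    return None


def commons_parent_categories(title):
    """One API request per title: list the categories a page belongs to (no continuation)."""
    import requests  # imported lazily so the walker itself has no third-party dependency

    params = {"action": "query", "prop": "categories", "titles": title,
              "cllimit": "max", "format": "json"}
    data = requests.get("https://commons.wikimedia.org/w/api.php", params=params, timeout=30).json()
    page = next(iter(data["query"]["pages"].values()))
    return [c["title"] for c in page.get("categories", [])]
```

Each level of the walk costs one request per new category, which is the "multiple API requests" cost mentioned above.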

Yes, I realize that this answer may seem unsatisfactory. The fact is that Magnus's script seems to be the closest currently existing thing to what you want, and even it's marked as experimental and incomplete. Basically, this is a problem waiting for someone to implement a (better) solution.

回心转意 2024-12-12 10:56:12

Have a look at the MediaWiki API and try this function:

import requests

API_URL = 'https://commons.wikimedia.org/w/api.php'

def extract_image_license(image_name):
    """Return the imageinfo list (user, URL, extmetadata, ...) for File:<image_name>."""
    params = {
        'action': 'query',
        'titles': 'File:' + image_name,
        'prop': 'imageinfo',
        'iiprop': 'user|userid|canonicaltitle|url|extmetadata',
        'format': 'json',
    }
    # Passing params lets requests URL-encode the title and the '|' separators
    response = requests.get(API_URL, params=params, timeout=30)
    response.raise_for_status()
    pages = response.json()['query']['pages']
    # Pages are keyed by page id; only one title was requested
    page = next(iter(pages.values()))
    # A missing file has no 'imageinfo' key
    return page.get('imageinfo', [])

Then call the function with the name of the image you want to query, for example:

extract_image_license('Albert_Einstein_Head.jpg')

朦胧时间 2024-12-12 10:56:12

I've used Magnus' Commons API tool. It's not designed to be dropped straight into a project, but if you copy the source of the wiki page it calls, cache it locally, and move the logic into a class, you can make it easier to call. Here's the source for Magnus' version. If you want the class I created from it, let me know and I'll dig it out.
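
The "wrap it in a class and cache it" approach might look roughly like this. It is not Magnus's code or the poster's class; it is a hypothetical sketch that applies the same idea to the official extmetadata query, with the HTTP call injected as fetch_json so it can be swapped or mocked, and an in-memory dict as the cache.

```python
class CommonsLicenseLookup:
    """Hypothetical wrapper: fetch extmetadata once per file title and cache it in memory.

    fetch_json(params) -> parsed JSON dict. With requests it could be, for example:
    lambda p: requests.get("https://commons.wikimedia.org/w/api.php", params=p, timeout=30).json()
    """

    def __init__(self, fetch_json):
        self._fetch_json = fetch_json
        self._cache = {}

    def extmetadata(self, file_title):
        if file_title not in self._cache:
            data = self._fetch_json({
                "action": "query", "prop": "imageinfo", "iiprop": "extmetadata",
                "titles": file_title, "format": "json",
            })
            page = next(iter(data["query"]["pages"].values()))
            # Missing files have no imageinfo; cache an empty dict for them too
            info = page.get("imageinfo") or [{}]
            self._cache[file_title] = info[0].get("extmetadata", {})
        return self._cache[file_title]

    def license(self, file_title):
        """Short license name (e.g. "CC BY-SA 4.0"), or None if the field is absent."""
        entry = self.extmetadata(file_title).get("LicenseShortName")
        return entry["value"] if entry else None
```

A dict is the simplest cache; persisting it to disk, or using functools.lru_cache on a plain function, would be alternative design choices.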

青柠芒果 2024-12-12 10:56:12

From http://www.mediawiki.org/wiki/API_talk:Main_page#Image_license_information
Is there a way to get the license of an image through the api?
By category is probably easiest, assuming the site categorizes by license. There is no built in module though for license information. Splarka 08:45, 22 January 2010 (UTC)

However, I find that using categories returns nothing for many images, even though they have a license specified. Maybe the best way is to parse the rendered HTML of the image page.
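
A rough sketch of the HTML-parsing route, using only the standard library's html.parser. It assumes that Commons license templates mark the short license name with the CSS class licensetpl_short (part of Commons' machine-readable template markup; treat this as an assumption and check it against a real page). The rendered HTML could come from a request such as action=parse&page=File:...&prop=text&formatversion=2&format=json, where the HTML string is in data["parse"]["text"]. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Collect the text of every element whose class attribute contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0  # > 0 while inside a matching element
        self.buffer = []
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested element inside a match
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.found.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def license_names_from_html(page_html, target_class="licensetpl_short"):
    """Return the text of all elements carrying the (assumed) Commons license class."""
    parser = ClassTextCollector(target_class)
    parser.feed(page_html)
    parser.close()
    return parser.found
```

Because it depends on template markup rather than a documented API field, this approach can break if the templates change.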

醉殇 2024-12-12 10:56:12

See the page: http://www.mediawiki.org/wiki/API:Meta

For each image, you can add the parameters meta=siteinfo and siprop=rightsinfo (siprop is the prop parameter of siteinfo).
Then you will see the rightsinfo in the response.

For the Brad Pitt image, the request would look like this:

http://en.wikipedia.org/w/api.php?format=jsonfm&action=query&titles=File:Brad_Pitt_at_Incirlik2.jpg&prop=imageinfo&iiprop=url&meta=siteinfo&siprop=rightsinfo
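
A minimal sketch of building that combined request (with format=json instead of jsonfm, which is the pretty-printed HTML variant) and reading both parts of the response. Note that meta=siteinfo describes the wiki itself, so rightsinfo most likely reflects the site's default content license rather than the license of the individual file; comparing it with the file's extmetadata is a reasonable check. The helper names are hypothetical.

```python
from urllib.parse import urlencode


def rightsinfo_url(file_title, api="https://en.wikipedia.org/w/api.php"):
    """Build the combined imageinfo + siteinfo/rightsinfo query from the answer."""
    params = {
        "format": "json", "action": "query", "titles": file_title,
        "prop": "imageinfo", "iiprop": "url",
        "meta": "siteinfo", "siprop": "rightsinfo",
    }
    return api + "?" + urlencode(params)


def read_rightsinfo(response):
    """Return the file URL plus the rightsinfo block (site-wide license text and URL)."""
    rights = response["query"]["rightsinfo"]
    # Pages are keyed by page id; only one title was requested
    page = next(iter(response["query"]["pages"].values()))
    return {
        "file_url": page["imageinfo"][0]["url"],
        "site_license": rights.get("text"),
        "site_license_url": rights.get("url"),
    }
```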
