Wikipedia API only returns a small portion of the data?

Posted 2024-10-11 20:21:30

Hey there,
I'm trying to extract data from Wikipedia articles using its API (http://en.wikipedia.org/w/api.php) from a PHP script, but I only ever seem to get a fraction of the real content.
For example, when trying:

$page=get_web_page("http://en.wikipedia.org/w/api.php?action=query&titles=Cat&prop=links&format=txt");
echo $page["content"];

This is what I get:

Array ( [query] => Array ( [pages] => Array ( [6678] => Array (
    [pageid] => 6678 [ns] => 0 [title] => Cat
    [links] => Array (
        [0] => Array ( [ns] => 0 [title] => 10th edition of Systema Naturae )
        [1] => Array ( [ns] => 0 [title] => 3-mercapto-3-methylbutan-1-ol )
        [2] => Array ( [ns] => 0 [title] => Abyssinian (cat) )
        [3] => Array ( [ns] => 0 [title] => Actinidia polygama )
        [4] => Array ( [ns] => 0 [title] => Adaptive radiation )
        [5] => Array ( [ns] => 0 [title] => African Wildcat )
        [6] => Array ( [ns] => 0 [title] => African wildcat )
        [7] => Array ( [ns] => 0 [title] => Afro-Asiatic languages )
        [8] => Array ( [ns] => 0 [title] => Age of Discovery )
        [9] => Array ( [ns] => 0 [title] => Agouti signalling peptide )
    ) ) ) )
    [query-continue] => Array ( [links] => Array ( [plcontinue] => 6678|0|Albino ) )
)

I was requesting the full list of links on the "Cat" article, but I only seem to get the first 10 in alphabetical order.
This happens no matter which format I choose, and even when calling the API directly (see http://en.wikipedia.org/w/api.php?action=query&titles=Cat&prop=links).
What is causing this restriction, and how can I fix it?


Comments (1)

讽刺将军 2024-10-18 20:21:30

If you look at the API manual, you will see that there is a pllimit option, which specifies how many links you want returned per request. You can get up to 500 at a time, or 5,000 if you have a bot account.
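
For example, the same query with pllimit added would look something like this:

http://en.wikipedia.org/w/api.php?action=query&titles=Cat&prop=links&pllimit=500&format=txt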

You will see the following at the end of the data dump you provided: [plcontinue] => 6678|0|Albino. You can pass this value back to the server to get more links from the page, starting from that point. So the next query you make would be:

$page=get_web_page("http://en.wikipedia.org/w/api.php?action=query&titles=Cat&prop=links&format=txt&plcontinue=6678|0|Albino");

You will need to keep doing this until the server does not return a plcontinue value.
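
Putting it together, a minimal sketch of that continuation loop in PHP might look something like the following. It assumes allow_url_fopen is enabled so file_get_contents can fetch URLs, uses format=json so the response can be parsed with json_decode, and the function name fetchAllLinks is only illustrative. It checks for the older query-continue shape shown in your dump as well as the continue block that newer versions of the API return.

<?php
// Sketch: fetch every link on a page by following plcontinue until the
// server stops returning one. Assumes allow_url_fopen is enabled.
function fetchAllLinks($title)
{
    $endpoint = "http://en.wikipedia.org/w/api.php";
    $links = array();
    $plcontinue = null;

    do {
        $params = array(
            "action"  => "query",
            "titles"  => $title,
            "prop"    => "links",
            "pllimit" => "500",   // 500 per request, 5000 with a bot account
            "format"  => "json",
        );
        if ($plcontinue !== null) {
            // Resume where the previous batch stopped.
            $params["plcontinue"] = $plcontinue;
        }

        $json = file_get_contents($endpoint . "?" . http_build_query($params));
        $data = json_decode($json, true);

        // Collect the link titles from every returned page.
        foreach ($data["query"]["pages"] as $page) {
            if (isset($page["links"])) {
                foreach ($page["links"] as $link) {
                    $links[] = $link["title"];
                }
            }
        }

        // Older API responses put the continuation value under query-continue
        // (as in your dump); newer ones put it under continue.
        if (isset($data["query-continue"]["links"]["plcontinue"])) {
            $plcontinue = $data["query-continue"]["links"]["plcontinue"];
        } elseif (isset($data["continue"]["plcontinue"])) {
            $plcontinue = $data["continue"]["plcontinue"];
        } else {
            $plcontinue = null;
        }
    } while ($plcontinue !== null);

    return $links;
}

$allLinks = fetchAllLinks("Cat");
echo count($allLinks) . " links found\n";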
