Wikipedia API only returns a fraction of the data?
Hey there,
I'm trying to extract data from Wikipedia articles using its API (http://en.wikipedia.org/w/api.php) from a PHP script, but I always only seem to get a fraction of the real content.
For example, when trying:
$page=get_web_page("http://en.wikipedia.org/w/api.php?action=query&titles=Cat&prop=links&format=txt");
echo $page["content"];
This is what I get:
Array ( [query] => Array ( [pages] => Array ( [6678] => Array ( [pageid] => 6678 [ns] => 0 [title] => Cat [links] => Array ( [0] => Array ( [ns] => 0 [title] => 10th edition of Systema Naturae ) [1] => Array ( [ns] => 0 [title] => 3-mercapto-3-methylbutan-1-ol ) [2] => Array ( [ns] => 0 [title] => Abyssinian (cat) ) [3] => Array ( [ns] => 0 [title] => Actinidia polygama ) [4] => Array ( [ns] => 0 [title] => Adaptive radiation ) [5] => Array ( [ns] => 0 [title] => African Wildcat ) [6] => Array ( [ns] => 0 [title] => African wildcat ) [7] => Array ( [ns] => 0 [title] => Afro-Asiatic languages ) [8] => Array ( [ns] => 0 [title] => Age of Discovery ) [9] => Array ( [ns] => 0 [title] => Agouti signalling peptide ) ) ) ) ) [query-continue] => Array ( [links] => Array ( [plcontinue] => 6678|0|Albino ) ) )
I was requesting the full list of links on the "Cat" article, but I only seem to get the first 10 in alphabetical order.
This happens no matter which format I choose, and even when I call the API directly (see http://en.wikipedia.org/w/api.php?action=query&titles=Cat&prop=links).
What is causing this restriction, and how can I fix it?
Comments (1)
If you look at the API manual, you will see that there is a pllimit option, which specifies how many links you want returned per request. It defaults to 10, which is why you only see ten links. You can get up to 500 at a time, or 5000 if you have a bot account.

At the end of the data dump you provided, you will see the following:

[plcontinue] => 6678|0|Albino

You can send this value back to the server to get more links from the page, starting from that point. So the next query you make would be the same request with that plcontinue value added. You will need to keep doing this until the server no longer returns a plcontinue value.
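The loop described above could be sketched roughly as follows. This is only an illustration under some assumptions: build_links_url() and get_all_links() are hypothetical helper names, $fetch is a hypothetical callable that takes a URL and returns the decoded response as a PHP array, format=json is used instead of format=txt so the response can be decoded, and the response is assumed to have the same shape as the dump in the question (continuation under query-continue). Newer API versions may report continuation data differently, so check an actual response first.

```php
<?php
// Sketch only: the helper names and the $fetch callable are hypothetical.

// Build the request URL, optionally resuming from a plcontinue value.
function build_links_url(string $title, ?string $plcontinue = null): string
{
    $params = [
        'action'  => 'query',
        'titles'  => $title,
        'prop'    => 'links',
        'pllimit' => 500,     // per-request maximum for non-bot accounts, per the answer
        'format'  => 'json',  // assumption: JSON is easier to decode than format=txt
    ];
    if ($plcontinue !== null) {
        $params['plcontinue'] = $plcontinue;
    }
    return 'http://en.wikipedia.org/w/api.php?' . http_build_query($params);
}

// Collect link titles, repeating the query until no plcontinue comes back.
function get_all_links(string $title, callable $fetch): array
{
    $links = [];
    $plcontinue = null;
    do {
        $data = $fetch(build_links_url($title, $plcontinue));
        foreach ($data['query']['pages'] ?? [] as $page) {
            foreach ($page['links'] ?? [] as $link) {
                $links[] = $link['title'];
            }
        }
        // Assumed response shape, taken from the dump in the question.
        $plcontinue = $data['query-continue']['links']['plcontinue'] ?? null;
    } while ($plcontinue !== null);
    return $links;
}
```

For $fetch, one option is a closure that calls the get_web_page() helper from the question and passes $page["content"] through json_decode(..., true). Taking the fetcher as a parameter keeps the paging logic separate from the HTTP code, so it can be tried against canned responses before hitting the live API.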