How can I get data from MediaWiki?

Posted on 2024-11-09 14:00:41

Hi, I am using the following API call to get data from MediaWiki. When I paste this URL into a browser, an XML response appears:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=API|Main_Page&rvprop=timestamp|user|comment|content

But when I request the same URL with cURL, I get the error "Scripts should use an informative User-Agent string with contact information, or they may be IP-blocked without notice."

I am using the following code. Can anyone spot my error?

    $url = 'http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=API|Main_Page&rvprop=timestamp|user|comment|content';
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    //curl_setopt($curl, CURLOPT_TIMEOUT, 1);
    $objResponse = curl_exec($curl);
    curl_close($curl);

    echo $objResponse; die;

Comments (2)

寻找一个思念的角度 2024-11-16 14:00:41

This sends both a Referer and a User-Agent with the request, which gets past their checks:

    <?php

    // Fetch $url with an explicit Referer and User-Agent.
    // Returns the response body, or false if an argument is missing or the request fails.
    function getwiki($url = "", $referer = "", $userAgent = "") {
        if ($url == "" || $referer == "" || $userAgent == "") {
            return false;
        }
        $headers = array(
            'Accept: image/gif, image/x-bitmap, image/jpeg, image/pjpeg',
            'Connection: Keep-Alive',
            'Content-type: application/x-www-form-urlencoded;charset=UTF-8',
        );
        $process = curl_init($url);
        curl_setopt($process, CURLOPT_HTTPHEADER, $headers);
        curl_setopt($process, CURLOPT_HEADER, 0);             // do not include response headers in the output
        curl_setopt($process, CURLOPT_USERAGENT, $userAgent);
        curl_setopt($process, CURLOPT_REFERER, $referer);
        curl_setopt($process, CURLOPT_TIMEOUT, 30);
        curl_setopt($process, CURLOPT_RETURNTRANSFER, 1);     // return the body instead of printing it
        curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1);     // follow redirects
        $return = curl_exec($process);
        curl_close($process);
        return $return;
    }

    //edited to include Adam Backstrom's sound advice
    echo getwiki('http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=API|Main_Page&rvprop=timestamp|user|comment|content', 'http://en.wikipedia.org/', 'Mozilla/5.0 (compatible; YourCoolBot/1.0; +http://yoursite.com/botinfo)');

    ?>
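
An optional refinement that is not part of the answer above: the URL passes `|` characters unencoded. That appears to work here, but some clients or proxies may be stricter about it. A minimal sketch, assuming a hypothetical helper named `buildApiUrl`, that lets `http_build_query` percent-encode the parameters:

```php
<?php

// Hypothetical helper (not from the original answer): builds an api.php URL
// with a percent-encoded query string.
function buildApiUrl(string $endpoint, array $params): string {
    // Passing '&' explicitly avoids depending on the arg_separator.output ini setting.
    return $endpoint . '?' . http_build_query($params, '', '&');
}

// Same parameters as the URL in the question.
$url = buildApiUrl('http://en.wikipedia.org/w/api.php', [
    'action' => 'query',
    'prop'   => 'revisions',
    'titles' => 'API|Main_Page',
    'rvprop' => 'timestamp|user|comment|content',
]);

echo $url, "\n";
```

Each `|` comes out as `%7C`, and the resulting string can be passed to `getwiki()` or to `curl_setopt($curl, CURLOPT_URL, $url)` unchanged.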
瑾夏年华 2024-11-16 14:00:41

From the MediaWiki API:Quick start guide:

Pass a User-Agent header that properly identifies your client: don't use the default User-Agent from your client library, but use a custom one including the name of your client and the version number, something like MyCuteBot/0.1.

On Wikimedia wikis, failing to supply a User-Agent header or supplying an empty or generic one will cause the request to fail with an HTTP 403 error. See meta:User-Agent policy. Other MediaWiki wikis may have similar policies.

From meta:User-Agent policy:

If you run a bot, please send a User-Agent header identifying the bot and supplying some way of contacting you, e.g.: User-Agent: MyCoolTool (+http://example.com/MyCoolToolPage/)
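
Applied to the code in the question, the quoted policy suggests that a single `CURLOPT_USERAGENT` option is probably the key change. The sketch below is one way to do that, not an official client. The helper names `buildUserAgent` and `fetchWithUserAgent` are hypothetical, and the `Name/Version (+contact URL)` format is an assumption that combines the two examples quoted above.

```php
<?php

// Hypothetical helper: formats "Name/Version (+contact URL)".
// The exact format is an assumption based on the two quoted examples.
function buildUserAgent(string $name, string $version, string $contactUrl): string {
    return sprintf('%s/%s (+%s)', $name, $version, $contactUrl);
}

// Hypothetical helper: GETs $url with the given User-Agent.
// Returns the response body, or null on a transport error or a non-200 status.
function fetchWithUserAgent(string $url, string $userAgent): ?string {
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);   // follow redirects
    curl_setopt($curl, CURLOPT_USERAGENT, $userAgent);  // the option missing from the question's code
    curl_setopt($curl, CURLOPT_TIMEOUT, 30);
    $body = curl_exec($curl);
    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);

    // Per the quoted guide, a missing or generic User-Agent is answered with HTTP 403.
    if ($body === false || $status !== 200) {
        return null;
    }
    return $body;
}
```

A call might look like `fetchWithUserAgent($url, buildUserAgent('MyCoolTool', '0.1', 'http://example.com/MyCoolToolPage/'))`, with the placeholder name and contact page replaced by your own. Returning `null` on any non-200 status makes a 403 from the User-Agent check visible to the caller, whereas the question's code echoes whatever comes back.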
