Using curl to get Facebook like box plugin content

Posted 2024-12-14 15:17:14


I'm working on a website that should be fully usable by visitors in a place where Facebook.com is banned, so my Facebook like box plugin won't appear for them. (To keep this question from being too localized, assume that I want to bypass all client-side firewalls and show the like box plugin as plain HTML on my website, which is not banned there.)

My server can access Facebook.com, so I thought I could fetch the content of my plugin with curl on the server and then output that page's content as plain HTML anywhere on my website. So I wrote the following script:

<?
$c = curl_init('https://www.facebook.com/plugins/likebox.php?href=http%3A%2F%2Fwww.facebook.com%2Fstevejobs&width=292&height=258&colorscheme=dark&show_faces=true&border_color&stream=false&header=false');

curl_setopt($ch, CURLOPT_HTTPHEADER, array('Host: www.facebook.com', 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
', 'Accept-Language: en-us,en;q=0.5', 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7', 'Accept-Encoding: gzip, deflate'));
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0");

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt(CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);



$html = curl_exec($c);

if (curl_error($c))
    die(curl_error($c));

$status = curl_getinfo($c, CURLINFO_HTTP_CODE);

curl_close($c);
?>

Surprisingly, the code above works for https://www.youtube.com (which is also banned there) and https://www.google.com, but on my server it doesn't work with that URL, or even with plain https://www.facebook.com.
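As a side note, the long percent-encoded plugin URL in the script is easier to maintain when it is built from an array. The following is a minimal sketch using PHP's http_build_query(); the function name build_likebox_url is hypothetical, the parameters are copied from the question's URL, and the empty border_color parameter is left out. Passing "&" explicitly as the separator avoids picking up an arg_separator.output setting such as "&amp;" from php.ini.

```php
<?php
// Hypothetical helper: build the like box plugin URL from named parameters
// instead of hand-encoding the query string.
function build_likebox_url(array $params): string
{
    // PHP_QUERY_RFC3986 encodes values with rawurlencode()
    // (":" becomes %3A, "/" becomes %2F); pairs are joined with "&".
    return 'https://www.facebook.com/plugins/likebox.php?'
        . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

// Parameters taken from the question's URL (empty border_color omitted).
// Booleans are passed as strings, because true would be encoded as "1".
$url = build_likebox_url([
    'href'        => 'http://www.facebook.com/stevejobs',
    'width'       => 292,
    'height'      => 258,
    'colorscheme' => 'dark',
    'show_faces'  => 'true',
    'stream'      => 'false',
    'header'      => 'false',
]);
```

The resulting $url can be passed to curl_init() in place of the hard-coded string.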

Another question: if I use https://www.youtube.com instead of Facebook.com, I still can't get the CSS or JavaScript files used on YouTube.com (they are banned too, so clients can't download them either). I can only see text and some images. I'd also like curl to fetch the content of the CSS and JavaScript files automatically.
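curl only downloads the single URL it is given; it does not parse the HTML or fetch the stylesheets and scripts the page references. Getting those files means extracting their URLs and requesting each one separately. Below is a minimal sketch of the extraction step using PHP's DOMDocument. The function name find_asset_urls and the example.com URLs are hypothetical, relative URLs are returned unresolved, and downloading the files and rewriting the page to point at copies on your own server are not shown.

```php
<?php
// Hypothetical helper: list stylesheet and script URLs referenced by a page,
// so each one could then be fetched with its own curl request.
function find_asset_urls(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings silently.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $assets = ['css' => [], 'js' => []];

    foreach ($doc->getElementsByTagName('link') as $link) {
        // rel may hold several space-separated values; look for "stylesheet".
        $rels = preg_split('/\s+/', strtolower($link->getAttribute('rel')));
        if (in_array('stylesheet', $rels, true) && $link->hasAttribute('href')) {
            $assets['css'][] = $link->getAttribute('href');
        }
    }

    foreach ($doc->getElementsByTagName('script') as $script) {
        // Inline scripts have no src and are already part of the HTML.
        if ($script->hasAttribute('src')) {
            $assets['js'][] = $script->getAttribute('src');
        }
    }

    return $assets;
}

// Hypothetical page used for illustration.
$html = '<html><head>'
    . '<link rel="stylesheet" href="https://static.example.com/site.css">'
    . '<link rel="icon" href="/favicon.ico">'
    . '<script src="/js/app.js"></script>'
    . '<script>var inline = 1;</script>'
    . '</head><body><p>Hi</p></body></html>';

$assets = find_asset_urls($html);
```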

I also tried YQL to get the content of the like box plugin from Facebook.com, but got the following result:

YQL statement:

select * from html where url = 'https://www.facebook.com/plugins/likebox.php?href=http%3A%2F%2Fwww.facebook.com%2Fstevejobs&width=292&height=258&colorscheme=dark&show_faces=true&border_color&stream=false&header=false'

Result:

<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
    yahoo:count="0" yahoo:created="2011-11-11T11:41:10Z" yahoo:lang="en-US">
    <diagnostics>
        <publiclyCallable>true</publiclyCallable>
        <url
            error="Redirected to a robots.txt restricted URL: https://www.facebook.com/plugins/likebox.php?href=http%3A%2F%2Fwww.facebook.com%2Fstevejobs&amp;width=292&amp;height=258&amp;colorscheme=dark&amp;show_faces=true&amp;border_color&amp;stream=false&amp;header=false"
            execution-start-time="1" execution-stop-time="6"
            execution-time="5" http-status-code="403"
            http-status-message="Forbidden" proxy="DEFAULT"><![CDATA[https://www.facebook.com/plugins/likebox.php?href=http%3A%2F%2Fwww.facebook.com%2Fstevejobs&width=292&height=258&colorscheme=dark&show_faces=true&border_color&stream=false&header=false]]></url>
        <user-time>6</user-time>
        <service-time>5</service-time>
        <build-version>23377</build-version>
    </diagnostics> 
    <results/>
</query>

It looks like there is some problem with facebook.com's robots.txt. I should mention that the YQL statement above works for other websites (like https://www.youtube.com or https://www.yahoo.com).
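robots.txt rules are advisory: it is the fetching client (here, YQL) that chooses to honor them, which is why YQL refuses the URL. curl itself never reads robots.txt. To illustrate how a Disallow rule matches a URL path, here is a simplified sketch. The function name is_disallowed is hypothetical, it only handles "User-agent: *" groups and plain prefix Disallow rules (Allow lines, wildcards and agent-specific groups are ignored), and the sample robots.txt is made up, not Facebook's actual file.

```php
<?php
// Hypothetical, simplified check: is $path disallowed for the "*" user agent?
function is_disallowed(string $robotsTxt, string $path): bool
{
    $inStarGroup = false;
    foreach (preg_split('/\r\n|\n|\r/', $robotsTxt) as $line) {
        // Drop comments and surrounding whitespace.
        $line = trim(preg_replace('/#.*/', '', $line));
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            // Track whether the following rules apply to every crawler.
            $inStarGroup = ($value === '*');
        } elseif ($field === 'disallow' && $inStarGroup && $value !== '') {
            // A Disallow value blocks every path that starts with it.
            if (strpos($path, $value) === 0) {
                return true;
            }
        }
    }
    return false;
}

// Made-up robots.txt for illustration only.
$robots = "User-agent: *\nDisallow: /plugins/\n";
```

With that sample, is_disallowed($robots, '/plugins/likebox.php') returns true, while is_disallowed($robots, '/stevejobs') returns false.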

Thanks in advance


Answer by 请别遗忘我 (2024-12-21 15:17:15)

There are mistakes in your code:

1- Change $c to $ch in all parts of your code (the CURLOPT_RETURNTRANSFER call is also missing the handle argument).

2- Add "echo $html" after the curl_exec() call.

3- As @Dan mentioned in a comment, CURLOPT_HTTPHEADER isn't necessary. Simply remove it.

4- Setting CURLOPT_COOKIEJAR isn't necessary, but I always set it with curl (just to make sure everything works fine).

5- Remove everything before <!DOCTYPE in order to display the content properly (with CURLOPT_HEADER enabled, the returned string starts with the HTTP response headers).

Try the following code:

<?php
$url = 'https://www.facebook.com/plugins/likebox.php?href=http%3A%2F%2Fwww.facebook.com%2Fstevejobs&width=292&height=258&colorscheme=dark&show_faces=true&border_color&stream=false&header=false';

$ch = curl_init($url);

// Identify as a desktop browser.
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.9.1.1) Gecko/20090715 Firefox/3.5.1');
// Skip TLS certificate verification (insecure; acceptable only for testing).
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
// Return the response from curl_exec() instead of printing it.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Write received cookies to the file "facebookcookies".
curl_setopt($ch, CURLOPT_COOKIEJAR, "facebookcookies");
// Include the response headers in the returned string.
curl_setopt($ch, CURLOPT_HEADER, 1);
// Send the request as a POST.
curl_setopt($ch, CURLOPT_POST, true);
// Follow any redirects.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

$html = curl_exec($ch);

// Stop before printing anything if the request failed.
if (curl_error($ch))
    die(curl_error($ch));

// Get the status code
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

curl_close($ch);

// Remove everything before <!DOCTYPE (the response headers), then output the page.
echo preg_replace('/^[^<!]*<!\s*/', '<!', $html);
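Step 5 is needed only because CURLOPT_HEADER is set to 1, which puts the response headers in front of the HTML. If the headers aren't needed, leaving CURLOPT_HEADER at its default (false) makes curl_exec() return just the body. If they are needed, the response can be split by length rather than with a regex. This is a minimal sketch assuming the header size comes from curl_getinfo($ch, CURLINFO_HEADER_SIZE), which reports the total size of all headers received, including those of redirects followed with CURLOPT_FOLLOWLOCATION; split_response is a hypothetical helper, and the response below is made up so the example runs offline.

```php
<?php
// Hypothetical helper: separate headers from body using the header size
// reported by curl, instead of searching for "<!DOCTYPE".
function split_response(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        // Cast guards against old PHP versions returning false for an empty tail.
        'body'    => (string) substr($raw, $headerSize),
    ];
}

// Made-up response for an offline example. After a real request you would
// pass curl_getinfo($ch, CURLINFO_HEADER_SIZE) (read before curl_close())
// as the second argument.
$headers = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n";
$raw = $headers . "<!DOCTYPE html><p>plugin</p>";
$parts = split_response($raw, strlen($headers));
```

Then echo $parts['body'] outputs only the HTML, whatever characters the headers contain.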
