Get Facebook Like Box plugin content using curl
I'm working on a website that should be fully visible to users in a place where Facebook.com is banned, so my Facebook Like Box plugin will not appear for them. (To keep this question from being too localized, assume that I want to bypass all client-side firewalls and show the Like Box plugin as plain HTML on my website, which is not banned there.)
My server can access Facebook.com, so I thought I could fetch the plugin's content with curl on my server and then output that page's content as plain HTML anywhere on my website. So I wrote the following script:
<?
$c = curl_init('https://www.facebook.com/plugins/likebox.php?href=http%3A%2F%2Fwww.facebook.com%2Fstevejobs&width=292&height=258&colorscheme=dark&show_faces=true&border_color&stream=false&header=false');
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Host: www.facebook.com', 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
', 'Accept-Language: en-us,en;q=0.5', 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7', 'Accept-Encoding: gzip, deflate'));
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt(CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$html = curl_exec($c);
if (curl_error($c))
die(curl_error($c));
$status = curl_getinfo($c, CURLINFO_HTTP_CODE);
curl_close($c);
?>
Surprisingly, the above code works for https://www.youtube.com (which is banned there too) and https://www.google.com, but on my server it does not work with that URL, or even with plain https://www.facebook.com.
Another question: if I use https://www.youtube.com instead of Facebook.com, I still can't get the CSS or JavaScript files used by YouTube.com (because they are banned too, so clients can't download them either). I can see only text and some images. I also want curl to fetch the content of the CSS and JavaScript files automatically.
I also tried YQL to get the content of the Like Box plugin from Facebook.com, but I got the following result:
YQL statement:
select * from html where url = 'https://www.facebook.com/plugins/likebox.php?href=http%3A%2F%2Fwww.facebook.com%2Fstevejobs&width=292&height=258&colorscheme=dark&show_faces=true&border_color&stream=false&header=false'
Result:
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
yahoo:count="0" yahoo:created="2011-11-11T11:41:10Z" yahoo:lang="en-US">
<diagnostics>
<publiclyCallable>true</publiclyCallable>
<url
error="Redirected to a robots.txt restricted URL: https://www.facebook.com/plugins/likebox.php?href=http%3A%2F%2Fwww.facebook.com%2Fstevejobs&width=292&height=258&colorscheme=dark&show_faces=true&border_color&stream=false&header=false"
execution-start-time="1" execution-stop-time="6"
execution-time="5" http-status-code="403"
http-status-message="Forbidden" proxy="DEFAULT"><![CDATA[https://www.facebook.com/plugins/likebox.php?href=http%3A%2F%2Fwww.facebook.com%2Fstevejobs&width=292&height=258&colorscheme=dark&show_faces=true&border_color&stream=false&header=false]]></url>
<user-time>6</user-time>
<service-time>5</service-time>
<build-version>23377</build-version>
</diagnostics>
<results/>
</query>
It looks like there is some problem with facebook.com's robots.txt. I should mention that the above YQL statement works for other websites (like https://www.youtube.com or https://www.yahoo.com).
Thanks in advance
Comments (1)
There are mistakes in your code:
1- Change $c to $ch in all parts of your code.
2- Add "echo $html" after the curl_exec call.
3- As @Dan mentioned in a comment, CURLOPT_HTTPHEADER isn't necessary. Simply remove it.
4- Setting CURLOPT_COOKIEJAR isn't necessary, but I always set it with curl (just to make sure that everything works fine).
5- Remove everything before <!DOCTYPE in order to show the content properly.
Try the following code:
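The code that originally followed this answer is not included on this page. As a rough sketch of what the five points above might look like together (not the answerer's actual code), the snippet below uses a single handle name, drops CURLOPT_HTTPHEADER, sets a cookie jar, and strips everything before <!DOCTYPE. The function names fetch_page and strip_before_doctype and the cookie file path are hypothetical, chosen only for illustration. The sketch also leaves certificate verification at curl's default instead of disabling it as the question does; that is an assumption on my part, not something the answer asks for.

```php
<?php
// Hypothetical helper: drop everything before the first "<!DOCTYPE",
// e.g. the response headers that appear in the output when CURLOPT_HEADER is on.
function strip_before_doctype(string $response): string
{
    $pos = stripos($response, '<!DOCTYPE');
    // If no doctype is present, return the response unchanged.
    return $pos === false ? $response : substr($response, $pos);
}

// Hypothetical helper: fetch a URL with the options from the question,
// using one handle variable ($ch) throughout and no CURLOPT_HTTPHEADER.
function fetch_page(string $url, string $cookieFile): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0');
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_HEADER, true);          // headers are stripped again below
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);

    $html = curl_exec($ch);
    if ($html === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException($error);
    }
    curl_close($ch);

    return strip_before_doctype($html);
}

// Example call (commented out so that loading this file makes no network request;
// the cookie path is an assumption):
// echo fetch_page('https://www.facebook.com/plugins/likebox.php?href=http%3A%2F%2Fwww.facebook.com%2Fstevejobs&width=292&height=258&colorscheme=dark&show_faces=true&border_color&stream=false&header=false', '/tmp/cookies.txt');
```

Whether Facebook still serves this plugin URL to a server-side request is not something the sketch can guarantee; it only shows the corrections listed above applied in one place.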