Trying to scrape the entire contents of a div

Posted 2024-09-18 20:41:57 · 1621 characters · 2 views · 0 comments

I'm working on a project and I'd like to add a really small list of nearby places using Facebook Places, shown in an iframe sourced from touch.facebook.com. I could simply use touch.facebook.com/#/places_friends.php, but that also loads the header and the other navigation bars (messages, events, etc.), and I just want the content.

From looking at the source of touch.facebook.com/#/places_friends.php, I'm pretty sure all I need to load is the div "content". I'm extremely new to PHP, and I believe what I'm trying to do is called web scraping.

For the sake of figuring things out on Stack Overflow without having to worry about authentication yet, I want to load the login page to see if I can at least get the scraper to work. Once I have working scraping code, I'm pretty sure I can handle the rest. It has to load everything inside the div. I've seen this done before, so I know it is possible. The result should look exactly like what you see when you try to log in at touch.facebook.com, but without the blue Facebook logo at the top.

So here's the login page: I'm trying to load the div that contains the login text boxes and the actual login button. If it's done correctly, we should see just those, with no blue Facebook header bar above them.

I've tried

<?php
$page = file_get_contents('http://touch.facebook.com/login.php');
$doc = new DOMDocument();
$doc->loadHTML($page);
$divs = $doc->getElementsByTagName('div');
foreach ($divs as $div) {
    if ($div->getAttribute('id') === 'login_form') {
        echo $div->nodeValue;
    }
}
?>

All that does is load a blank page.

I've also tried using http://simplehtmldom.sourceforge.net/

and I modified the basic selector example to

<?php
include('../simple_html_dom.php');

$html = file_get_html('http://touch.facebook.com/login.php');

foreach($html->find('div#login_form') as $e)
    echo $e->nodeValue;

?>

I've also tried

<?php
$stream = "http://touch.facebook.com/login.php";
$cnt = simplexml_load_file($stream);

$result = $cnt->xpath("/html/body/div[@id=login_form]");

for($i = 0; $i < $i < count($result); $i++){
    echo $result[$i];
}
?>

That did not work either.


Comments (4)

蓝咒 2024-09-25 20:42:11

Scraping isn't always the best way to capture data from elsewhere. I would suggest using Facebook's API to retrieve the values you need. Scraping will break any time Facebook decides to change its markup.

http://developers.facebook.com/docs/api

http://github.com/facebook/php-sdk/

烟织青萝梦 2024-09-25 20:42:09

You need to learn about comparison operators.

=== compares strictly (value and type); you should be using ==

if ($div->getAttribute('id') == 'login_form')
{

}
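
A minimal sketch of this comparison in context, using a hypothetical inline HTML string in place of the fetched page (the markup and the variable names $text and $markup are assumptions for illustration). Because getAttribute() returns a string, == and === should give the same result here. The sketch also records nodeValue, which holds only the text content and is therefore empty for a div made up solely of input elements, next to saveHTML($node), which returns the element's markup.

```php
<?php
// Hypothetical markup standing in for the fetched login page.
$page = '<html><body><div id="header">Logo</div>'
      . '<div id="login_form"><input type="text" name="email">'
      . '<input type="submit" value="Log in"></div></body></html>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them;
// real-world HTML often triggers them.
libxml_use_internal_errors(true);
$doc->loadHTML($page);
libxml_clear_errors();

$text = '';
$markup = '';
foreach ($doc->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('id') == 'login_form') {
        $text = $div->nodeValue;         // text content only (empty here)
        $markup = $doc->saveHTML($div);  // the element's HTML (PHP 5.3.6+)
    }
}

echo $markup;
```

If the goal is to show the form itself rather than its text, echoing the saved markup is probably closer to what the question describes than echoing nodeValue.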
我三岁 2024-09-25 20:42:07

I'm assuming that you can't use the Facebook API; if you can, I strongly suggest you use it, because it will save you from the whole scraping business.

To scrape text, the best approach is XPath. If the HTML returned by touch.facebook.com is XHTML Transitional, which it should be, then you can use XPath; a sample would look like this:

$stream = "http://touch.facebook.com";
$cnt = simplexml_load_file($stream);

$result = $cnt->xpath("/html/body/div[@id='content']");

for ($i = 0; $i < $i < count($result); $i++){
    echo $result[$i];
}
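
The XPath idea above can also be sketched without requiring well-formed XML. This is a swapped-in variant, not the answer's own code: it uses DOMDocument::loadHTML() with DOMXPath instead of simplexml_load_file(), on the assumption that the page may not parse as strict XML. The inline markup is hypothetical and stands in for the fetched page.

```php
<?php
// Hypothetical markup standing in for the page; the unclosed <br>
// is deliberately not well-formed XML.
$page = '<html><body><div id="nav">Messages<br>Events</div>'
      . '<div id="content"><p>Nearby places</p></div></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);  // keep parser warnings out of the output
$doc->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// The id value is quoted: unquoted, XPath would compare @id with a
// child element named "content" rather than with the string.
$nodes = $xpath->query("//div[@id='content']");

$out = '';
foreach ($nodes as $node) {
    $out .= $doc->saveHTML($node);  // markup of the matched div only
}

echo $out;
```

Only the matched div is serialized, so the navigation markup in the sample is left out of the output.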
你是年少的欢喜 2024-09-25 20:42:04

$stream = "http://touch.facebook.com";
$cnt = simplexml_load_file($stream);

// Quote the id value; unquoted, XPath compares @id with a child element named "content".
$result = $cnt->xpath("/html/body/div[@id='content']");

for ($i = 0; $i < count($result); $i++) {
    echo $result[$i];
}

There was a syntax error in the for loop line; I removed it. Now just copy, paste, and run this code.
