Craigslist, CURL, Simple PHP DOM problem

Posted on 2024-09-13 19:52:25

I am logging into Craigslist with cURL to scrape the status of my posted listings. The problem is passing the HTML that cURL returns in $output to file_get_html. The Craigslist statuses are actually nested inside TR elements, but I first wanted to test the most basic functionality (scraping links) to see whether anything gets through. Nothing does.

For example, this doesn't work:

$cookie_file_path = getcwd()."/cookie.txt";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://accounts.craigslist.org/login?LoginType=L&step=confirmation&originalURI=%2Flogin&rt=&rp=&inputEmailHandle='.$email.'&inputPassword='.$password.'&submit=Log%20In');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, 'http://www.craigslist.org');

$agent = "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)";
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);

$output = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
echo $output;

//

include_once('simple_html_dom.php');
$html = file_get_html($output);
//find all links
foreach($html->find('a') as $element)
       echo $element->href . '<br>'; 

I know the find('a') expression works, because it returns links when I pass a URL such as 'http://google.com' to file_get_html instead of $output.

Comments (2)

虫児飞 2024-09-20 19:52:26

Shouldn't you be using str_get_html instead of file_get_html?
$output is a string, not a file path or URL!
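
The file-versus-string distinction this answer points at can be illustrated without the simple_html_dom library. The sketch below swaps in PHP's built-in DOM extension, which has the same split (loadHTMLFile takes a path or URL, loadHTML takes markup). The $output value and the extract_links helper are assumptions made for illustration, not part of the original post.

```php
<?php
// Hypothetical stand-in for the string that curl_exec() returns.
$output = '<html><body><a href="/one">One</a><a href="/two">Two</a></body></html>';

/**
 * Collect the href of every <a> element found in an HTML string.
 * Uses the built-in DOM extension rather than simple_html_dom.
 */
function extract_links(string $markup): array
{
    $doc = new DOMDocument();
    // Keep libxml quiet about the sloppy markup real pages tend to contain.
    $previous = libxml_use_internal_errors(true);
    // loadHTML() parses a string; loadHTMLFile() would expect a path or URL.
    $doc->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }
    return $links;
}

foreach (extract_links($output) as $href) {
    echo $href, "\n";
}
```

With simple_html_dom itself, the corresponding change would be to call str_get_html($output) where the question calls file_get_html($output).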

江湖正好 2024-09-20 19:52:25

This is how it should be done:

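include_once('simple_html_dom.php');

// Fetch the page and keep the response as a string instead of printing it.
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'http://www.sitename.com');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
$str = curl_exec($curl);
curl_close($curl);

// $str holds HTML, not a path or URL, so parse it with str_get_html.
$html = str_get_html($str);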
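
Once the response is parsed as a string, the row-level statuses mentioned in the question could be read in a similar way. The following is only a sketch: the table markup, the class names, and the extract_rows helper are all hypothetical, since the real structure of the Craigslist account page is not shown in this thread, and it again uses the built-in DOM extension rather than simple_html_dom.

```php
<?php
// Hypothetical markup; the real Craigslist account page may look different.
$output = <<<HTML
<table>
  <tr><td class="status">Active</td><td class="title">Bike for sale</td></tr>
  <tr><td class="status">Expired</td><td class="title">Old couch</td></tr>
</table>
HTML;

/**
 * Return the trimmed text of every <td>, grouped by its <tr>.
 */
function extract_rows(string $markup): array
{
    $doc = new DOMDocument();
    // Keep libxml quiet about fragments and non-validating markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->getElementsByTagName('td') as $td) {
            $cells[] = trim($td->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

foreach (extract_rows($output) as $cells) {
    echo implode(' | ', $cells), "\n";
}
```

In simple_html_dom the same idea would presumably be expressed with $html->find('tr') and the plaintext property of each cell.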