Problem using cURL with C and non-top-level URLs
I'm using cURL to scrape web pages but I can only seem to scrape top-level URLs. For example, if I want to cURL the URL "http://www.businessweek.com/news/2010-09-29/flaherty-says-canada-july-gdp-report-tomorrow-may-be-negative.html" then it returns nothing (as if it's a blank page).
This is my C code:
#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
    CURL *curl;
    CURLcode res;

    curl = curl_easy_init();
    if(curl) {
        //THIS WORKS
        //curl_easy_setopt(curl, CURLOPT_URL, "news.google.com");
        //THIS DOESN'T WORK
        curl_easy_setopt(curl, CURLOPT_URL, "http://www.businessweek.com/news/2010-09-29/flaherty-says-canada-july-gdp-report-tomorrow-may-be-negative.html");
        res = curl_easy_perform(curl);
        curl_easy_cleanup(curl);
    }
    return 0;
}
If I could get some input on this issue that would be great.
Comments (1)
It's because the site is sending a 301 redirect. By default libcurl does not follow redirects, so the transfer stops at the 301 response. Set CURLOPT_FOLLOWLOCATION to 1 to follow redirects automatically.