Scraping data from NHL.com
I'm trying to grab the standings table from http://www.nhl.com/ice/standings.htm?season=20112012&type=LEA and store it in a MySQL database on my server. Using the code below, I can copy the whole page exactly, but I'm not sure how to extract just that table. Any ideas?
function get_data($url)
{
    $ch = curl_init();
    $timeout = 5; // seconds to wait while connecting
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

$returned_content = get_data('http://www.nhl.com/ice/standings.htm?season=20112012&type=LEA');
echo $returned_content;
UPDATE:
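// file_get_html() is provided by the Simple HTML DOM library (simple_html_dom.php)
$html = file_get_html('http://www.nhl.com/ice/standings.htm?season=20112012&type=LEA');
$e = $html->find("table", 2); // index is zero-based, so this is the third table on the page
echo $e;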
This code works and outputs a table like the one I need. Now I'm wondering how to strip all the unnecessary links and formatting and save the result to the database.
Try the Simple HTML DOM library:
http://simplehtmldom.sourceforge.net/
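For the follow-up about stripping links and formatting, here is a rough sketch of one way to do it. To keep it runnable without extra files it uses PHP's built-in DOMDocument rather than Simple HTML DOM, and reads each cell's textContent so that anchors and other inline markup drop out. The function names, the `standings` table, and its `team` and `points` columns are all made up for illustration; which cell maps to which column is an assumption you would adjust to the real page.

```php
<?php
// Sketch only: extract_table_rows() and save_rows() are hypothetical helpers,
// and the `standings` table with `team`/`points` columns is an assumed schema.

/**
 * Return every <tr> of the $index-th <table> (zero-based) as an array of
 * plain-text cells. textContent discards <a>, <span> and similar markup.
 */
function extract_table_rows(string $html, int $index = 0): array
{
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // scraped HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $table = $doc->getElementsByTagName('table')->item($index);
    if ($table === null) {
        return []; // the page has fewer tables than expected
    }

    $rows = [];
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $cell) {
            if ($cell instanceof DOMElement && in_array($cell->tagName, ['td', 'th'], true)) {
                // Collapse runs of whitespace left behind by the removed markup
                $cells[] = trim(preg_replace('/\s+/', ' ', $cell->textContent));
            }
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

/**
 * Insert rows through a prepared statement. Assumes cell 0 is the team name
 * and cell 1 is the points total, which may not match the real table layout.
 */
function save_rows(PDO $pdo, array $rows): void
{
    $stmt = $pdo->prepare('INSERT INTO standings (team, points) VALUES (?, ?)');
    foreach ($rows as $row) {
        $stmt->execute([$row[0], (int) $row[1]]);
    }
}
```

With the question's own get_data() you would call something like extract_table_rows($returned_content, 2), drop the header row with array_slice($rows, 1), and pass the rest to save_rows() along with a PDO connection to your MySQL server. The prepared statement is there so that scraped text is never concatenated into the SQL string.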
In Google Spreadsheets, all my importhtml functions pointing at nhl.com have stopped working this year. I think they are (for some reason) blocking users from scraping their data.