HTML parsing - getting data from a table inside a div?

Posted 2024-11-25 20:43:30


I am relatively new to the whole idea of HTML parsing/scraping. I was hoping that I could come here to get the help that I need!

Basically, what I am looking to do (I think) is specify the URL of the page I wish to grab the data from. In this case: http://www.epgpweb.com/guild/us/Caelestrasz/Crimson/
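Fetching the page is the first step, and it can fail (timeout, DNS error, server down). A minimal sketch of a fetch step that reports failure instead of passing false along; fetch_page() is a hypothetical helper name, and the 10-second timeout and user-agent string are arbitrary assumptions:

```php
<?php
// Hypothetical helper: fetch a URL (or local file) and return its body,
// or null on failure, so callers can check before parsing.
function fetch_page(string $url, int $timeoutSeconds = 10): ?string
{
    // These 'http' options apply to http:// URLs only; local paths ignore them.
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'Mozilla/5.0 (compatible; roster-fetcher)',
        ],
    ]);
    // @ suppresses the warning PHP emits on failure; we signal it with null.
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}
```

Usage: $page = fetch_page('http://www.epgpweb.com/guild/us/Caelestrasz/Crimson/'); then check for null before parsing.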

From there, I want to grab the table with class="listing" inside the div with id="snapshot_table".
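Selecting that table is a matter of querying for "#snapshot_table table.listing". As an alternative to the simple_html_dom library used in the answers, here is a sketch using PHP's built-in DOM extension, where the CSS selector is translated by hand into XPath. extract_listing_table() is a hypothetical name. One caveat: if the site fills the table in with JavaScript after the page loads, it will not be in the HTML that PHP downloads and no parser will find it, which may be why one of the answers here reads the site's API instead.

```php
<?php
// Return the outer HTML of the first table with class "listing" inside the
// element with id "snapshot_table", or null if there is none.
function extract_listing_table(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration prefix makes libxml read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // XPath equivalent of the CSS selector "#snapshot_table table.listing".
    $nodes = $xpath->query(
        '//*[@id="snapshot_table"]//table[contains(concat(" ", normalize-space(@class), " "), " listing ")]'
    );
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // Passing a node to saveHTML() serialises just that node.
    $out = $doc->saveHTML($nodes->item(0));
    return $out === false ? null : $out;
}
```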

I then wish to embed that table onto my own page and have it update when the original content is updated.
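Keeping an embedded copy current means fetching the source again. One common approach, sketched here as an assumption rather than anything from the thread, is a small file cache: serve the stored table while it is fresh, re-fetch once it is older than a chosen age, and fall back to the old copy if the refresh fails. cached_fetch() and the cache path are hypothetical; $fetch can be any function that returns the table HTML, for example one built on the file_get_html() call from the question.

```php
<?php
// Hypothetical helper: return cached content younger than $maxAgeSeconds,
// otherwise call $fetch() and store the fresh result. If the refresh fails,
// serve the stale copy when one exists.
function cached_fetch(string $cacheFile, int $maxAgeSeconds, callable $fetch): ?string
{
    clearstatcache(true, $cacheFile);
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $maxAgeSeconds) {
        $cached = file_get_contents($cacheFile);
        return $cached === false ? null : $cached;
    }
    $fresh = $fetch();
    if (is_string($fresh) && $fresh !== '') {
        file_put_contents($cacheFile, $fresh, LOCK_EX);
        return $fresh;
    }
    // Refresh failed: serve the old copy rather than nothing.
    if (is_file($cacheFile)) {
        $cached = file_get_contents($cacheFile);
        return $cached === false ? null : $cached;
    }
    return null;
}
```

With a maximum age of, say, 300 seconds, the source site is contacted at most once every five minutes no matter how many people view your page.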

I have read a few other posts on Google and Stack Overflow, and I also had a look at a tutorial on Nettuts+, but it seemed to be a bit too much to take in at once.

Hopefully someone here can help me out and make this as simple as possible :)

Cheers,

Mat

--Edit--

Current code as of 11:22am (GMT+10)

<?php
    # don't forget the library
    include('simple_html_dom.php');
?>
<html>
<head></head>
<body>
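<?php
    $html = file_get_html('http://www.epgpweb.com/guild/us/Caelestrasz/Crimson/');
    # find() with an index returns one element (or null) instead of an array
    $table = $html ? $html->find('#snapshot_table table.listing', 0) : null;
    # outertext is the element's HTML; print_r() would dump the whole object
    echo $table ? $table->outertext : 'Table not found';
?>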
</body>
</html>


Comments (2)

橘虞初梦 2024-12-02 20:43:30

I think I got it to work, and I learned a lot! :)

<?php
//Get the current timestamp: the 10 characters just before the last two of the response
$url = 'http://www.epgpweb.com/api/snapshot/us/Caelestrasz/Crimson';
$url = file_get_contents($url);
if ($url === false) {
    die('Could not fetch the snapshot list.');
}
$url = substr($url,-12,10);

//Get the member data based on the timestamp
$url = 'http://www.epgpweb.com/api/snapshot/us/Caelestrasz/Crimson/'.$url;
$url = file_get_contents($url);
if ($url === false) {
    die('Could not fetch the snapshot.');
}

//Convert \uXXXX unicode escape sequences into UTF-8 characters, as I found here: http://stackoverflow.com/questions/2934563/how-to-decode-unicode-escape-sequences-like-u00ed-to-proper-utf-8-encoded-char
function replace_unicode_escape_sequence($match) {
    return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
}
$url = preg_replace_callback('/\\\\u([0-9a-f]{4})/i', 'replace_unicode_escape_sequence', $url);

//erase/replace the insignificant parts, to put the data into an array
function erase($a){
    global $url;
    $url = str_replace($a, "", $url);
}
function replace($a,$b){
    global $url;
    $url = str_replace($a, $b, $url);
}
replace("[[",";");
replace("]]",";");
replace("],",";");
erase('[');
erase('"');
replace(":",",");
$url = explode(";", $url);

//lose the front and end bits, and maintain the member data
array_shift($url);
array_pop($url);

//put the data into an array: [0] member, [1] EP, [2] GP, [3] PR
$data = array();
foreach($url as $k=>$v){
    $data[$k] = explode(",",$v);
    $gp = intval($data[$k][2]);
    //avoid dividing by zero for members with no GP yet
    $pr = $gp > 0 ? intval($data[$k][1]) / $gp : 0;
    //3 decimal places (so 2 becomes "2.000", not "20000"), trimmed to 5 characters
    $data[$k][3] = substr(number_format($pr, 3, '.', ''), 0, 5);
}

//sort the array by PR number, highest first
function compare($x, $y)
{
    if ($x[3] == $y[3])
        return 0;
    else if ($x[3] > $y[3])
        return -1;
    else
        return 1;
}
usort($data, 'compare');

//output the data into a table, escaping each value so a name cannot inject HTML
echo "<table><tbody><tr><th>Member</th><th>EP</th><th>GP</th><th>PR</th></tr>";
foreach($data as $row){
    echo "<tr>";
    foreach($row as $cell){
        echo "<td>".htmlspecialchars($cell, ENT_QUOTES, 'UTF-8')."</td>";
    }
    echo "</tr>";
}
echo "</tbody></table>";
?>
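The string surgery in replace() and erase() would mis-split the data if any field contained a bracket, quote, comma or colon. If the API response is JSON, as the bracket handling suggests, json_decode($response, true) turns it into nested PHP arrays and also decodes \uXXXX escapes, including surrogate pairs that a per-escape UCS-2BE conversion cannot combine. The sketch below picks up after decoding. Where the member rows sit inside the response is not shown in this thread, so the [member, EP, GP] row layout is an assumption inferred from the code above; pr_rows() and render_pr_table() are hypothetical names.

```php
<?php
// Compute PR = EP / GP for rows of [member, EP, GP] and sort by PR,
// highest first. Rows are assumed to be already decoded, e.g. by json_decode().
function pr_rows(array $roster): array
{
    $rows = [];
    foreach ($roster as $entry) {
        if (!is_array($entry) || count($entry) < 3) {
            continue; // skip anything that is not [member, EP, GP]
        }
        [$member, $ep, $gp] = array_values($entry);
        $ep = (int) $ep;
        $gp = (int) $gp;
        // Guard against division by zero for members with no GP.
        $pr = $gp > 0 ? $ep / $gp : 0.0;
        $rows[] = [(string) $member, $ep, $gp, $pr];
    }
    // Highest PR first; the spaceship operator needs PHP 7 or later.
    usort($rows, function ($a, $b) { return $b[3] <=> $a[3]; });
    return $rows;
}

// Render the rows as an HTML table, escaping member names.
function render_pr_table(array $rows): string
{
    $html = '<table><tbody><tr><th>Member</th><th>EP</th><th>GP</th><th>PR</th></tr>';
    foreach ($rows as [$member, $ep, $gp, $pr]) {
        $html .= '<tr><td>' . htmlspecialchars($member, ENT_QUOTES, 'UTF-8') . '</td>'
               . '<td>' . $ep . '</td><td>' . $gp . '</td>'
               . '<td>' . number_format($pr, 3, '.', '') . '</td></tr>';
    }
    return $html . '</tbody></table>';
}
```

Unlike the code above, this always shows PR with three decimal places instead of trimming it to five characters.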
千柳 2024-12-02 20:43:30

Take a look at the PHP simple_html_dom class.

Then this will do the trick:

$html = file_get_html('http://www.epgpweb.com/guild/us/Caelestrasz/Crimson/');
$table = $html->find('#snapshot_table table.listing', 0); // first match, or null
echo $table ? $table->outertext : '';