PHP Simple HTML DOM Parser: making it loop until there is no error

Posted on 2025-01-04 22:52:46


I have an app called GrabUrTime, a timetable-viewing utility that gets its timetables from another site, my university's webspace. Every night at 2 a.m. I run a script that scrapes all the timetables with the parser and dumps them into my database.

But today the uni's server isn't running well, and my script keeps getting HTTP 500 errors from it, which stops the script from running to completion. The errors are intermittent: I tried several times and they occur at random, with no pattern at all.

So I want my script to handle the error and keep retrying until it gets the data.

function grabtable($intakecode, $week) {
    $html = file_get_html("http://webspace.apiit.edu.my/schedule/intakeview_intake.jsp?Intake1=" . $intakecode . "&Week=" . $week);
    // Recent versions of Simple HTML DOM return false when the page
    // cannot be fetched (e.g. on an HTTP 500), so report the failure
    // instead of calling find() on false
    if ($html === false) {
        return false;
    }
    $rows = $html->find('table[border=1] tr');
    $thatarray = array();
    // Row 0 is skipped, as before (presumably the table header)
    for ($i = 1; $i < count($rows); ++$i) {
        $cells = $rows[$i]->find('td font');
        if (count($cells) < 6) {
            continue; // skip rows that don't have all six columns
        }
        $date = $cells[0]->innertext;
        $time = $cells[1]->innertext;
        $room = $cells[2]->innertext;
        $loca = $cells[3]->innertext;
        $modu = $cells[4]->innertext;
        $lect = $cells[5]->innertext;
        $thatarray[$i] = array($date, $time, $room, $loca, $modu, $lect);
    }
    $html->clear(); // free the memory held by the DOM
    return $thatarray;
}
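A minimal sketch of the "loop until it gets the data" part, in PHP 7+. It assumes grabtable() returns false when the page cannot be fetched (for example by checking whether file_get_html() returned false); retryUntilSuccess() is a hypothetical helper, and the attempt limit and delay are arbitrary values. The number of attempts is capped on purpose: a loop with no limit would keep the 2 a.m. job running indefinitely if the server stays down.

```php
<?php
/**
 * Hypothetical helper: call $fetch until it returns something other than
 * false, sleeping between attempts, and give up after $maxAttempts tries
 * so a server that stays down cannot hang the nightly job forever.
 */
function retryUntilSuccess(callable $fetch, int $maxAttempts = 5, int $delaySeconds = 10)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $result = $fetch();
        if ($result !== false) {
            return $result;           // got the data
        }
        if ($attempt < $maxAttempts) {
            sleep($delaySeconds);     // wait before asking the server again
        }
    }
    return false;                     // every attempt failed
}
```

The nightly script could then call `retryUntilSuccess(function () use ($intakecode, $week) { return grabtable($intakecode, $week); })` and treat a result of false as "every attempt failed", for example by logging it and moving on to the next intake.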


Answer by ㄟ。诗瑗, 2025-01-11 22:52:47


Try something like this:

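// Despite its name, this returns the page body on a 2xx response, or false otherwise
function getHttpCode($url)
{
    $agent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_VERBOSE, false);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);           // give up on a request after 5 seconds
    $page = curl_exec($ch);

    //echo curl_error($ch);
    // 0 if the server never answered (e.g. a timeout)
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($httpcode >= 200 && $httpcode < 300)
    {
        // YOUR CODE: success, e.g. parse $page here
        return $page;
    }
    else
    {
        // What you want to do should it fail.
        // Returning false lets the caller retry in a loop while the
        // request is NOT successful, e.g.
        // while (($page = getHttpCode($url)) === false) { sleep(5); }
        return false;
    }
}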

Usage:

getHttpCode($url);

It might not plug neatly into your code as is, but with a little refactoring it should fit your existing code structure.
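Along the lines this answer suggests, here is a sketch (PHP 7.1+, cURL extension) that keeps retrying while the status code is not in the 2xx range and gives up after a fixed number of attempts. fetchPage() and fetchWithRetry() are hypothetical names, the 5-second timeout and the delays are arbitrary, and the exponential backoff (doubling the wait after each failure) is an added choice rather than part of the answer. The optional $fetch parameter lets a fake fetcher stand in for cURL, e.g. in tests.

```php
<?php
/**
 * Hypothetical default fetcher: one GET request with cURL.
 * Returns [HTTP status code, body]; the code is 0 if no response arrived.
 */
function fetchPage(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);           // don't wait forever on a sick server
    $body = curl_exec($ch);                         // false on transport errors
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return [$code, $body];
}

/**
 * Retry while the response is NOT a 2xx, doubling the wait each time
 * (exponential backoff). Returns the body, or false after $maxAttempts.
 * $fetch defaults to fetchPage() but can be replaced, e.g. in tests.
 */
function fetchWithRetry(string $url, int $maxAttempts = 5, int $baseDelay = 2, ?callable $fetch = null)
{
    $fetch = $fetch ?? 'fetchPage';
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        [$code, $body] = $fetch($url);
        if ($code >= 200 && $code < 300 && $body !== false) {
            return $body;                                // success
        }
        if ($attempt < $maxAttempts) {
            sleep($baseDelay * (2 ** ($attempt - 1)));   // 2 s, 4 s, 8 s, ... with the defaults
        }
    }
    return false;                                        // all attempts failed
}
```

The returned HTML string could then be handed to Simple HTML DOM's str_get_html() instead of letting file_get_html() do the fetching, so the parsing code stays the same and only the download step gains the retries.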
