PHP DOM parser: getting data from a span

Posted 2024-12-18 10:23:24


I am trying to use the DOM to get the days and times, and also the rooms, from the following batch of HTML (my script already gets everything else; these two are the ones I'm having trouble with):

                    </td><td class="call">
                    <span>12549<br/></span><a href="http://www.bkstr.com/webapp/wcs/stores/servlet/booklookServlet?bookstore_id-1=584&term_id-1=201190&crn-1=12549" target="_blank">View Book Info</a>
                    </td><td>
                    <span id="ctl10_gv_sectionTable_ctl03_lblDays">F:1000AM - 1125AM<br />T:230PM - 355PM</span>


                    </td><td class="room">
                    <span id="ctl10_gv_sectionTable_ctl03_lblRoom">KUPF106<br />KUPF106</span>
                    </td><td class="status"><span id="ctl10_gv_sectionTable_ctl03_lblStatus" class="red">Closed</span></td><td class="max">20</td><td class="now">49</td><td class="instructor">
                    <a href="https://directory.njit.edu/PersDetails.aspx?persid=SCHOENKA" target="_blank">Schoenebeck Kar</a>
                    </td><td class="credits">3.00</td>

        </tr><tr class="sectionRow">
            <td class="section">
                    101<br />

Here is what I have so far for finding the days:

    $tracker =0;
    // DAYS AND TIMES
    $number = 3;
    $digit = "0";
    while($tracker<$numSections){           
        $strNum = strval($number);
        $zero = strval($digit);
        $start = "ctl10_gv_sectionTable_ctl";
        $end = "_lblDays";
        $id = $start.$zero.$strNum.$end;
        //$days = $html->find('span.$id');
        $days=$html->getElementByTagName('span')->getElementById($id);
            echo "Days : ";
            echo $days[0] . '<br>';


        $tracker++;
        $number++;
        if($number >9){
            $digit = "1";
            $number=0;
        }
    }

As you can see from the HTML, the site I'm parsing has fairly unique IDs for some of its spans (e.g. ctl10_gv_sectionTable_ctl03_lblRoom). Since I only posted one section's HTML block, what you don't see is that the markup for the next class section is identical except for the "ctl03" part. That is what all the extra code above takes care of, in case anyone is thrown off by it.

I've tried a few different ways but cannot seem to get the days (e.g. "1000AM - 1125AM") or the rooms (e.g. KUPF106). The rest is pretty simple to grab, but the cell holding the days has no class, and neither span has one, so the only handle is the span's ID. I think I just need to know how to use the value I have in $id as the specific span ID I am looking for. If so, can someone show me how to do that?


伪装你 2024-12-25 10:23:24

This:

    $html->getElementByTagName('span')->getElementById($id);

makes no sense. The method is actually getElementsByTagName (plural), and it returns a DOMNodeList, which does not have a getElementById method.

I think you mean $html->getElementById($id);, but I can't be sure because I don't know what $html is.

Once you have the element, you can get the text value with $element->textContent if you don't need to walk among the text nodes.

Have you considered using DOMXPath for your parsing task? It's probably much easier and clearer.
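For illustration, here is a minimal sketch of that lookup, assuming $html should really be a DOMDocument loaded from the page source. The $source string is a hypothetical sample trimmed from the question's markup, and the sprintf call is a suggested stand-in for the manual $digit/$number bookkeeping, not something from the original code.

```php
<?php
// Hypothetical sample markup, trimmed from the HTML in the question.
$source = '<table><tr class="sectionRow"><td>'
    . '<span id="ctl10_gv_sectionTable_ctl03_lblDays">F:1000AM - 1125AM<br />T:230PM - 355PM</span>'
    . '</td><td class="room">'
    . '<span id="ctl10_gv_sectionTable_ctl03_lblRoom">KUPF106<br />KUPF106</span>'
    . '</td></tr></table>';

// Collect parser warnings instead of printing them; real-world HTML is rarely clean.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($source);
libxml_clear_errors();

// Zero-padded section counter: 3 becomes "03", 12 becomes "12".
$id = sprintf('ctl10_gv_sectionTable_ctl%02d_lblDays', 3);

// getElementById is called on the document itself, not on a node list.
$span = $dom->getElementById($id);

$days = [];
if ($span !== null) {
    // Walk the child nodes so the values separated by <br /> stay separate.
    foreach ($span->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $days[] = trim($child->nodeValue);
        }
    }
}

print_r($days);
```

If the separate values don't matter, $span->textContent returns the same text as one string, with nothing between the two entries because the br element contributes no text.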


初与友歌 2024-12-25 10:23:24

Simple HTML DOM should be avoided unless you are on PHP 4 or earlier. The built-in DOM functions in PHP 5 use the much more reliable libxml2 library.

The proper way to iterate over that HTML is to first identify the rows to iterate, and then write XPath expressions that pull the data relative to each row.

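    // $html is assumed to hold the page source as a string.
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ silences warnings about malformed markup
    $xpath = new DOMXPath($dom);

    // Each section is one <tr class="sectionRow">; the leading "." keeps
    // every inner query relative to the current row.
    foreach ($xpath->query("//tr[@class='sectionRow']") as $row) {
        echo $xpath->query(".//span[contains(@id,'Days')]", $row)->item(0)->nodeValue . "\n";
        echo $xpath->query(".//span[contains(@id,'Room')]", $row)->item(0)->nodeValue . "\n";
        echo $xpath->query(".//span[contains(@id,'Status')]", $row)->item(0)->nodeValue . "\n";
    }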
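As a variation on the same row-relative idea, the sketch below collects the values into an array instead of echoing them, and tolerates rows where a span is missing. The two-row $source sample and the spanParts helper are hypothetical, written only for this illustration; the second row's contents are made up to show the empty case.

```php
<?php
// Hypothetical two-row sample; the real page is assumed to repeat this shape.
$source = <<<'HTML'
<table>
<tr class="sectionRow"><td><span id="ctl10_gv_sectionTable_ctl03_lblDays">F:1000AM - 1125AM<br />T:230PM - 355PM</span></td>
<td class="room"><span id="ctl10_gv_sectionTable_ctl03_lblRoom">KUPF106<br />KUPF106</span></td></tr>
<tr class="sectionRow"><td><span id="ctl10_gv_sectionTable_ctl04_lblDays">M:100PM - 225PM</span></td>
<td class="room"></td></tr>
</table>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($source);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Hypothetical helper: the text pieces of every span under $row whose id
// contains $suffix. Returns an empty array when no such span exists.
function spanParts(DOMXPath $xpath, DOMNode $row, string $suffix): array
{
    $parts = [];
    // text() selects only the text nodes, so the <br /> separators drop out.
    foreach ($xpath->query(".//span[contains(@id, '$suffix')]/text()", $row) as $text) {
        $parts[] = trim($text->nodeValue);
    }
    return $parts;
}

$sections = [];
foreach ($xpath->query("//tr[@class='sectionRow']") as $row) {
    $sections[] = [
        'days'  => spanParts($xpath, $row, 'lblDays'),
        'rooms' => spanParts($xpath, $row, 'lblRoom'),
    ];
}

print_r($sections);
```

Selecting the text nodes directly avoids calling item(0) on an empty result, which is where the echo-based loop would fail on a row without the expected span.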
