PHP DOM parser: getting data from a span
I am trying to use DOM to get the days and times, and also the rooms, from the following batch of HTML (my script already gets everything else; it's only these two I'm having trouble with):
</td><td class="call">
<span>12549<br/></span><a href="http://www.bkstr.com/webapp/wcs/stores/servlet/booklookServlet?bookstore_id-1=584&term_id-1=201190&crn-1=12549" target="_blank">View Book Info</a>
</td><td>
<span id="ctl10_gv_sectionTable_ctl03_lblDays">F:1000AM - 1125AM<br />T:230PM - 355PM</span>
</td><td class="room">
<span id="ctl10_gv_sectionTable_ctl03_lblRoom">KUPF106<br />KUPF106</span>
</td><td class="status"><span id="ctl10_gv_sectionTable_ctl03_lblStatus" class="red">Closed</span></td><td class="max">20</td><td class="now">49</td><td class="instructor">
<a href="https://directory.njit.edu/PersDetails.aspx?persid=SCHOENKA" target="_blank">Schoenebeck Kar</a>
</td><td class="credits">3.00</td>
</tr><tr class="sectionRow">
<td class="section">
101<br />
Here is what I have so far for finding the days:
$tracker =0;
// DAYS AND TIMES
$number = 3;
$digit = "0";
while($tracker<$numSections){
$strNum = strval($number);
$zero = strval($digit);
$start = "ctl10_gv_sectionTable_ctl";
$end = "_lblDays";
$id = $start.$zero.$strNum.$end;
//$days = $html->find('span.$id');
$days=$html->getElementByTagName('span')->getElementById($id);
echo "Days : ";
echo $days[0] . '<br>';
$tracker++;
$number++;
if($number >9){
$digit = "1";
$number=0;
}
}
As you can see from the HTML, the site I'm parsing has fairly unique IDs for some of its spans (ctl10_gv_sectionTable_ctl03_lblRoom). I only posted one section's HTML block, so what you don't see is that the markup for the next class section is identical except for the "ctl03" part. That is what all the extra code above takes care of, in case anyone is thrown off by it.
I've tried a few different ways but cannot seem to get the days (e.g. "1000AM - 1125AM") or the rooms (e.g. KUPF106). The rest is pretty simple to grab, but these two don't have class identifiers, and the days cell doesn't even have a class on its td. I think I just need to know how to use the value I have in $id as the specific span id I am looking for. If so, can someone show me how to do that?
This:
$days=$html->getElementByTagName('span')->getElementById($id);
makes no sense. getElementByTagName (spelled getElementsByTagName in PHP's built-in DOM extension) returns a DOMNodeList, which does not have a getElementById method.
I think you mean $html->getElementById($id);, but I can't be sure because I don't know what $html is.
Once you have the element, you can get the text value with $element->textContent if you don't need to walk among the text nodes.
Have you considered using DOMXPath for your parsing task? It's probably much easier and clearer.
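To illustrate the DOMXPath suggestion, here is a minimal sketch, assuming $html can be replaced by PHP's built-in DOMDocument. The spanLines helper is a hypothetical name introduced for this example, the wrapper table is added only so the fragment from the question parses cleanly, and sprintf is swapped in for the question's manual digit handling. It reads the direct text nodes of the span rather than textContent, because textContent would run the two lines on either side of the <br /> together.

```php
<?php
// Markup copied from the question; the <table><tr> wrapper is an assumption
// added so the fragment parses as a small standalone document.
$markup = <<<'HTML'
<table><tr>
<td><span id="ctl10_gv_sectionTable_ctl03_lblDays">F:1000AM - 1125AM<br />T:230PM - 355PM</span></td>
<td class="room"><span id="ctl10_gv_sectionTable_ctl03_lblRoom">KUPF106<br />KUPF106</span></td>
</tr></table>
HTML;

/**
 * Hypothetical helper: returns the non-empty text nodes directly inside the
 * span with the given id, one array entry per <br>-separated line.
 * Assumes $id contains no double quotes.
 */
function spanLines(DOMXPath $xpath, string $id): array
{
    $lines = [];
    // text() selects only text-node children, so the <br> elements are skipped.
    foreach ($xpath->query('//span[@id="' . $id . '"]/text()') as $textNode) {
        $text = trim($textNode->nodeValue);
        if ($text !== '') {
            $lines[] = $text;
        }
    }
    return $lines;
}

$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings from imperfect markup
$doc->loadHTML($markup);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Zero-pad the counter to two digits, as in "ctl03".
$id = sprintf('ctl10_gv_sectionTable_ctl%02d_lblDays', 3);
$days = spanLines($xpath, $id);
$rooms = spanLines($xpath, 'ctl10_gv_sectionTable_ctl03_lblRoom');

echo 'Days: ', implode(', ', $days), PHP_EOL;
echo 'Rooms: ', implode(', ', $rooms), PHP_EOL;
```

Building the id with sprintf('%02d', ...) also keeps the counter correct past 19, where the question's hand-rolled digit handling would start repeating ids.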
Simple HTML DOM should be avoided unless you're using PHP 4 or earlier. The built-in DOM functions in PHP 5 use the much more reliable libxml2 library.
The proper way to iterate over that HTML is to first identify the rows to iterate over, and then write XPath expressions that pull the data relative to each row.
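The row-first approach could look roughly like the sketch below. It assumes each section lives in a tr with class "sectionRow" (as the tail of the posted HTML suggests) and that the days span is the only one in the row whose id contains "_lblDays". The sample row is assembled from the fragments in the question, and parseSections and textLines are hypothetical names, not part of any library.

```php
<?php
// One sample row pieced together from the markup in the question.
$markup = <<<'HTML'
<table>
<tr class="sectionRow">
<td class="call"><span>12549<br/></span></td>
<td><span id="ctl10_gv_sectionTable_ctl03_lblDays">F:1000AM - 1125AM<br />T:230PM - 355PM</span></td>
<td class="room"><span id="ctl10_gv_sectionTable_ctl03_lblRoom">KUPF106<br />KUPF106</span></td>
<td class="credits">3.00</td>
</tr>
</table>
HTML;

/** Hypothetical helper: trimmed, non-empty node values for $expr, evaluated relative to $context. */
function textLines(DOMXPath $xpath, string $expr, DOMNode $context): array
{
    $lines = [];
    foreach ($xpath->query($expr, $context) as $node) {
        $text = trim($node->nodeValue);
        if ($text !== '') {
            $lines[] = $text;
        }
    }
    return $lines;
}

/** Hypothetical helper: one associative array per section row. */
function parseSections(string $markup): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // suppress markup warnings
    $doc->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $sections = [];
    // Step 1: identify the rows. Step 2: query relative to each row ("./").
    foreach ($xpath->query('//tr[@class="sectionRow"]') as $row) {
        $sections[] = [
            'call'  => textLines($xpath, './td[@class="call"]/span/text()', $row),
            'days'  => textLines($xpath, './/span[contains(@id, "_lblDays")]/text()', $row),
            'rooms' => textLines($xpath, './td[@class="room"]/span/text()', $row),
        ];
    }
    return $sections;
}

$sections = parseSections($markup);
foreach ($sections as $section) {
    echo implode(' ', $section['call']), ': ',
        implode(', ', $section['days']), ' in ',
        implode(', ', $section['rooms']), PHP_EOL;
}
```

Because every expression is relative to the current row, there is no need to reconstruct ids such as "ctl03" or to know in advance how many sections the page contains.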