Scraping a page with PHP

Posted on 2024-08-04 03:56:43

I'm trying to scrape a page where the information I'm looking for lies within:

 <tr class="defRowEven">
   <td align="right">label</td>
   <td>info</td>
 </tr>

I'm trying to get the label and info out of the page. Previously I was doing something like this:

$hrefs = $xpath->evaluate("/html/body//a");

That is how I'm grabbing the URLs. Is there a way to grab that tr information? Would it be better to use regex or DOMXPath? I'm very unfamiliar with DOMXPath, so any information would be more than helpful. Thank you!

Comments (3)

被你宠の有点坏 2024-08-11 03:56:43

XPath can select elements by attribute. To find your rows, use:

$rows = $xpath->query("//tr[@class='defRowEven']");

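This returns a list of rows, so you can read the label and info for each one without mixing them up. Wrapping each expression in string() makes evaluate() return the cell text instead of a DOMNodeList:

foreach ($rows as $row) {
    // Paths are relative to $row, so each label stays paired with its own info.
    $label = $xpath->evaluate("string(td[@align='right'])", $row);
    $info = $xpath->evaluate("string(td[2])", $row);
}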
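
For anyone who wants to try the XPath approach end to end, here is a minimal self-contained sketch. The sample markup and the $pairs array are assumptions made up for illustration; on a real page you would load the fetched HTML instead.

```php
<?php
// Sample markup standing in for the scraped page (an assumption for illustration).
$html = <<<HTML
<html><body><table>
 <tr class="defRowEven">
   <td align="right">label</td>
   <td>info</td>
 </tr>
 <tr class="defRowEven">
   <td align="right">Name</td>
   <td>Example</td>
 </tr>
</table></body></html>
HTML;

$doc = new DOMDocument();
// Real pages are rarely valid HTML; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$pairs = [];
foreach ($xpath->query("//tr[@class='defRowEven']") as $row) {
    // string(...) makes evaluate() return the text of the first matching node.
    $label = trim($xpath->evaluate("string(td[@align='right'])", $row));
    $info  = trim($xpath->evaluate("string(td[2])", $row));
    $pairs[$label] = $info;
}

print_r($pairs);
```

Note that @class='defRowEven' only matches rows whose class attribute is exactly that string; rows carrying additional classes would need a contains() test instead.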

In case that doesn't work out, you can try the regex route:

preg_match_all('/<tr class="defRowEven">\s*<td align="right">(.*?)<\/td>\s*<td>(.*?)<\/td>/',
    $html, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    list($full, $label, $info) = $match;
}
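
A runnable sketch of the regex route, under the assumption that the markup looks exactly like the snippet in the question. The helper name extractPairsWithRegex and the sample string are hypothetical. Compared with the pattern above, it adds the s modifier so a cell may span several lines, and it strips nested tags and decodes entities in the captured text.

```php
<?php
// Hypothetical helper; the name and return shape are assumptions for illustration.
function extractPairsWithRegex(string $html): array
{
    // The "s" modifier lets "." match newlines inside a cell.
    $pattern = '/<tr class="defRowEven">\s*<td align="right">(.*?)<\/td>\s*<td>(.*?)<\/td>/s';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $pairs = [];
    foreach ($matches as [, $label, $info]) {
        // Drop nested tags and decode entities such as &amp; in the cell text.
        $pairs[] = [
            html_entity_decode(trim(strip_tags($label))),
            html_entity_decode(trim(strip_tags($info))),
        ];
    }
    return $pairs;
}

$sample = '<tr class="defRowEven">
   <td align="right">Size &amp; weight</td>
   <td><b>10</b> kg</td>
 </tr>';

print_r(extractPairsWithRegex($sample));
```

This stays brittle: any change in attribute order, quoting, or extra attributes on the tags breaks the match, which is why the DOM-based approach is usually the safer first choice.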

黎歌 2024-08-11 03:56:43

I'm not familiar with XPath, but using SimpleHtmlDom you can do this:

// $html is a simple_html_dom object, e.g. the result of str_get_html()
foreach($html->find('tr.defRowEven') as $row) {

    //get the 'label' (first cell); the property name is lowercase
    echo $row->find('td', 0)->innertext;

    //get the 'info' (second cell)
    echo $row->find('td', 1)->innertext;
}

梦魇绽荼蘼 2024-08-11 03:56:43

Someone here on SO recently linked to phpQuery, a kind of jQuery for server-side PHP, which should make this kind of thing easy. I haven't tried it, so I can't comment first hand.
