Scraping a page with PHP

Posted 2024-08-04 03:56:43

I'm trying to scrape a page where the information I'm looking for lies within:

 <tr class="defRowEven">
   <td align="right">label</td>
   <td>info</td>
 </tr>

I'm trying to get the label and info out of the page. Previously I was doing something like:

$hrefs = $xpath->evaluate("/html/body//a");

That is how I'm grabbing the URLs. Is there a way to grab the information in that tr? Would it be better to use a regex or DOMXPath? I'm very unfamiliar with DOMXPath, so any information would be very helpful. Thank you!

被你宠の有点坏 2024-08-11 03:56:43

XPath can select elements by attribute. To find your rows, use:

$rows = $xpath->query("//tr[@class='defRowEven']");

This should return a list of rows, so you can select the label and info for each without mixing them up:

foreach ($rows as $row) {
    // string(...) returns the cell text; without it evaluate() yields a DOMNodeList
    $label = $xpath->evaluate("string(td[@align='right'])", $row);
    $info = $xpath->evaluate("string(td[2])", $row);
}
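
For context, here is a minimal end-to-end sketch of the XPath approach, from raw HTML to label/info pairs. The sample markup is modelled on the question, and the second row, the `$pairs` array and the `trim()` calls are assumptions added for illustration, not part of the original answer.

```php
<?php
// Sample markup modelled on the question; the defRowOdd row is made up
// to show that only defRowEven rows are selected.
$html = <<<HTML
<html><body><table>
 <tr class="defRowEven">
   <td align="right">label</td>
   <td>info</td>
 </tr>
 <tr class="defRowOdd">
   <td align="right">other</td>
   <td>skipped</td>
 </tr>
</table></body></html>
HTML;

$doc = new DOMDocument();
// Real pages are rarely valid HTML; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$pairs = [];
foreach ($xpath->query("//tr[@class='defRowEven']") as $row) {
    // Passing $row as the context node keeps the relative paths inside this row.
    $pairs[] = [
        'label' => trim($xpath->evaluate("string(td[@align='right'])", $row)),
        'info'  => trim($xpath->evaluate("string(td[2])", $row)),
    ];
}

print_r($pairs);
```

Note that `@class='defRowEven'` compares the whole attribute value, so a row with several classes would need a different predicate.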

In case that doesn't work out, you can try the regex route:

preg_match_all('/<tr class="defRowEven">\s*<td align="right">(.*?)<\/td>\s*<td>(.*?)<\/td>/',
    $html, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    list($full, $label, $info) = $match;
}
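
One caveat with the regex route is that it depends on the exact markup. The sketch below wraps the same pattern in a hypothetical helper, `extractPairs()`, and runs it against the markup from the question and against a made-up variant that only differs in quoting and whitespace; both sample strings are assumptions for illustration.

```php
<?php
// Hypothetical helper around the pattern from the answer, left unchanged.
function extractPairs(string $html): array
{
    preg_match_all(
        '/<tr class="defRowEven">\s*<td align="right">(.*?)<\/td>\s*<td>(.*?)<\/td>/',
        $html, $matches, PREG_SET_ORDER
    );
    $pairs = [];
    foreach ($matches as $match) {
        // $match[0] is the full match, followed by the two capture groups.
        list($full, $label, $info) = $match;
        $pairs[] = ['label' => $label, 'info' => $info];
    }
    return $pairs;
}

// Markup exactly as shown in the question.
$exact = '<tr class="defRowEven">
   <td align="right">label</td>
   <td>info</td>
 </tr>';

// Same row, but the class attribute uses single quotes (made-up variant).
$variant = "<tr class='defRowEven'><td align=\"right\">label</td><td>info</td></tr>";

$fromExact = extractPairs($exact);
$fromVariant = extractPairs($variant);

var_dump(count($fromExact), count($fromVariant));
```

The pattern finds the row in the first string and nothing in the second, which is why a DOM parser tends to be the more robust choice when the page's markup is outside your control.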


黎歌 2024-08-11 03:56:43

I'm not familiar with XPath, but using SimpleHtmlDom you can do this:

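// $html is assumed to be a SimpleHtmlDom object, e.g. from str_get_html() or file_get_html()
foreach ($html->find('tr.defRowEven') as $row) {

    // get the 'label' (first cell); the property name is lowercase 'innertext'
    echo $row->find('td', 0)->innertext;

    // get the 'info' (second cell)
    echo $row->find('td', 1)->innertext;
}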
