PHP: scraping a page
I'm trying to scrape a page where the information I'm looking for lies within:
<tr class="defRowEven">
<td align="right">label</td>
<td>info</td>
</tr>
I'm trying to get the label and info out of the page. Before I was doing something like:
$hrefs = $xpath->evaluate("/html/body//a");
That is how I'm grabbing the URLs. Is there a way to grab that tr information? Would it be better to use regex or DOMXPath? I'm very unfamiliar with DOMXPath, and any information would be more than helpful. Thank you!
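For reference, the `$xpath` object in the question is typically built by parsing the page with `DOMDocument` and wrapping it in `DOMXPath`. A minimal sketch of that setup, assuming the HTML is already in a string (the `$html` contents and link targets are made up for illustration):

```php
<?php
// Hypothetical page content; in practice this would come from file_get_contents() or cURL.
$html = '<html><body><p><a href="/one">One</a> <a href="/two">Two</a></p></body></html>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Same expression as in the question: every <a> anywhere under <body>.
$hrefs = $xpath->evaluate("/html/body//a");

$urls = [];
foreach ($hrefs as $a) {
    $urls[] = $a->getAttribute('href');
}
print_r($urls);
```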
XPath can select based on attributes. To find your row, then, use:
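A minimal sketch of an attribute predicate, assuming the row markup from the question; the second `defRowOdd` row is hypothetical and is only there to show that it gets filtered out:

```php
<?php
$html = <<<HTML
<html><body><table>
<tr class="defRowEven"><td align="right">label</td><td>info</td></tr>
<tr class="defRowOdd"><td align="right">other</td><td>skip</td></tr>
</table></body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// [@class="defRowEven"] keeps only <tr> elements whose class attribute is exactly "defRowEven".
$rows = $xpath->query('//tr[@class="defRowEven"]');
echo $rows->length, "\n";
```

Note that `@class="defRowEven"` is an exact string comparison. If a row can carry several classes (e.g. `class="defRowEven highlight"`), the usual XPath 1.0 idiom is `//tr[contains(concat(' ', normalize-space(@class), ' '), ' defRowEven ')]`.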
This should return a list of rows, so you can select the label and info for each without mixing them up:
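A sketch of that per-row approach, again assuming the question's two-cell layout (the `Name`/`Price` rows are invented sample data). Passing `$row` as the second argument to `DOMXPath::query()` makes `td[1]` and `td[2]` relative to that row, so each label stays paired with its own info cell:

```php
<?php
$html = <<<HTML
<html><body><table>
<tr class="defRowEven"><td align="right">Name</td><td>Widget</td></tr>
<tr class="defRowEven"><td align="right">Price</td><td>9.99</td></tr>
</table></body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$pairs = [];
foreach ($xpath->query('//tr[@class="defRowEven"]') as $row) {
    // Relative paths (no leading slash) are evaluated with $row as the context node.
    $label = $xpath->query('td[1]', $row)->item(0);
    $info  = $xpath->query('td[2]', $row)->item(0);
    // Skip malformed rows that lack either cell.
    if ($label !== null && $info !== null) {
        $pairs[trim($label->textContent)] = trim($info->textContent);
    }
}
print_r($pairs);
```

Using the label as the array key assumes labels are unique; if they can repeat, append `[$label, $info]` pairs to a list instead.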
In case that doesn't work out, you can try the regex route:
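A sketch of the regex fallback, assuming the rows appear exactly as in the question (same attribute order, same quoting, no nested tags inside the cells). It is considerably more brittle than the DOM version: any deviation from that markup makes it silently miss rows.

```php
<?php
$html = '<tr class="defRowEven">
<td align="right">label</td>
<td>info</td>
</tr>';

// "~" as delimiter avoids escaping the "/" in closing tags; "s" lets "." match newlines.
// (.*?) is lazy so each group stops at the first following </td>.
$pattern = '~<tr class="defRowEven">\s*<td align="right">(.*?)</td>\s*<td>(.*?)</td>\s*</tr>~s';
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $m) {
    // $m[1] is the label cell, $m[2] the info cell.
    $pairs[trim($m[1])] = trim($m[2]);
}
print_r($pairs);
```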
I'm not familiar with XPath, but using SimpleHtmlDom you can do this:
Someone here on SO recently linked to phpQuery, a kind of jQuery for PHP on the server side, which should make this kind of thing easy. I haven't tried it, so I can't comment first-hand.