Joining HTML tables using PHP DOMDocument
I have a whole bunch of large HTML documents with tables of data inside, and I'm looking to write a script which can process an HTML file, isolate the <table> tags and their contents, then concatenate all the rows within those tables into one large data table.
I then want to loop through the rows and columns of the new large table.
After some research, I've started trying out PHP's DOMDocument class to parse the HTML, but I just wanted to know: is that the best way to do something like this?
This is what I've got so far...
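// Collect libxml parse warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHTMLFile('exrate.html');
libxml_clear_errors();
// Live DOMNodeList of every <table> element in the document.
$tables = $dom->getElementsByTagName('table');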
How do I chop out everything other than the tables and their contents?
Then I'd actually like to remove the first table since it's a table of contents. Then loop through all the table rows and build them into one large table.
Anyone got any hints on how to do this?
I've been digging through the docs for DOMDocument on php.net but I'm finding the syntax pretty baffling!
Cheers, B
EDIT: Here is a sample HTML file with the data tables I'd like to join: http://thenetzone.co.uk/exrates/exrate.html
Comments (1)
OK, got it sorted with phpQuery and lots of trial and error.
The script takes a whole bunch of tables, moves their contents into the first one, and removes the emptied tables.
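The phpQuery code itself isn't shown in this post, so here is a rough sketch of the same merge step using PHP's built-in DOMDocument instead of phpQuery. The sample markup is made up for illustration, and the sketch assumes the tables are not nested inside one another.

```php
<?php
// Hypothetical sample markup standing in for the real exrate.html.
$html = <<<HTML
<html><body>
<table><tr><td>A</td><td>USD</td><td>1.5</td></tr></table>
<p>Unrelated text</p>
<table><tr><td>B</td><td>EUR</td><td>1.2</td></tr>
<tr><td>C</td><td>JPY</td><td>150</td></tr></table>
</body></html>
HTML;

libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// getElementsByTagName() returns a live list, so copy it into a plain array
// before moving or removing nodes.
$tables = iterator_to_array($dom->getElementsByTagName('table'));
$target = array_shift($tables);     // the first table receives every row

foreach ($tables as $table) {
    // Copy the row list too: appendChild() moves each row out of $table.
    foreach (iterator_to_array($table->getElementsByTagName('tr')) as $row) {
        $target->appendChild($row);
    }
    $table->parentNode->removeChild($table);   // drop the now-empty table
}

$rowCount   = $target->getElementsByTagName('tr')->length;
$tableCount = $dom->getElementsByTagName('table')->length;
echo "$rowCount rows in $tableCount table(s)\n";
```

Copying the node lists into arrays first matters because DOMNodeList is live: moving nodes while iterating over it directly can skip elements.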
It then loops through each table row and extracts the text from specific columns, in this case the 2nd and 3rd td of each row.
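For the row loop, something along these lines should work with DOMXPath, again as a stand-in for the phpQuery version that isn't shown here. The table contents and the variable names ($pairs, $name, $value) are assumptions for illustration only.

```php
<?php
// Hypothetical merged table; the real column contents are not shown in the post.
$html = <<<HTML
<table>
<tr><th>No.</th><th>Currency</th><th>Rate</th></tr>
<tr><td>1</td><td>US Dollar</td><td>1.52</td></tr>
<tr><td>2</td><td>Euro</td><td>1.17</td></tr>
</table>
HTML;

libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$pairs = [];
foreach ($xpath->query('//table//tr') as $row) {
    // td[2] and td[3] are 1-based positions among the row's <td> children.
    $second = $xpath->query('td[2]', $row)->item(0);
    $third  = $xpath->query('td[3]', $row)->item(0);
    if ($second === null || $third === null) {
        continue;   // skip header rows or rows with fewer than three <td> cells
    }
    $pairs[] = [trim($second->textContent), trim($third->textContent)];
}

foreach ($pairs as [$name, $value]) {
    echo "$name => $value\n";
}
```

Rows made of th cells, or rows with fewer than three td cells, are simply skipped, which is handy when header rows from each original table end up in the merged one.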
Hope this helps someone out!