Joining HTML tables using PHP DOMDocument

Posted on 2024-10-15 20:27:45

I have a whole bunch of large HTML documents with tables of data inside, and I'm looking to write a script that can process an HTML file, isolate the <table> tags and their contents, then concatenate all the rows within those tables into one large data table.
I'd then loop through the rows and columns of the new large table.

After some research I've started trying out PHP's DOMDocument class to parse the HTML but I just wanted to know, is that the best way to do something like this?

This is what I've got so far...

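$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
// Collect parser warnings from malformed HTML instead of silencing them with @
libxml_use_internal_errors(true);
$dom->loadHTMLFile('exrate.html');
libxml_clear_errors();
// Live DOMNodeList of every <table> element in the document
$tables = $dom->getElementsByTagName('table');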

How do I chop out everything other than the tables and their contents?
Then I'd actually like to remove the first table since it's a table of contents. Then loop through all the table rows and build them into one large table.

Anyone got any hints on how to do this?
I've been digging through the docs for DOMDocument on php.net but I'm finding the syntax pretty baffling!

Cheers, B

EDIT: Here is a sample of an HTML file with the data tables I'd like to join http://thenetzone.co.uk/exrates/exrate.html


蝶…霜飞 2024-10-22 20:27:45

OK, got it sorted with phpQuery and lots of trial and error.
The script drops the contents table, moves the rows of all the remaining tables into the first of them, and removes the tables left empty.
It then loops through each table row and extracts the text from specific columns, in this case the 2nd and 3rd td of each row.

require('phpQuery/phpQuery.php');
$doc = phpQuery::newDocumentFileHTML('exrates_code.html');
pq('table:first')->remove(); // remove the first table: it is only a table of contents
pq('tr:has(th)')->remove(); // remove header rows
pq('table:not(:first) tr')->appendTo('table:first'); // move the rows of every other table into the first
pq('table:empty')->remove(); // remove the tables that are now empty
pq('br')->remove(); // strip line breaks

$rows = pq('table tr');
foreach ($rows as $row) {
    // :eq() is zero-based, so eq(1) and eq(2) are the 2nd and 3rd cells
    $currency = pq($row)->find('td:eq(1)')->text();
    $value = pq($row)->find('td:eq(2)')->text();
}

Hope this helps someone out!
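
For anyone who would rather stay with the built-in DOM extension, here is a minimal sketch of the same steps using DOMDocument and DOMXPath. The inline sample markup, the three-column layout, and the names $merged and $data are assumptions for illustration; they are not taken from the real exrate.html, so the XPath expression may need adjusting for the actual file.

```php
<?php
// Hypothetical sample markup standing in for exrate.html (structure is an assumption).
$html = <<<HTML
<html><body>
<table><tr><td><a href="#a">Contents</a></td></tr></table>
<table>
  <tr><th>Country</th><th>Currency</th><th>Rate</th></tr>
  <tr><td>Australia</td><td>Dollar</td><td>1.52</td></tr>
</table>
<table>
  <tr><th>Country</th><th>Currency</th><th>Rate</th></tr>
  <tr><td>Japan</td><td>Yen</td><td>131.4</td></tr>
</table>
</body></html>
HTML;

libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Skip the first table in document order (the table of contents) and
// select only the rows that contain no <th> cells.
$rows = $xpath->query('(//table)[position() > 1]//tr[not(th)]');

// Build one merged table in a fresh document.
$out = new DOMDocument();
$merged = $out->appendChild($out->createElement('table'));
$data = [];
foreach ($rows as $row) {
    $merged->appendChild($out->importNode($row, true)); // deep copy of the row
    $cells = $xpath->query('td', $row);                 // cells of this row only
    if ($cells->length >= 3) {
        $data[] = [
            'currency' => trim($cells->item(1)->textContent), // 2nd cell
            'value'    => trim($cells->item(2)->textContent), // 3rd cell
        ];
    }
}

echo $out->saveHTML($merged), PHP_EOL; // the single combined table
print_r($data);                        // the extracted column values
```

To run it against a real file, swapping loadHTML($html) for loadHTMLFile('exrate.html') should be enough. Copying the rows into a separate document with importNode leaves the source document untouched, which avoids modifying a node list while iterating over it.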
