PHP DOMDocument not formatting output correctly
I'm currently working on the sitemaps for a website, and I'm using SimpleXML to import the original XML file and run some checks on it. After this I use dom_import_simplexml() to convert it to a DOMDocument, which makes it easier to add and manipulate XML elements precisely. Below is the test XML sitemap that I'm working from:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:52:32-Orouke.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:53:23-castle technology.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:53:38-banana split.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:53:42-Waveney.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:55:12-pure orange.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:57:54-tau press.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:59:21-E.f.m.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:59:31-apple.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:59:45-townhouse communications.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
</urlset>
Here is the test code I'm using to modify it:
<?php
$root = simplexml_load_file("small.xml");
$domRoot = dom_import_simplexml($root);
$dom = $domRoot->ownerDocument;
$urlElement = $dom->createElement("url");
$locElement = $dom->createElement("loc");
$locElement->appendChild($dom->createTextNode("www.google.co.uk"));
$urlElement->appendChild($locElement);
$lastmodElement = $dom->createElement("lastmod");
$lastmodElement->appendChild($dom->createTextNode("2011-08-02"));
$urlElement->appendChild($lastmodElement);
$domRoot->appendChild($urlElement);
$dom->formatOutput = true;
echo $dom->saveXML();
?>
The main problem is that no matter where I place $dom->formatOutput = true; the existing XML that was imported from SimpleXML is formatted correctly, but anything new is formatted in the "all one line" style, as follows:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:52:32-Orouke.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:53:23-castle technology.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:53:38-banana split.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:53:42-Waveney.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:55:12-pure orange.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:57:54-tau press.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:59:21-E.f.m.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:59:31-apple.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:59:45-townhouse communications.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url><loc>www.google.co.uk</loc><lastmod>2011-08-02</lastmod></url></urlset>
If anyone has an idea why this is happening and how to fix it I would be very grateful.
Comments (3)
There is a workaround. You can force reformatting by first saving your new XML to a string, then loading it again after setting the formatOutput property, e.g.:
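A minimal sketch of that round trip, under a few assumptions: the inline two-entry sitemap in $source is a hypothetical stand-in for small.xml, and the second document also has preserveWhiteSpace set to false, since otherwise the whitespace text nodes from the original file survive the reload and the new nodes stay on one line.

```php
<?php
// Hypothetical stand-in for small.xml so the sketch is self-contained.
$source = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/a.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
</urlset>
XML;

// Same steps as in the question: SimpleXML first, then DOM.
$root    = simplexml_load_string($source);
$domRoot = dom_import_simplexml($root);
$dom     = $domRoot->ownerDocument;

$urlElement = $dom->createElement("url");
$urlElement->appendChild($dom->createElement("loc", "www.google.co.uk"));
$urlElement->appendChild($dom->createElement("lastmod", "2011-08-02"));
$domRoot->appendChild($urlElement);

// Workaround: serialize, then reparse into a document that drops
// whitespace-only text nodes and re-indents on output.
$formatted = new DOMDocument();
$formatted->preserveWhiteSpace = false; // must be set before loadXML()
$formatted->formatOutput = true;
$formatted->loadXML($dom->saveXML());

$output = $formatted->saveXML();
echo $output;
```

The cost is one extra parse of the whole document, which is usually negligible for a sitemap-sized file.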
To format output nicely, you need to set the preserveWhiteSpace property to false before loading, as stated in the documentation. Example:
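A short sketch of this approach, assuming the sitemap is loaded straight into a DOMDocument instead of going through SimpleXML; the inline $source string is a hypothetical stand-in for small.xml. Because whitespace-only text nodes are discarded at load time, both existing and newly appended elements get indented on output.

```php
<?php
// Hypothetical stand-in for small.xml so the sketch is self-contained.
$source = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/a.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
</urlset>
XML;

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // has no effect if set after loading
$dom->formatOutput = true;
$dom->loadXML($source);

// Append a new <url> entry, as in the question.
$urlElement = $dom->createElement("url");
$urlElement->appendChild($dom->createElement("loc", "www.google.co.uk"));
$urlElement->appendChild($dom->createElement("lastmod", "2011-08-02"));
$dom->documentElement->appendChild($urlElement);

$output = $dom->saveXML();
echo $output;
```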
This is just for visitors who land here, as this was the first answer on Google Search.
I had this same problem using code like Simon's.
It turns out that when you disable errors (either with $doc->loadHTML(..., LIBXML_NOERROR) or libxml_use_internal_errors(true);), it won't format anymore (example: https://3v4l.org/ur76E). The solution is to not disable errors and to suppress them on the PHP side instead (with @). Ugly, but it works: https://3v4l.org/BSJVu
The final silver bullet function looks like:
(it also takes care of the PHP error handler, if already set)
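A minimal sketch of such a helper, not the answerer's original function. The name loadHtmlSuppressingWarnings and the sample markup are hypothetical. It leaves libxml error reporting enabled, silences the resulting PHP warnings with @, and temporarily swaps in a no-op error handler so that a previously registered handler is not invoked for them.

```php
<?php
// Hypothetical helper: parse HTML without disabling libxml errors.
function loadHtmlSuppressingWarnings(string $html): DOMDocument
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false;
    $doc->formatOutput = true;

    // A custom error handler still receives warnings silenced with @,
    // so replace it with a no-op handler for the duration of the parse.
    set_error_handler(static function (): bool {
        return true; // treat the warning as handled
    });
    try {
        @$doc->loadHTML($html);
    } finally {
        restore_error_handler(); // put the previous handler back
    }

    return $doc;
}

// Deliberately sloppy markup, the kind that makes libxml emit warnings.
$doc  = loadHtmlSuppressingWarnings('<div><p>hello<span>world</div>');
$html = $doc->saveHTML();
echo $html;
```

The try/finally ensures the previous handler is restored even if parsing throws.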