PHP: Whenever I try to write UTF-8 using DOMDocument, it writes the hexadecimal notation of it
When I try to write UTF-8 strings into an XML file using DOMDocument, it actually writes the hexadecimal notation of the string instead of the string itself.
For example:
&#x5D9;&#x5E8;&#x5D5;&#x5E9;&#x5DC;&#x5D9;&#x5DD;
instead of:
ירושלים
Any ideas how to resolve the issue?
Ok, here you go: creating the document with an explicit encoding, as in new DOMDocument('1.0', 'utf-8'), will work fine, because in this case the document you constructed will retain the encoding specified as the second argument.
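A minimal sketch of that case, assuming a plain PHP 8 CLI script saved as UTF-8; the element name root and the variable names are illustrative. Because nothing replaces the document after construction, the declared encoding is still in place when it is serialized.

```php
<?php
// Version and encoding are passed to the constructor.
$dom = new DOMDocument('1.0', 'utf-8');
$dom->appendChild($dom->createElement('root', 'ירושלים'));

// The document still carries the constructor's encoding, so the
// Hebrew letters are written as literal UTF-8 characters.
$xml = $dom->saveXML();
echo $xml;
```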
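However, once you load XML into the document, whatever you declared in the constructor is lost; the encoding now comes from the loaded XML. This means that a document constructed with utf-8 and then filled via loadXML() from a string that does not specify an encoding will not have an encoding of utf-8.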
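A hedged sketch of the pitfall: the input string deliberately lacks an XML declaration. The exact serialized form can vary with the libxml2 version PHP is linked against, so the sketch only relies on what the answer states (the constructor's utf-8 declaration is gone) plus the fact that the character data survives either way.

```php
<?php
$dom = new DOMDocument('1.0', 'utf-8');

// loadXML() replaces the whole document. This input has no XML
// declaration, so the encoding passed to the constructor is discarded.
$dom->loadXML('<root>ירושלים</root>');

$xml = $dom->saveXML();
echo $xml;
```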
So if you loadXML() something, make sure it carries its own XML declaration, such as <?xml version="1.0" encoding="utf-8"?>, and it will work as expected.
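A minimal sketch of that fix, again with an illustrative root element: the encoding now travels inside the loaded XML, so the constructor arguments no longer matter.

```php
<?php
$dom = new DOMDocument();

// The declaration inside the string sets the document's encoding.
$dom->loadXML('<?xml version="1.0" encoding="utf-8"?><root>ירושלים</root>');

$xml = $dom->saveXML();
echo $xml;
```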
As an alternative, you can also specify the encoding after loading the document.
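One way to do that, sketched under the same assumptions, is to set the writable DOMDocument::$encoding property after loadXML() and before saving.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<root>ירושלים</root>'); // no XML declaration in the input

// Declare the output encoding after loading instead.
$dom->encoding = 'UTF-8';

$xml = $dom->saveXML();
echo $xml;
```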
If you want to output UTF-8 with DOMDocument, you need to specify that. Simple, isn't it? If you already smell a trick question, you're not too far off, but at first sight, it really is straightforward.
Consider the following (UTF-8 encoded) code example that outputs hexadecimal entities:
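A minimal sketch of such an example, assuming a PHP 8 CLI script saved as UTF-8 (element and variable names are illustrative). No encoding is given to the constructor; with the libxml2 versions this thread was written against, the Hebrew letters in the text node come out as hexadecimal character references such as &#x5D9;. The exact serialization may differ with other libxml2 versions, which is why the check below only compares the parsed text.

```php
<?php
// No version or encoding passed to the constructor.
$dom = new DOMDocument();
$dom->appendChild($dom->createElement('root', 'ירושלים'));

$xml = $dom->saveXML();
echo $xml;
```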
As written, if you want to output this as UTF-8, you need to specify it, and that is straightforward: pass the encoding as the second argument to the DOMDocument constructor.
The output then is in UTF-8 explicitly.
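The same sketch with the version and encoding given explicitly; only the constructor call changes. The result should contain an encoding="UTF-8" declaration and the literal letters.

```php
<?php
// Identical to the previous sketch except for the constructor arguments.
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->appendChild($dom->createElement('root', 'ירושלים'));

$xml = $dom->saveXML();
echo $xml;
```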
So much for the straightforward part. If you are interested in the dirty little details, you are free to read on; if not, please do not ask "why?" :).
I wrote "in UTF-8 explicitly" because the output of the first example is UTF-8 encoded as well; the XML just contains hexadecimal entities, which are perfectly valid, even in UTF-8!
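To make the "perfectly valid" point concrete, here is a small sketch that parses both spellings of the same root element (the markup strings are illustrative) and compares the resulting text. Both documents lack an XML declaration, so both are read as UTF-8, the XML default.

```php
<?php
// Two spellings of the same character data: hexadecimal character
// references versus the literal UTF-8 letters.
$withEntities = '<root>&#x5D9;&#x5E8;&#x5D5;&#x5E9;&#x5DC;&#x5D9;&#x5DD;</root>';
$literal      = '<root>ירושלים</root>';

$a = new DOMDocument();
$a->loadXML($withEntities);
$b = new DOMDocument();
$b->loadXML($literal);

// Once parsed, both documents hold exactly the same text.
var_dump($a->documentElement->textContent === $b->documentElement->textContent);
```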
You may already notice that I am nit-picking here, but remember: UTF-8 is the default encoding of XML.
And if you now start to say: hey, wait, if the default encoding is UTF-8 anyway, why does PHP's DOMDocument use the entities in the first place?
Well, the truth is that, contrary to the finding in the question, it does not. Not always.
See the following example, which uses an XML comment instead of a node value to hold the Ivrit (Hebrew) letters:
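A minimal sketch of that variant, with no encoding given to the constructor (variable names illustrative). The comment content is written out as-is, so the letters appear literally inside <!-- -->.

```php
<?php
// Still no encoding, but the letters now live in a comment node.
$dom = new DOMDocument();
$dom->appendChild($dom->createComment('ירושלים'));

$xml = $dom->saveXML();
echo $xml;
```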
Okay, all clear? So the dirty little secret here is: whether or not you have those XML entities in there makes no difference to the document; it is just a different way of writing the same XML character data. And you may already feel invited: let's try CDATA instead for the first example:
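A sketch of the CDATA variant of the first example, assuming the same setup: the element is created empty and a CDATA section holding the letters is appended to it.

```php
<?php
$dom = new DOMDocument(); // still no encoding
$root = $dom->appendChild($dom->createElement('root'));

// Put the letters into a CDATA section instead of a plain text node.
$root->appendChild($dom->createCDATASection('ירושלים'));

$xml = $dom->saveXML();
echo $xml;
```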
As with the XML-comment example before, no XML entities are used here. Well, they would not be valid anyway, just as in the XML-comment example: inside a CDATA section or a comment, a sequence like &#x5D9; is not a character reference but literal text.
For an overview, let's create an example that contains all of these:
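A hedged sketch of such an overview: one document holding the letters as a text node, a comment and a CDATA section, serialized once without an encoding and once after declaring UTF-8 through the encoding property. Element names are illustrative. Only the text node is subject to entity escaping in the first serialization (on the libxml2 versions discussed here); the comment and CDATA section are written literally both times.

```php
<?php
$dom = new DOMDocument();
$root = $dom->appendChild($dom->createElement('root'));

$root->appendChild($dom->createElement('text', 'ירושלים'));   // text node
$root->appendChild($dom->createComment('ירושלים'));           // comment
$cdata = $root->appendChild($dom->createElement('cdata'));
$cdata->appendChild($dom->createCDATASection('ירושלים'));    // CDATA section

$plain = $dom->saveXML();  // no encoding declared yet

$dom->encoding = 'UTF-8';  // declare UTF-8 and serialize again
$utf8 = $dom->saveXML();

echo $plain, "\n", $utf8;
```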
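Lessons learned: without an explicit encoding, DOMDocument may write non-ASCII characters in text nodes as hexadecimal entities, but that is the same XML character data as the literal UTF-8; if you want the characters written literally, specify the encoding; and comments and CDATA sections are written without those entities.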
And that's it hopefully.
[1] Probably if you load from an HTTP request, provide a stream context and flag the character encoding via metadata; but this should be tested first, I do not know. That the BOM does not work is somewhat of a sign that none of these approaches work.
Apparently passing the documentElement as $node to saveXML works around this, although I can't say I understand why.
e.g. $dom->saveXML($dom->documentElement) rather than $dom->saveXML().
Source: http://www.php.net/manual/en/domdocument.savexml.php#88525
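A minimal sketch of this workaround, with illustrative names. Two hedges: the claim that the letters then come out literally rests on this answer and the linked manual note, not on anything verified here, and the serialization path may differ between PHP and libxml2 versions. What is certain is that serializing a single node omits the <?xml ...?> declaration, which matters if you need a complete document.

```php
<?php
$dom = new DOMDocument(); // no encoding, as in the question
$dom->appendChild($dom->createElement('root', 'ירושלים'));

// Serialize only the root element instead of the whole document.
// Note: the result has no XML declaration.
$fragment = $dom->saveXML($dom->documentElement);
echo $fragment;
```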
The to-the-point answer is:
When your function starts, right after you get the content, do this:
Then start the new document, etc. Check this example:
Then do whatever you intended to do with your code.
When I created the DOMDocument for writing, I added the following parameters:
These parameters caused the UTF-8 string to be written as is.