How do I remove illegal characters from an XML file?

Posted on 2024-08-15 15:51:03

I am using the PHP SimpleXML way of working with XML files on my server. I only need to read the contents of the XML (I have no need to modify it) so I stuck to the simple and easy to use SimpleXML. But SimpleXML is having problems reading a certain XML file because it has some very strange characters. I get the following errors:

Warning: simplexml_load_file() [function.simplexml-load-file]: data/data.xml:348: parser error : PCDATA invalid Char value 3 in C:\xampp\htdocs\VMP\xintel\analyzer.php on line 54

Warning: simplexml_load_file() [function.simplexml-load-file]: Jardin al fte. Hall de recepcion, amplio living comedor. ocina comedor diario c in C:\xampp\htdocs\VMP\xintel\analyzer.php on line 54

I have no control over what goes into the XML file, so I can't stop these characters from being added to it. I also don't know how to solve this issue. The file is supposed to be encoded in UTF-8, so I tried converting from UTF-8 to ISO-8859-1 and the reverse, but neither changed anything.

Can somebody help me out? Should I try to change the encoding? Should I try to remove those characters? Anything?

Edit: The strange characters are all box-drawing characters (see: http://en.wikipedia.org/wiki/Box-drawing_characters)
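A note on the error itself: "Char value 3" refers to U+0003, an ASCII control character, which the XML 1.0 `Char` production does not allow anywhere in a document. Box-drawing characters (U+2500 to U+257F) are legal XML characters. Below is a minimal sketch of the "remove those characters" approach. It assumes the file is UTF-8 and that silently dropping forbidden code points is acceptable for read-only use. `stripInvalidXmlChars` is a hypothetical helper name, and the sample string stands in for the real data.xml.

```php
<?php
// Sketch only: drop every code point the XML 1.0 "Char" production forbids,
// then parse the cleaned string with SimpleXML. Assumes UTF-8 input.
function stripInvalidXmlChars(string $xml): string
{
    // Allowed: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    $clean = preg_replace($pattern, '', $xml);
    if ($clean === null) {
        // With the /u modifier, preg_replace returns null on invalid UTF-8.
        throw new RuntimeException('Input is not valid UTF-8');
    }
    return $clean;
}

// Sample input: a U+0003 control character plus a legal box-drawing character (U+2500).
$raw = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
     . "<root><desc>Jardin al fte.\x03 Hall de recepcion \u{2500}</desc></root>";

libxml_use_internal_errors(true); // collect parser errors instead of emitting warnings

$broken = simplexml_load_string($raw);                       // fails on U+0003
$fixed  = simplexml_load_string(stripInvalidXmlChars($raw)); // parses

var_dump($broken === false); // bool(true)
echo $fixed->desc, PHP_EOL;  // Jardin al fte. Hall de recepcion ─
```

For the file in the question, the equivalent would be `simplexml_load_string(stripInvalidXmlChars(file_get_contents('data/data.xml')))` in place of `simplexml_load_file()`. If the file turns out not to be valid UTF-8, the helper throws instead of returning a half-cleaned string.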

Comments (3)

丶视觉 2024-08-22 15:51:03

I have an app that receives XML from untrusted sources, many of which send me unencoded ampersands. To solve the problem, I have an intermediate filter that does a single linear pass and gets rid of / encodes characters where necessary. I don't know if that is possible for you but I think it's a pretty reasonable solution.
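The answer doesn't show its filter, so the following is only a hypothetical sketch of that kind of single-pass pre-processing in PHP. It makes one `preg_replace_callback()` call that encodes any `&` that does not start an entity or character reference and drops characters that XML 1.0 forbids. `sanitizeUntrustedXml` is an illustrative name, and UTF-8 input is assumed.

```php
<?php
// Hypothetical pre-parse filter: a single regex pass that
// (a) encodes bare ampersands and (b) drops code points XML 1.0 forbids.
function sanitizeUntrustedXml(string $xml): string
{
    $pattern = '/&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)'
             . '|[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';

    $result = preg_replace_callback(
        $pattern,
        // A bare "&" becomes "&amp;"; any other match is a forbidden character.
        static fn (array $m): string => $m[0] === '&' ? '&amp;' : '',
        $xml
    );
    if ($result === null) {
        throw new RuntimeException('Input is not valid UTF-8');
    }
    return $result;
}

$input = '<item><name>Salt & Pepper &amp; Co' . "\x03" . '</name><ref>&#38;</ref></item>';
$doc = simplexml_load_string(sanitizeUntrustedXml($input));
echo $doc->name, PHP_EOL; // Salt & Pepper & Co
```

Two limitations of this sketch: a bare `&` inside a CDATA section is encoded too, which changes that section's text, and named entities other than the five predefined ones (for example `&nbsp;`) pass through unchanged and will still fail unless the document declares them.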

凯凯我们等你回来 2024-08-22 15:51:03

Maybe you could pass the input through Tidy to make it well-formed. One simple step of pre-processing before you feed the file to SimpleXML.

For example, tidy::repairFile looks promising.

朦胧时间 2024-08-22 15:51:03

Normally, all characters in an XML file are interpreted as XML unless they are inside a CDATA section.

If that is not the case, your XML is invalid.
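To illustrate where CDATA helps and where it doesn't: a CDATA section stops markup characters such as `&` and `<` from being interpreted. Under XML 1.0, however, its content must still consist of legal characters, so CDATA does not make U+0003 acceptable. Here is a small sketch, assuming PHP's default libxml-backed SimpleXML:

```php
<?php
// Sketch: compare a bare "&" and the control character U+0003,
// each wrapped in a CDATA section.
libxml_use_internal_errors(true); // collect errors instead of printing warnings

$ampInCdata  = '<root><![CDATA[Salt & Pepper]]></root>';
$ctrlInCdata = '<root><![CDATA[Jardin al fte.' . "\x03" . ']]></root>';

$a = simplexml_load_string($ampInCdata);
$b = simplexml_load_string($ctrlInCdata);

echo (string) $a, PHP_EOL; // Salt & Pepper
var_dump($b === false);    // bool(true): CDATA does not make U+0003 legal

// Print whatever libxml reported for the failed parse.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), PHP_EOL;
}
libxml_clear_errors();
```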
