How to handle the horizontal ellipsis (three dots) character in XML output with PHP
As mentioned in the question, I am trying to generate XML output (for an iPhone app) using PHP, which reads the data from a MySQL text field.
Whenever the field contains a horizontal ellipsis character (…), the XML is not generated properly.
I have tried a few ways to escape it, as shown below, but none of them seems to work:
$row['detail'] = str_replace("&", "&amp;", $row['detail']);
$row['detail'] = str_replace("…", "&hellip;", $row['detail']); //<-- prob is here
$row['detail'] = str_replace("<", "&lt;", $row['detail']);
$row['detail'] = str_replace("\'", "&apos;", $row['detail']);
$row['detail'] = str_replace(">", "&gt;", $row['detail']);
$row['detail'] = str_replace("\"", "&quot;", $row['detail']);
Basically, I have two questions:
How do I handle the horizontal ellipsis character?
Are there more characters like this that could cause the same problem? Any reference to such a list and its solutions would be great!
Thanks
Raw XML does not know about any named entities except the five predefined ones: &lt;, &gt;, &amp;, &quot; and &apos;. Any other entity has to be written as a numeric character reference instead, or else declared in the DOCTYPE or DTD.
The &hellip; entity is defined in the HTML DTD, which is understood by all browsers, but it isn't defined in most other XML DTDs.
In general, if you're working with a DTD, most of the time it will be a third-party DTD that you have no control over, so you can't go adding entities to it. You also don't want to be adding entities ad hoc to your own DTDs either.
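To see the difference in practice, here is a minimal PHP sketch that feeds three tiny documents to DOMDocument::loadXML(): one using &hellip; with no DTD, one using the numeric reference &#8230;, and one using only the five predefined entities. It assumes the DOM extension is available (it is enabled in most PHP builds); the helper parses() and the <item> element are made up for illustration.

```php
<?php
// Hypothetical helper: returns the parsed text content, or null if libxml
// rejects the document.
function parses(string $xml): ?string
{
    // Collect libxml errors internally instead of printing PHP warnings.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    return $ok ? $doc->documentElement->textContent : null;
}

// &hellip; is an HTML entity; with no DTD declaring it, this is not well-formed XML.
var_dump(parses('<item>Loading&hellip;</item>'));

// A numeric character reference needs no declaration.
var_dump(parses('<item>Loading&#8230;</item>'));

// The five entities that XML predefines always work.
var_dump(parses('<item>&lt;&gt;&amp;&quot;&apos;</item>'));
```

With no DTD in play, the first document is expected to be rejected as not well-formed, while the other two parse normally.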
I would avoid putting entity declarations into the doctype header as well. It's unnecessary fluff that doesn't really add much unless you're repeating the same entity over and over in a document.
Therefore my recommendation would be simply to use numeric entities.
So instead of &hellip;, you would use the numeric character reference &#x2026; (hexadecimal) or &#8230; (decimal). The same applies to any other non-ASCII character.
The other option, of course, is to output the XML in the UTF-8 or UTF-16 character encoding, which removes the need for entities for non-ASCII characters (the markup characters such as < and & still have to be escaped). That may or may not be an option for you, but if it is possible, it may be the best way to go.
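The numeric-reference approach can be sketched in PHP as follows. This is one possible way to do it, not the answerer's code: htmlspecialchars() with ENT_XML1 | ENT_QUOTES handles the five markup characters, and a preg_replace_callback() pass then rewrites every remaining non-ASCII character as a decimal reference. It assumes the input is valid UTF-8; utf8_code_point() and xml_escape_ascii() are hypothetical helper names.

```php
<?php
// Hypothetical helper: decode one UTF-8 encoded character to its code point.
function utf8_code_point(string $ch): int
{
    $b = array_values(unpack('C*', $ch));
    switch (count($b)) {
        case 1:
            return $b[0];
        case 2:
            return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:
            return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default:
            return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

// Hypothetical helper: XML-escape $text (assumed valid UTF-8) so the result is pure ASCII.
function xml_escape_ascii(string $text): string
{
    // Escape the five XML markup characters as &lt; &gt; &amp; &quot; &apos;.
    $escaped = htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    // The /u modifier makes the pattern match whole UTF-8 characters, not bytes.
    return preg_replace_callback(
        '/[^\x00-\x7F]/u',
        fn(array $m): string => '&#' . utf8_code_point($m[0]) . ';',
        $escaped
    );
}

echo xml_escape_ascii("Loading\u{2026} <b> & \"more\""), "\n";
```

For the sample string it prints Loading&#8230; &lt;b&gt; &amp; &quot;more&quot;. If the mbstring extension is available, mb_encode_numericentity($escaped, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8') performs the same non-ASCII conversion in a single call; the hand-written decoder only avoids depending on that extension.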
If you have a specific character which you need to find the numeric entity codes for, there are plenty of places on the web to find references for them. Here is the one from Wikipedia: http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
Hope that helps.
XML understands only a few character entities: &apos;, &quot;, &amp;, &lt; and &gt;. Anything else will make the document invalid (not well-formed). You can try declaring the entity in the DTD.
It is possible (and recommended) to use the literal, actual character in the XML output. Don't use workarounds based on HTML entities; they are unnecessary.
The reason it doesn't work for you is probably that the ellipsis character's encoding doesn't match the encoding of the XML file being generated.
You just need to make sure they match. For example, if you're generating a UTF-8 XML file, the ellipsis character needs to be UTF-8 as well.
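Following this answer, the sketch below writes the literal … into a UTF-8 XML document with DOMDocument, which also takes care of escaping & and <. It assumes that text coming from the database is either already UTF-8 or Windows-1252 (one plausible mismatch, where the ellipsis is the single byte 0x85), and it needs the DOM and mbstring extensions; ensure_utf8() and build_item_xml() are hypothetical helper names.

```php
<?php
// Hypothetical helper. Assumption: input is either valid UTF-8 or Windows-1252.
function ensure_utf8(string $s): string
{
    // preg_match() with the /u modifier returns false for invalid UTF-8.
    if (preg_match('//u', $s) === 1) {
        return $s;
    }
    // In Windows-1252 the ellipsis is byte 0x85; this maps it to U+2026.
    return mb_convert_encoding($s, 'UTF-8', 'Windows-1252');
}

// Hypothetical helper: wraps one text value in a UTF-8 <item> document.
function build_item_xml(string $detail): string
{
    // Declaring UTF-8 here lets libxml write non-ASCII characters as-is.
    $doc = new DOMDocument('1.0', 'UTF-8');
    $item = $doc->createElement('item');
    // A text node is escaped on output, so & and < need no manual str_replace().
    $item->appendChild($doc->createTextNode(ensure_utf8($detail)));
    $doc->appendChild($item);
    return $doc->saveXML();
}

echo build_item_xml("Loading\x85 <b> & more");
```

The ellipsis ends up in the output as the literal character rather than as &hellip; or &#8230;. Converting after the fact is only a fallback; it is usually better to make the encodings match at the source, for example by calling $mysqli->set_charset('utf8mb4') on a mysqli connection or adding charset=utf8mb4 to a PDO MySQL DSN, and to send the XML with a header such as Content-Type: text/xml; charset=utf-8.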