PHP function to convert an arbitrary "description" to valid XML data for a podcast feed
I am reading the documentation for creating a podcast feed suitable for iTunes, and the Common Mistakes section says:
Using HTML Named Character Entities.
<!-- illegal xml -->
<copyright>&copy; 2005 John Doe</copyright>
<!-- valid xml -->
<copyright>&#xA9; 2005 John Doe</copyright>
Unlike HTML, XML supports only five "named character entities":
character  name               xml
&          ampersand          &amp;
<          less-than sign     &lt;
>          greater-than sign  &gt;
'          apostrophe         &apos;
"          quotation          &quot;
The five characters above are the only characters that require escaping in XML. All other characters can be entered directly in an editor that supports UTF-8. You can also use numeric character references that specify the Unicode for the character, for example:
character  name                      xml
©          copyright sign            &#xA9;
℗          sound recording copyright &#x2117;
™          trade mark sign           &#x2122;
For further reference, see XML Character and Entity References.
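To make the rule quoted above concrete, here is a minimal sketch that flags named entities XML does not predefine. The helper name non_xml_entities() and the sample string are made up for illustration, and ENT_XML1 assumes PHP 5.4 or later.

```php
<?php
// Hypothetical helper: list the named entities in $text that are not one of
// XML's five predefined entities (amp, lt, gt, apos, quot).
function non_xml_entities($text)
{
    $allowed = ['amp', 'lt', 'gt', 'apos', 'quot'];
    preg_match_all('/&([A-Za-z][A-Za-z0-9]*);/', $text, $matches);
    return array_values(array_diff(array_unique($matches[1]), $allowed));
}

// Sample input (an assumption, modeled on the copyright line quoted above).
$copyright = '© 2005 John Doe & Sons';

// htmlentities() converts every character that has an HTML named entity.
$viaEntities = htmlentities($copyright, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() with ENT_XML1 only escapes &, <, >, " and '.
$viaSpecial = htmlspecialchars($copyright, ENT_QUOTES | ENT_XML1, 'UTF-8');

print_r(non_xml_entities($viaEntities)); // lists "copy", which XML does not define
print_r(non_xml_entities($viaSpecial));  // empty: only &amp; was produced
```

A check like this could run over titles and descriptions before the feed is written, as a safety net rather than as the escaping step itself.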
Right now I'm using htmlentities() under PHP5 and the feed is validating and working. But from what I gather, some things that could get put into the content might become entities that would make the feed no longer valid. What's the best function to use to ensure I'm not passing along bad data? I'm paranoid that something will get entered, get entity-ized, and break the feed. Should I just use str_replace() to replace with the named entities and leave the rest alone? Or can I use htmlspecialchars() somehow?
So in short, what's a drop-in replacement for htmlentities() that will make sure input is safe for descriptions, titles, etc. in a podcast RSS feed?
Comments (1)
You can either:
- Put the text in a CDATA section. Then the only sequence you need to handle is ]]>, which cannot be put literally in a CDATA block.
- Use mb_encode_numericentity instead of htmlentities (possibly combined with htmlspecialchars and a previous decoding of HTML entities with mb_convert_encoding).
If the encoding of the XML file is UTF-8, you can just remove the entities by decoding them to the characters they represent. Suppose you have the following HTML fragment:
Then, you could just do:
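A minimal sketch of this, assuming a UTF-8 feed and PHP 5.4 or later (for ENT_XML1). The sample fragment in $html and the helper names feed_escape(), feed_cdata(), and feed_numeric() are made up for illustration, and html_entity_decode is used for the decoding step here rather than mb_convert_encoding.

```php
<?php
// Hypothetical HTML fragment containing named entities.
$html = 'Caf&eacute; &copy; 2005 John Doe &amp; Sons';

// UTF-8 feed: decode every HTML entity, then escape only what XML requires.
function feed_escape($html)
{
    $text = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// CDATA option: split any "]]>" so it cannot close the section early.
function feed_cdata($text)
{
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}

// Numeric option: escape markup, then turn every non-ASCII character into a
// decimal character reference (requires the mbstring extension).
function feed_numeric($html)
{
    $text = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $escaped = htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
    return mb_encode_numericentity($escaped, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
}

echo feed_escape($html), "\n"; // Café © 2005 John Doe &amp; Sons
```

feed_escape() is the closest thing here to a drop-in replacement for htmlentities() in a UTF-8 feed. The other two helpers correspond to the CDATA and numeric-entity options listed above, and feed_numeric() is the one to reach for if the feed cannot be served as UTF-8.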