PHP function to convert an arbitrary "description" into valid XML data for a podcast feed

Posted 2024-09-07 13:18:25


I am reading the documentation for creating a podcast feed suitable for iTunes, and the Common Mistakes section says:


Using HTML Named Character Entities.

<!-- illegal xml -->
<copyright>&copy; 2005 John Doe</copyright>

<!-- valid xml -->
<copyright>&#xA9; 2005 John Doe</copyright>

Unlike HTML, XML supports only five "named character entities":

character   name               xml
&           ampersand          &amp;
<           less-than sign     &lt;
>           greater-than sign  &gt;
'           apostrophe         &apos;
"           quotation          &quot;

The five characters above are the only characters that require escaping in XML. All other characters can be entered directly in an editor that supports UTF-8. You can also use numeric character references that specify the Unicode code point of the character, for example:

character   name                       xml
©           copyright sign             &#xA9;
℗           sound recording copyright  &#x2117;
™           trade mark sign            &#x2122;

For further reference see XML Character and Entity References.
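A minimal PHP sketch of escaping exactly the five characters in the table above, assuming PHP 7 or later (for the `string` type declarations) and UTF-8 input. `xml_escape()` is a hypothetical name; the actual work is done by `htmlspecialchars()` with `ENT_QUOTES | ENT_XML1`, which emits `&apos;` for the apostrophe instead of the HTML-style `&#039;` and leaves characters such as © untouched.

```php
<?php
// Hypothetical helper: escape only the five characters XML predefines.
// ENT_QUOTES escapes both quote styles; ENT_XML1 selects XML output,
// so the apostrophe becomes &apos; rather than &#039;.
function xml_escape(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

echo xml_escape('& < > " \''), "\n";     // &amp; &lt; &gt; &quot; &apos;
echo xml_escape('© 2005 John Doe'), "\n"; // © 2005 John Doe (left as UTF-8)
```

Because non-ASCII characters pass through unchanged, the feed itself must be served and declared as UTF-8 for this to be safe.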


Right now I'm using htmlentities() under PHP5, and the feed is validating and working. But from what I gather, some things that could get put into the content might become entities that would make the feed invalid. What's the best function to use to make sure I'm not passing along bad data? I'm paranoid that something will get entered, get entity-ized, and break the feed. Should I just use str_replace() to replace the five special characters with their named entities and leave the rest alone? Or can I use htmlspecialchars() somehow?

So in short, what's a drop-in replacement for htmlentities() that will make sure input is safe for descriptions, titles, etc. in a podcast RSS feed?
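To see why `htmlentities()` can break the feed, the sketch below runs both `htmlentities()` and `htmlspecialchars()` over the same string and checks whether each result parses as XML. It assumes PHP 7+ with the SimpleXML and libxml extensions enabled; `is_well_formed()` is a hypothetical helper, and it only checks XML well-formedness, not RSS or iTunes validity.

```php
<?php
// Hypothetical helper: true if $xml parses as well-formed XML.
function is_well_formed(string $xml): bool
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc !== false;
}

$input = '© 2005 John Doe';

// htmlentities() turns © into &copy;, a named entity XML does not define.
$withEntities = htmlentities($input, ENT_QUOTES, 'UTF-8');
// htmlspecialchars() leaves © alone and escapes only & < > " '.
$withSpecialChars = htmlspecialchars($input, ENT_QUOTES | ENT_XML1, 'UTF-8');

var_dump(is_well_formed("<copyright>$withEntities</copyright>"));     // bool(false)
var_dump(is_well_formed("<copyright>$withSpecialChars</copyright>")); // bool(true)
```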


Comments (1)

吲‖鸣 2024-09-14 13:18:25


You can either:

  • Use a CDATA block instead (just make sure you're using the correct encoding, i.e., the encoding of the XML file matches the encoding of the data). The only thing you have to look out for is ]]>, which cannot appear literally inside a CDATA block.
  • Use mb_encode_numericentity instead of htmlentities (possibly combined with htmlspecialchars and a prior decoding of HTML entities with mb_convert_encoding).
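A minimal sketch of the CDATA approach, assuming PHP 7+ and that the feed and the data share one encoding (UTF-8 here). `cdata()` is a hypothetical helper; it handles `]]>` by ending the section after `]]` and opening a new one for `>`, a common workaround rather than something the answer prescribes.

```php
<?php
// Hypothetical helper: wrap text in a CDATA section.
// "]]>" cannot appear inside CDATA, so each occurrence is split across
// two adjacent sections: "]]" closes the first, ">" opens the next.
function cdata(string $text): string
{
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}

$xml = '<description>' . cdata('Tips & tricks <b>]]></b>') . '</description>';
echo $xml, "\n";

// Parse it back (LIBXML_NOCDATA merges CDATA into plain text nodes)
// to check that the original text survives unchanged.
$parsed = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);
echo (string) $parsed, "\n"; // Tips & tricks <b>]]></b>
```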

If the encoding of the XML file is UTF-8, you can just remove the entities. Suppose you have the following HTML fragment:

&copy; 2005 John Doe

Then, you could just do:

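$data = "&copy; 2005 John Doe";
// Decode HTML entities such as &copy; into UTF-8 characters
// (the "HTML-ENTITIES" pseudo-encoding is deprecated as of PHP 8.2).
$data = mb_convert_encoding($data, "UTF-8", "HTML-ENTITIES");
// Escape only &, < and >; ENT_NOQUOTES leaves quotes untouched.
$data = htmlspecialchars($data, ENT_NOQUOTES, "UTF-8");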
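The second bullet can be sketched as one function that produces ASCII-only XML text, with every non-ASCII character turned into a numeric character reference. This assumes PHP 7+ with the mbstring extension; `html_to_ascii_xml()` is a hypothetical name, and it swaps in `html_entity_decode()` for the decoding step because the `HTML-ENTITIES` pseudo-encoding accepted by `mb_convert_encoding()` is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: HTML fragment -> ASCII-only XML character data.
function html_to_ascii_xml(string $html): string
{
    // 1. Decode named and numeric HTML entities into UTF-8 characters.
    $text = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // 2. Re-escape the characters XML reserves in element content (& < >).
    $text = htmlspecialchars($text, ENT_NOQUOTES | ENT_XML1, 'UTF-8');
    // 3. Encode every code point from U+0080 upward as &#NNN;.
    //    Map format: [start, end, offset, mask].
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($text, $convmap, 'UTF-8');
}

echo html_to_ascii_xml('&copy; 2005 John Doe'), "\n";   // &#169; 2005 John Doe
echo html_to_ascii_xml('Tom &amp; Jerry&trade;'), "\n"; // Tom &amp; Jerry&#8482;
```

`mb_encode_numericentity()` writes decimal references by default; its optional fourth argument (`$hex`) switches to hexadecimal references like the `&#xA9;` form in the iTunes documentation.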