Cleaning up CDATA in XML via XSLT
I am trying to transform RSS 2 coming from WordPress into XHTML 1.0 Strict (using a cronjob and `xsltproc`); however, WordPress inserts an `img` into the CDATA at the end of the `summary` element. The `img` has a `border` attribute, which is invalid in XHTML 1.0 Strict. Because it's CDATA, I assume that means I can't match it with my XSLT. I can say for certain that the `img` is always the last thing before the CDATA ends. I'd prefer to strip the `border` attribute and keep the image, but I'd rather get rid of the element entirely than have invalid markup.
Is it possible to match inside CDATA using XSLT, perhaps using a string expression? If so, is that the right way to go here, or is there a better solution to be had?
Comments (2)
Remember what CDATA means: "character data". Putting something in CDATA means: this might look like markup, but I don't want you to treat it as markup. So if the thing inside the CDATA looks like an `img` element, the CDATA is there to tell you not to be fooled: it's not an element at all. Having said that, you can of course process the text the way you process any other character string, including passing it to an XML parser to turn it into a tree of nodes.
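A minimal sketch of that second point, written in Python with the standard library's `xml.etree.ElementTree` rather than in XSLT (a swapped-in technique, not the answerer's code). The RSS fragment and the helper `strip_border_from_markup` are hypothetical, and the sketch assumes the CDATA content is well-formed XML; typical WordPress HTML containing something like `&nbsp;` or an unclosed `<br>` would make `ET.fromstring` raise `ParseError`. It first shows that the parser reports the CDATA as plain text with no child elements, then parses that text into nodes and drops the `border` attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS fragment: the img sits inside a CDATA section.
FEED = """<rss version="2.0"><channel><item>
<description><![CDATA[<p>Post summary.</p><img src="a.png" alt="" border="0" />]]></description>
</item></channel></rss>"""


def strip_border_from_markup(text):
    """Hypothetical helper: parse CDATA text as XML and drop img border attributes.

    Assumes the text is well-formed XML; otherwise ET.fromstring raises ET.ParseError.
    """
    # Wrap in a dummy root so fragments with several top-level elements parse.
    root = ET.fromstring("<wrapper>" + text + "</wrapper>")
    for img in root.iter("img"):
        img.attrib.pop("border", None)  # no-op if the attribute is absent
    # Re-serialize the wrapper's contents; tostring() includes each child's tail text.
    parts = [root.text or ""]
    parts.extend(ET.tostring(child, encoding="unicode") for child in root)
    return "".join(parts)


description = ET.fromstring(FEED).find("./channel/item/description")
# The parser sees character data only: no child elements, just a string.
print(len(description))  # 0
print(description.text)  # <p>Post summary.</p><img src="a.png" alt="" border="0" />
print(strip_border_from_markup(description.text))
# <p>Post summary.</p><img src="a.png" alt="" />
```

The dummy wrapper element is only there so that a fragment with several top-level elements is a single well-formed document; it is discarded again when the children are serialized.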
In the XPath data model a CDATA section is merely part of a text node, so you can match it with a `text()` template. Then you can use string functions to remove the `border` attribute from the text.
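XSLT 1.0 (the version `xsltproc` implements) has no `replace()` function, so string surgery like this is usually written as a recursive named template built on `contains()`, `substring-before()` and `substring-after()`. The sketch below mirrors that logic in Python instead of XSLT so it can run standalone; `strip_border` and the two helpers are hypothetical names, and it assumes the attribute is written as ` border="..."` with a leading space and double quotes.

```python
# Python stand-ins for the XPath 1.0 functions; both return "" when sep is absent.
def substring_before(s, sep):
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


def substring_after(s, sep):
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""


def strip_border(text):
    """Hypothetical helper: remove every ' border="..."' attribute from a string.

    Mirrors a recursive XSLT 1.0 named template; assumes double-quoted values.
    """
    marker = ' border="'
    if marker not in text:  # same test as XPath contains()
        return text
    before = substring_before(text, marker)
    rest = substring_after(text, marker)  # starts with the attribute value
    # Skip the value and its closing quote, then recurse on whatever follows
    # (if the closing quote is missing, the remainder is dropped).
    return before + strip_border(substring_after(rest, '"'))


summary = '<p>Post summary.</p><img src="a.png" alt="" border="0" />'
print(strip_border(summary))  # <p>Post summary.</p><img src="a.png" alt="" />
```

In a real stylesheet the cleaned value is still a string, so writing it out as markup with `xsltproc` would typically need `disable-output-escaping="yes"` on `xsl:value-of`; without it the `<img>` comes out escaped as text. The matching is also brittle: single-quoted values or unusual spacing around `=` will not be caught.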