XSLT 2.0: handling an invalid node that mixes text and CDATA
I need to parse the following node:
<media:keywords>keyword1,keyword2<![CDATA[keyword3]]></media:keywords>
into a valid string, preferably "keyword1,keyword2,keyword3", but I would settle for removing the CDATA completely.
Trying to access the node gives me the text "keyword1,keyword2keyword3", and I can't tell where the CDATA begins.
Original XML (a simplified version of an mRSS feed):
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <media:keywords>keyword1,keyword2<![CDATA[keyword3]]></media:keywords>
    </item>
  </channel>
</rss>
XSL (simplified):
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fn="http://www.w3.org/2005/xpath-functions"
    xmlns:media="http://search.yahoo.com/mrss/"
    exclude-result-prefixes="xs xsi fn">
  <xsl:output method="xml" encoding="UTF-8" omit-xml-declaration="yes"/>
  <xsl:template match="/">
    <test>
      <xsl:variable name="items" select="/rss/channel/item"/>
      <xsl:for-each select="$items">
        <xsl:variable name="mediakw" select="media:keywords"/>
        <xsl:element name="mediaKeyWords">
          <xsl:value-of select="$mediakw"/>
        </xsl:element>
      </xsl:for-each>
    </test>
  </xsl:template>
</xsl:stylesheet>
And the output:
<test xmlns:media="http://search.yahoo.com/mrss/"><mediaKeyWords>keyword1,keyword2keyword3</mediaKeyWords></test>
Thanks a lot!
Comments (3)
XML and XSLT cannot help you here.
XSLT works on the XPath data model, which, like the XML Infoset, has no such thing as a "CDATA node". The processor sees just a single text() node:
"keyword1,keyword2keyword3"
The XML document needs to be corrected, with a comma inserted between the substrings "keyword2" and "keyword3".
One solution would be to process the CDATA section node using the DOM, and only then start the XSLT transformation.
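The DOM pre-processing step suggested above could look roughly like the following. This is only a sketch in Python, assuming the standard library's xml.dom.minidom parser (which keeps CDATA sections as separate nodes) and assuming that a comma is the right delimiter; the function name join_cdata_with_commas is made up for illustration.

```python
from xml.dom import minidom

SOURCE = """<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
<channel>
<item>
<media:keywords>keyword1,keyword2<![CDATA[keyword3]]></media:keywords>
</item>
</channel>
</rss>"""

MEDIA_NS = "http://search.yahoo.com/mrss/"


def join_cdata_with_commas(xml_text):
    """Turn every CDATA section inside media:keywords into plain text,
    prefixing a comma when it directly follows other keyword text.
    (Hypothetical helper; the comma delimiter is an assumption.)"""
    doc = minidom.parseString(xml_text.encode("utf-8"))
    for elem in doc.getElementsByTagNameNS(MEDIA_NS, "keywords"):
        # Iterate over a copy because child nodes are replaced in the loop.
        for node in list(elem.childNodes):
            if node.nodeType != node.CDATA_SECTION_NODE:
                continue
            prev = node.previousSibling
            needs_comma = (
                prev is not None
                and prev.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE)
                and prev.data.strip() != ""
                and not prev.data.rstrip().endswith(",")
            )
            text = ("," if needs_comma else "") + node.data
            elem.replaceChild(doc.createTextNode(text), node)
        # Merge the adjacent text nodes back into one.
        elem.normalize()
    return doc.toxml()


if __name__ == "__main__":
    print(join_cdata_with_commas(SOURCE))
```

The resulting string can then be handed to the XSLT processor, which will see a single text node that already contains the delimiter.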
By the time the XSLT processor sees the text, the CDATA is gone. You cannot see the incoming CDATA, and you have very little control over how CDATA is generated on output: the cdata-section-elements attribute of xsl:output is all or nothing for a given element.
This can't be done in standard XSLT.
The input XML you're receiving,
<media:keywords>keyword1,keyword2<![CDATA[keyword3]]></media:keywords>
is indistinguishable (to XSLT) from
<media:keywords>keyword1,keyword2keyword3</media:keywords>
because the CDATA markup is just a way of escaping the data inside it. There is no special markup to escape in this case, so the CDATA section happens to be a no-op. XSLT has no way of knowing which data was originally expressed using CDATA, which using character entities, and so on.
The solution would be to tell whoever is providing this XML that they need to put a delimiter between keyword2 and keyword3.
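The point that the two spellings are indistinguishable after parsing can be checked outside XSLT as well. As a rough illustration, assuming Python's standard xml.etree.ElementTree parser (which, like an XSLT processor, exposes only the resulting text), both forms of the element yield the same string; text_of is a made-up helper name.

```python
import xml.etree.ElementTree as ET

NS_DECL = 'xmlns:media="http://search.yahoo.com/mrss/"'

# The element as received, with a CDATA section...
with_cdata = (
    f"<media:keywords {NS_DECL}>"
    "keyword1,keyword2<![CDATA[keyword3]]>"
    "</media:keywords>"
)
# ...and the same element with the CDATA markup written out as plain text.
without_cdata = (
    f"<media:keywords {NS_DECL}>"
    "keyword1,keyword2keyword3"
    "</media:keywords>"
)


def text_of(xml_text):
    """Return the text content of the root element (hypothetical helper)."""
    return ET.fromstring(xml_text).text


if __name__ == "__main__":
    print(text_of(with_cdata))
    print(text_of(with_cdata) == text_of(without_cdata))
```

Once the parser has done its work, nothing in the tree records where the CDATA section started, which is why the delimiter has to be fixed at the source or in a pre-processing step.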