How to check whether an XML text node contains Chinese characters using RegEx in XSLT

Posted on 2024-11-19 01:57:07

On http://gskinner.com/RegExr/ (a RegEx test website), this regular expression matches as expected: [^\x00-\xff]
Sample text: test123 或元件数据不可用

But if I have this input XML:

<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <node>test123 或元件数据不可用</node>
</root>

and I try this XSLT 2.0 stylesheet with Saxon 9:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/root/node">
    <xsl:if test="matches(., '[^\x00-\xff]')">
      <xsl:text>Text has chinese characters!</xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>

Saxon 9 gives me the following error output:

    FORX0002: Error at character 3 in regular expression "[^\x00-\xff]": invalid escape sequence
  Failed to compile stylesheet. 1 error detected.

How can I check for Chinese characters in XSLT 2.0?


儭儭莪哋寶赑 2024-11-26 01:57:07

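The regex dialect supported by XPath is based on the one defined in XSD: you can find the full specification in the W3C documents, or, if you prefer something more readable, in my XSLT 2.0 Programmer's Reference. Don't assume that all regex dialects are the same. There is no \x escape in XPath regular expressions because the dialect was designed to be embedded in XML, and XML already provides &#xHHHH;.

You may find it more convenient to use named Unicode blocks than hexadecimal ranges, for example \p{IsCJKUnifiedIdeographs}.

See also: What's the complete range for Chinese characters in Unicode?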
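As a minimal sketch of the &#xHHHH; approach, the stylesheet below writes the range endpoints as XML character references instead of \x escapes. It assumes the input XML from the question and, purely for illustration, checks only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF); other blocks would need additional ranges.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/root/node">
    <!-- The XML parser resolves the character references before the
         regex engine sees the pattern, so no regex-level escape is needed.
         Assumption: only U+4E00 to U+9FFF is of interest here. -->
    <xsl:if test="matches(., '[&#x4E00;-&#x9FFF;]')">
      <xsl:text>Text has chinese characters!</xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

A literal port of [^\x00-\xff] is not possible this way, because XML does not allow a character reference to U+0000; a positive range such as [&#x100;-&#x10FFFF;] expresses the same "anything above U+00FF" idea.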


自由范儿 2024-11-26 01:57:07

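With Michael Kay's help I can answer my own question. Thanks, Michael!
The solution works, but in my opinion these long Unicode ranges do not look very pretty.

This XSLT prints a text message if a regular expression finds any Chinese characters in the given XML: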

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/root/node">
    <xsl:if test="matches(.,'[一-鿿㐀-䷿𠀀-𪛟豈-﫿丽-𯨟]')">
      <xsl:text>Text has chinese characters!</xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>

Solution with named Unicode blocks:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/root/node">
    <xsl:if test="matches(., '[\p{IsCJKUnifiedIdeographs}\p{IsCJKUnifiedIdeographsExtensionA}\p{IsCJKUnifiedIdeographsExtensionB}\p{IsCJKCompatibilityIdeographs}\p{IsCJKCompatibilityIdeographsSupplement}]')">
      <xsl:text>Text has chinese characters!</xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
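If the length of the pattern is the main concern, one possible way to tidy it up is to build the regex once in a global variable and wrap the check in a stylesheet function. This is only a sketch: the variable name cjk-regex, the function my:has-cjk, and the namespace urn:example:my are hypothetical names introduced for illustration, and the block names are the same five used above.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:my"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical global variable: one character class built from
       the five named Unicode blocks, one block per line -->
  <xsl:variable name="cjk-regex" as="xs:string"
      select="concat('[',
                '\p{IsCJKUnifiedIdeographs}',
                '\p{IsCJKUnifiedIdeographsExtensionA}',
                '\p{IsCJKUnifiedIdeographsExtensionB}',
                '\p{IsCJKCompatibilityIdeographs}',
                '\p{IsCJKCompatibilityIdeographsSupplement}',
                ']')"/>

  <!-- Hypothetical helper: true if the text contains a character
       from any of the blocks listed in $cjk-regex -->
  <xsl:function name="my:has-cjk" as="xs:boolean">
    <xsl:param name="text" as="xs:string?"/>
    <xsl:sequence select="matches($text, $cjk-regex)"/>
  </xsl:function>

  <xsl:template match="/root/node">
    <xsl:if test="my:has-cjk(.)">
      <xsl:text>Text has chinese characters!</xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

The test in the template stays short, and adding or removing a block only touches the variable.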

