XSL - Removing non-breaking spaces

Posted 2024-10-31 20:24:31


In my XSL implementation (2.0), I tried using the statement below to remove all spaces and non-breaking spaces within a text node. It works for ordinary spaces, but not for non-breaking spaces whose ASCII codes are &#x202F; etc. I am using the Saxon processor for execution.

Current XSL code:

translate(normalize-space($text-nodes[1]), ' ', '')

How can I remove them? Please share your thoughts.


浮世清欢 2024-11-07 20:24:31


Those codes are Unicode, not ASCII (for the most part), so you should probably use the replace function with a regex containing the Unicode separator character class:

replace($text-nodes[1], '\p{Z}+', '')

In more detail:

The regex \p{Z}+ matches one or more characters in the Unicode "separator" category. \p{} is the category escape, which matches a single character in the category named within the curly braces. Z names the "separator" category: space, line, and paragraph separators, including the no-break space (U+00A0) and the narrow no-break space (U+202F). + means "match the preceding expression one or more times". The replace function returns a copy of its first argument in which every non-overlapping substring matching its second argument is replaced with its third argument. So this returns a copy of $text-nodes[1] with every run of separator characters replaced with the empty string, i.e. removed.
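
To show the replace call in context, here is a minimal sketch of a complete XSLT 2.0 stylesheet. It assumes you want to strip separators from every text node in the input, which the question does not state. The identity template and the widened character class [\s\p{Z}] are additions for illustration and are not part of the answer above. The \s is included because tab, carriage return, and line feed are control characters, not members of the Z category, so \p{Z} alone leaves them in place.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template (an assumption for this sketch):
       copy every node and attribute that no other template matches. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- For each text node, delete every run of Unicode separators (\p{Z})
       and of tab, CR, LF, or space (\s). -->
  <xsl:template match="text()">
    <xsl:value-of select="replace(., '[\s\p{Z}]+', '')"/>
  </xsl:template>

</xsl:stylesheet>
```

With an input such as `<p>a&#xA0;b&#x202F;c d</p>`, this should produce `<p>abcd</p>`. If you only want to drop separators and keep tabs and line breaks, use '\p{Z}+' as in the answer.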