Weird word replacement when using a regexp in Clojure

Posted on 2025-01-23 10:11:38

I would like to replace every "demo" word in "demo demo demo demo demo1 Demo" using the following code, but the result looks a little weird.

(string/replace
 "demo demo demo demo demo1 Demo"
 (re-pattern (str "(?i)" "(^|\\s)(" "demo" ")($|\\s)"))
 "$1[[$2]]$3")

;; => "[[demo]] demo [[demo]] demo demo1 [[Demo]]"

Why are the second and fourth ones not replaced? I would appreciate any explanation and solutions.

Edit: I did some experiments. If another space is added before the second word, it is replaced successfully, so it looks like the same word boundary cannot be used twice. I could run the replacement twice to catch all the demo words, but that is cumbersome. Is there a better solution?

(string/replace
 "demo  demo demo demo demo1 Demo"
 (re-pattern (str "(?i)" "(^|\\s)(" "demo" ")($|\\s)"))
 "$1[[$2]]$3")

;; => "[[demo]]  [[demo]] demo [[demo]] demo1 [[Demo]]"

Comments (2)

凉城 2025-01-30 10:11:38

I would use \b (a word boundary) like this:

(clojure.string/replace "demo demo demo demo demo1 Demo"
                        #"\b(demo)\b"
                        "[[$1]]")

=> "[[demo]] [[demo]] [[demo]] [[demo]] demo1 Demo"

If you also want to match Demo:

(clojure.string/replace "demo demo demo demo demo1 Demo"
                        #"\b([dD]emo)\b" ; [dD], not [d|D]: inside a character class, | is a literal pipe
                        "[[$1]]")

=> "[[demo]] [[demo]] [[demo]] [[demo]] demo1 [[Demo]]"
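
As a variation on the snippet above, the question's own `(?i)` flag can be combined with `\b` so that any capitalization is matched. The sketch below also shows one caveat worth keeping in mind: `\b` treats punctuation as a boundary, whereas the question's pattern required whitespace or the start or end of the string. The name `wrap-demo` and the second input string are made up for illustration.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper: wraps every standalone "demo" (any case) in [[...]].
(defn wrap-demo [text]
  (string/replace text #"(?i)\b(demo)\b" "[[$1]]"))

(wrap-demo "demo demo demo demo demo1 Demo")
;; => "[[demo]] [[demo]] [[demo]] [[demo]] demo1 [[Demo]]"

;; \b also matches next to punctuation, which the whitespace-based
;; pattern from the question would not accept.
(wrap-demo "demo-2 demo.")
;; => "[[demo]]-2 [[demo]]."
```

If only whitespace-delimited words should be wrapped, lookarounds on `\s` are the closer fit.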
聽兲甴掵 2025-01-30 10:11:38

Martin Půda's answer, using the \b zero-width word-boundary matching pattern, is probably the best for your needs.

If you want to understand why your pattern is not doing what you want, the crux is that you are expecting overlapping matches. The Java Matcher class, which Clojure's regex functions are built on, only finds non-overlapping matches: each search resumes where the previous match ended. In your particular case, your pattern is "start of line or a space, followed by the string 'demo', followed by a space or end of line". The first match therefore consumes the space after the first 'demo', so the second 'demo' has no leading space left to match and is skipped. That is also why your pattern worked when you added a second space.

The general way to handle overlapping matches in a regex pattern is to use zero-width lookaheads and lookbehinds. Here is an example that solves your particular problem with only small changes to your pattern.
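
To make the consumed spaces visible, here is a small sketch that lists the full text of each match found by the question's pattern. The names `consuming-pattern` and `matched-texts` are only for illustration.

```clojure
;; The pattern from the question: the delimiters are part of each match.
(def consuming-pattern #"(?i)(^|\s)(demo)($|\s)")

;; re-seq returns the successive non-overlapping matches; with groups present,
;; each match is a vector whose first element is the whole matched text.
(def matched-texts
  (map first (re-seq consuming-pattern "demo demo demo demo demo1 Demo")))

matched-texts
;; => ("demo " " demo " " Demo")
```

The first match ends with the space that the second "demo" would need in front of it, so the next match can only start at the space before the third "demo".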

user> (clojure.string/replace
       "demo demo demo demo demo1 Demo"
       (re-pattern (str "(?i)" "(\\s?)(?<=^|\\s)(" "demo" ")(?=\\s|$)(\\s?)"))
       "$1[[$2]]$3")
"[[demo]] [[demo]] [[demo]] [[demo]] demo1 [[Demo]]"

To better show each of the matches, we can enclose each match in curly braces as follows:

user> (clojure.string/replace
       "demo demo demo demo demo1 Demo"
       (re-pattern (str "(?i)" "(\\s?)(?<=^|\\s)(" "demo" ")(?=\\s|$)(\\s?)"))
       "{$1[[$2]]$3}")
"{[[demo]] }{[[demo]] }{[[demo]] }{[[demo]] }demo1{ [[Demo]]}"
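
Because lookarounds consume nothing, the two optional `(\\s?)` groups are not strictly required. A simpler sketch, assuming the goal is only to wrap whitespace-delimited words, keeps just the lookbehind and lookahead. The helper name `wrap-word` is hypothetical, and the `Pattern/quote` call is an added precaution so that a word containing regex metacharacters is matched literally.

```clojure
(require '[clojure.string :as string])
(import 'java.util.regex.Pattern)

;; Hypothetical helper: wraps each whitespace-delimited occurrence of `word`
;; (case-insensitively) in [[...]] without consuming the surrounding spaces.
(defn wrap-word [text word]
  (let [pattern (re-pattern (str "(?i)"
                                 "(?<=^|\\s)"               ; preceded by start or whitespace
                                 "(" (Pattern/quote word) ")"
                                 "(?=\\s|$)"))]             ; followed by whitespace or end
    (string/replace text pattern "[[$1]]")))

(wrap-word "demo demo demo demo demo1 Demo" "demo")
;; => "[[demo]] [[demo]] [[demo]] [[demo]] demo1 [[Demo]]"
```

Each match now covers only the word itself, so adjacent words can share the single space between them.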