Regular expressions and escape characters in Scheme

Posted 2024-11-08 23:42:19

In Scheme, I have the string

"hello hellu-#\"hella.helloo,hallo#\return#\""

and I want to split it into the list ("hello" "hellu" "hella" "helloo" "hallo"),

separating on space, hyphen, double quote, dot, comma, and return.

I tried

(regexp-split #rx"( +)|(#\-)|(#\")|(#\.)|(,)|(#\return)" string)

but #\- and #\. cause an error.

Any hints or solutions?

Thanks

爱给你人给你 2024-11-15 23:42:19

It looks like you're confusing the syntax for characters (#\foo) with the syntax for strings, and you do that in both the string and the regexp. So my guess is that the string that you want to split is actually:

"hello hellu-\"hella.helloo,hallo\n\""

where \" stands for a double quote character, and \n for a newline. If this is the case, then (again, this is guessing your intention) the regexp should be:

(regexp-split #rx"( +)|(\-)|(\")|(\.)|(,)|(\n)" string)

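But that doesn't work either, since \- and \. are invalid escapes inside a string literal (Racket strings use C-like escapes), so the reader rejects the expression before the regexp is even compiled. Change that to: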

(regexp-split #rx"( +)|(-)|(\")|(.)|(,)|(\n)" string)

This runs, but it doesn't do what you want, since . has the usual "any char" meaning in a regexp, so every character is treated as a separator. You need to escape the dot with a backslash at the regexp level. As with many other string syntaxes, you get a backslash into a string by escaping it with a backslash, so now we have a version that is finally close to a working one:

> (define string "hello hellu-\"hella.helloo,hallo\n\"")
> (regexp-split #rx"( +)|(-)|(\")|(\\.)|(,)|(\n)" string)
'("hello" "hellu" "" "hella" "helloo" "hallo" "" "")
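To make the two layers of escaping more concrete, here is a small sketch. The names hyphen-string, dot-pattern and sample are made up for illustration and do not come from the question. The reader first turns \\ in the source text into a single backslash, and only then does the regexp compiler see \. and treat the dot literally. regexp-quote should build the same escaped pattern string without hand-written backslashes.

```racket
#lang racket/base

;; #\- is a character literal; "-" is a one-character string.
(define hyphen-string (string #\-))      ; "-"

;; "\\." in source text is a 2-character string: a backslash and a dot.
(define dot-pattern "\\.")
(string-length dot-pattern)              ; 2

;; regexp-quote escapes regexp-special characters in a string.
(string=? (regexp-quote ".") dot-pattern) ; #t

;; An unescaped dot matches any character; an escaped dot matches only ".".
(define sample "a.b")                    ; illustrative input
(regexp-match* #rx"." sample)            ; '("a" "." "b")
(regexp-match* #rx"\\." sample)          ; '(".")
```

The same reasoning explains why #\- and #\. in the original attempt fail: inside a string or #rx literal, #\ is not character syntax, it is just a # followed by a string escape.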

First, the regexp can be improved considerably: the parens are not needed for splitting:

(regexp-split #rx" +|-|\"|\\.|,|\n" string)

Then, instead of using a bunch of single-characters with |s, you can just use a "character range":

(regexp-split #rx" +|[-\".,\n]" string)

Note that it's important that the - is the first (or last) character in the range, so it will not have the usual meaning of a range of characters. Next, it seems that you really want any sequence of such characters to be used as a separator, which will avoid some of those empty strings in the result:

(regexp-split #rx" +|[-\".,\n]+" string)

In this case you can just as well throw the space into the range too (carefully putting it after the -, as I explained above). We now get:

> (define string "hello hellu-\"hella.helloo,hallo\n\"")
> (regexp-split #rx"[- \".,\n]+" string)
'("hello" "hellu" "hella" "helloo" "hallo" "")
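The earlier note about where the - sits inside the brackets can be checked with a small experiment. This is only an illustrative sketch; the sample string "a-b-c" is made up and is not part of the question.

```racket
#lang racket/base

(define sample "a-b-c") ; illustrative input

;; Between two characters, - denotes a range: [a-c] means a, b, or c.
(regexp-match* #rx"[a-c]" sample)  ; '("a" "b" "c")

;; Placed first, - is a literal hyphen: [-ac] means -, a, or c.
(regexp-match* #rx"[-ac]" sample)  ; '("a" "-" "-" "c")

;; Placed last, - is also literal.
(regexp-match* #rx"[ac-]" sample)  ; '("a" "-" "-" "c")
```

This is why the space goes after the - in [- \".,\n]: the hyphen stays first, so it cannot be read as the middle of a range.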

And finally you'd probably want to get rid of the empty string at the end of the regexp-split result. Technically, it should be there, since there is a sequence of matching characters right before the end of the string. An easy way around this in Racket is to use the complementary regexp-match*, which returns the list of matches rather than the pieces between the matches:

> (define string "hello hellu-\"hella.helloo,hallo\n\"")
> (regexp-match* #rx"[- \".,\n]+" string)
'(" " "-\"" "." "," "\n\"")

This is obviously broken, since it gives you the separators rather than what's between them. But since this regexp is a range of characters, it is easy to resolve -- simply negate the character range, and you get what you want:

> (define string "hello hellu-\"hella.helloo,hallo\n\"")
> (regexp-match* #rx"[^- \".,\n]+" string)
'("hello" "hellu" "hella" "helloo" "hallo")
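For reuse, the final expression can be wrapped in a small helper. The sketch below is one possible shape and not part of the original answer: the names split-words, split-words/filter and input are hypothetical. It also shows the alternative of keeping regexp-split and dropping the empty pieces afterwards, which should give the same list for this input.

```racket
#lang racket/base

;; Hypothetical helper: collect maximal runs of non-separator characters.
(define (split-words str)
  (regexp-match* #rx"[^- \".,\n]+" str))

;; Alternative: split on runs of separators, then drop empty pieces.
(define (split-words/filter str)
  (filter (lambda (s) (not (string=? s "")))
          (regexp-split #rx"[- \".,\n]+" str)))

(define input "hello hellu-\"hella.helloo,hallo\n\"")

(split-words input)        ; '("hello" "hellu" "hella" "helloo" "hallo")
(split-words/filter input) ; '("hello" "hellu" "hella" "helloo" "hallo")
```

The regexp-match* version needs only one pass and one pattern, so it is probably the simpler choice; the filtering version may be easier to adapt if the separator pattern is not a single character range that can be negated.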
