Regular expressions and escape characters in Scheme

Posted 2024-11-08 23:42:19 | 355 words | 7 views | 0 comments

In Scheme, I have the string

"hello hellu-#\"hella.helloo,hallo#\return#\""

and I want to split it into the list ("hello" "hellu" "hella" "helloo" "hallo"),

separating on space, hyphen, double quote, dot, comma, and return.

I tried

(regexp-split #rx"( +)|(#\-)|(#\")|(#\.)|(,)|(#\return)" string)

but #\- and #\. cause an error.

Any hints or solutions?

Thanks


Comments (1)

爱给你人给你 2024-11-15 23:42:19

It looks like you're confusing the syntax for characters (#\foo) with the syntax for strings, and you do that in both the string and the regexp. So my guess is that the string that you want to split is actually:

"hello hellu-\"hella.helloo,hallo\n\""

where \" stands for a double quote character, and \n for a newline. If this is the case, then (again, this is guessing your intention) the regexp should be:

(regexp-split #rx"( +)|(\-)|(\")|(\.)|(,)|(\n)" string)

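But that doesn't work either: the pattern is written as a string literal, and \- and \. are invalid escapes there (Racket strings use C-like escapes), so the reader rejects it before the regexp is ever compiled. Change that to: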

(regexp-split #rx"( +)|(-)|(\")|(.)|(,)|(\n)" string)

This doesn't work either, since . has the usual "any char" meaning in a regexp -- so you want to escape it with a backslash. As with many other string syntaxes, you get a backslash by escaping it with a backslash, so now we have a version that is finally close to a working one:

> (define string "hello hellu-\"hella.helloo,hallo\n\"")
> (regexp-split #rx"( +)|(-)|(\")|(\\.)|(,)|(\n)" string)
'("hello" "hellu" "" "hella" "helloo" "hallo" "" "")
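The two layers of escaping described above can be checked directly. This is only a small sketch: the name dot-pattern is made up for illustration, and regexp-quote is mentioned as one possible way to avoid writing the backslashes by hand.

```racket
#lang racket
;; The string reader consumes one level of backslashes before the
;; regexp compiler ever sees the pattern.
(define dot-pattern "\\.")        ; hypothetical name; source \\ becomes one backslash
(string-length dot-pattern)       ; => 2 (a backslash followed by a dot)

(regexp-match* #rx"\\." "a.b.c")  ; => '("." ".")      only literal dots match
(regexp-match* #rx"." "a.b")      ; => '("a" "." "b")  unescaped . matches anything
(regexp-quote ".")                ; => "\\."  builds the escaped pattern string
```

If the separator characters come from data rather than from a literal, passing them through regexp-quote before building the regexp is probably less error-prone than escaping them manually.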

First, the regexp can be improved considerably: the parens are not needed for splitting:

(regexp-split #rx" +|-|\"|\\.|,|\n" string)

Then, instead of using a bunch of single-characters with |s, you can just use a "character range":

(regexp-split #rx" +|[-\".,\n]" string)

Note that it's important that the - is the first (or last) character in the range, so it will not have the usual meaning of a range of characters. Next, it seems that you really want any sequence of such characters to be used as a separator, which will avoid some of those empty strings in the result:

(regexp-split #rx" +|[-\".,\n]+" string)

and in this case you can just as well throw the space into the range too (carefully putting it after the -, as I explained above). We now get:

> (define string "hello hellu-\"hella.helloo,hallo\n\"")
> (regexp-split #rx"[- \".,\n]+" string)
'("hello" "hellu" "hella" "helloo" "hallo" "")
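To see why the position of the - inside the brackets matters, here is a short sketch using throwaway test strings that are not part of the original question:

```racket
#lang racket
;; Between two characters, - denotes a range.
(regexp-match* #rx"[a-c]" "a-b-c")   ; => '("a" "b" "c")

;; First or last inside the brackets, - is an ordinary character.
(regexp-match* #rx"[-ac]" "a-b-c")   ; => '("a" "-" "-" "c")
(regexp-match* #rx"[ac-]" "a-b-c")   ; => '("a" "-" "-" "c")

;; With the space written before the -, the class becomes the range
;; from space to double quote, which also contains "!".
(regexp-match* #rx"[ -\"]" "a!b")    ; => '("!")
```

The last expression shows the kind of silent mistake that the ordering avoids: the pattern is still valid, it just matches more characters than intended.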

And finally you'd probably want to get rid of that last empty string. Technically, it should be there, since there is a sequence of matching characters before the end of the string. An easy way in Racket around this is to use the complementary regexp-match* which returns the list of matches rather than splitting on the list of matches:

> (define string "hello hellu-\"hella.helloo,hallo\n\"")
> (regexp-match* #rx"[- \".,\n]+" string)
'(" " "-\"" "." "," "\n\"")

This is obviously broken, since it gives you the separators rather than what's between them. But since this regexp is a range of characters, it is easy to resolve -- simply negate the character range, and you get what you want:

> (define string "hello hellu-\"hella.helloo,hallo\n\"")
> (regexp-match* #rx"[^- \".,\n]+" string)
'("hello" "hellu" "hella" "helloo" "hallo")
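
For reuse, the final expression could be wrapped in a small function. The names split-words and split-words/filter below are hypothetical, chosen only for this sketch; the second variant keeps regexp-split and drops the empty strings afterwards, in case splitting reads more naturally.

```racket
#lang racket
;; Hypothetical helper: return the runs of non-separator characters.
(define (split-words str)
  (regexp-match* #rx"[^- \".,\n]+" str))

;; Alternative sketch: split on separator runs, then discard empty pieces.
(define (split-words/filter str)
  (filter (lambda (s) (not (string=? s "")))
          (regexp-split #rx"[- \".,\n]+" str)))

(split-words "hello hellu-\"hella.helloo,hallo\n\"")
;; => '("hello" "hellu" "hella" "helloo" "hallo")
(split-words/filter "hello hellu-\"hella.helloo,hallo\n\"")
;; => '("hello" "hellu" "hella" "helloo" "hallo")
(split-words "--..")   ; => '()  a string made only of separators
```

Both variants return an empty list for an empty string or a string made only of separators, so callers do not need a special case for those inputs.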