Regular expressions and escape characters in Scheme
In Scheme, I have the string
"hello hellu-#\"hella.helloo,hallo#\return#\""
and I want to split it into the list ("hello" "hellu" "hella" "helloo" "hallo"), separating on space, hyphen, double quote, dot, comma, and return.
I tried
(regexp-split #rx"( +)|(#\-)|(#\")|(#\.)|(,)|(#\return)" string)
but the #\- and #\. parts cause an error.
Any hints or solutions?
Thanks.
Comments (1)
It looks like you're confusing the syntax for characters (`#\foo`) with the syntax for strings, and you do that in both the string and the regexp. So my guess is that the string you want to split is actually `"hello hellu-\"hella.helloo,hallo\n\""`, where `\"` stands for a double quote character and `\n` for a newline.

If this is the case, then (again, this is guessing your intention) the regexp should be `#rx"( +)|(\-)|(\")|(\.)|(,)|(\n)"`. But that doesn't work either, since `\-` and `\.` are invalid escapes in a string (Racket uses C-like escapes), so change that to `#rx"( +)|(-)|(\")|(.)|(,)|(\n)"`.

This doesn't work either, since `.` has the usual "any char" meaning in a regexp, so you want to escape it with a backslash. As with many other string syntaxes, you get a backslash by escaping it with a backslash, so now we have a version that is finally close to a working one: `#rx"( +)|(-)|(\")|(\\.)|(,)|(\n)"`.

First, the regexp can be improved considerably. The parens are not needed for splitting, so it can be written as `#rx" +|-|\"|\\.|,|\n"`.
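As a minimal sketch of the steps so far, the following assumes the guessed string above. The names `str`, `with-parens` and `no-parens` are made up for illustration. Both patterns split on the same separators, and the result still has an empty string wherever two separators are adjacent or the string ends with one.

```racket
#lang racket

;; The string as guessed above: \" is a double quote, \n is a newline.
(define str "hello hellu-\"hella.helloo,hallo\n\"")

;; Escapes fixed: "\\." in the string reaches the regexp engine as \.,
;; which matches a literal dot.
(define with-parens #rx"( +)|(-)|(\")|(\\.)|(,)|(\n)")

;; The same alternatives without the grouping parentheses.
(define no-parens #rx" +|-|\"|\\.|,|\n")

(regexp-split with-parens str)
(regexp-split no-parens str)
;; both → '("hello" "hellu" "" "hella" "helloo" "hallo" "" "")
```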
Then, instead of using a bunch of single characters with `|`s, you can just use a "character range": `#rx" +|[-\".,\n]"`. Note that it's important that the `-` is the first (or last) character in the range, so that it does not have its usual meaning of a range of characters.

Next, it seems that you really want any sequence of such characters to be used as a separator, which avoids some of the empty strings that splitting on single characters leaves in the result: `#rx" +|[-\".,\n]+"`. In this case you can just as well throw the space into the range too (carefully putting it after the `-`, as I explained above). We now get `#rx"[- \".,\n]+"`.

Finally, you'd probably want to get rid of the last empty string. Technically, it should be there, since there is a sequence of matching characters before the end of the string. An easy way around this in Racket is to use the complementary `regexp-match*`, which returns the list of matches rather than splitting on them. With the same regexp this is obviously broken, since it gives you the separators rather than what's between them. But since this regexp is a range of characters, it is easy to resolve: simply negate the character range, as in `#rx"[^- \".,\n]+"`, and you get what you want.
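The last few steps can be sketched as follows, again assuming the guessed string from earlier in this answer. The names `str`, `separators` and `words` are illustrative only.

```racket
#lang racket

(define str "hello hellu-\"hella.helloo,hallo\n\"")

;; One range for all separators; "-" comes first so it is a literal hyphen.
(define separators #rx"[- \".,\n]+")

;; Negated range: runs of characters that are not separators.
(define words #rx"[^- \".,\n]+")

;; Splitting leaves a trailing "" because the string ends with separators.
(regexp-split separators str)   ; → '("hello" "hellu" "hella" "helloo" "hallo" "")

;; Matching the separator range returns the separators themselves.
(regexp-match* separators str)  ; → '(" " "-\"" "." "," "\n\"")

;; Matching the negated range returns the words.
(regexp-match* words str)       ; → '("hello" "hellu" "hella" "helloo" "hallo")
```

Matching the words directly avoids having to filter empty strings out of the split result afterwards.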