How can I split this string with a regular expression?

Posted 2024-10-21 19:33:19

I have some strings that look like this:

div#title.title.top
#main.main
a.bold#empty.red

They are similar to Haml, and I want to split them with a regex, but I don't know how to define it.

val r = """???""".r // HELP
val items = "a.bold#empty.red".split(r)
items // -> "a", ".bold", "#empty", ".red"

How can I do this?


UPDATE

Sorry, everyone, but I need to make this question harder. I'm very interested in this one:

val r = """(?<=\w)\b"""

But it fails to parse more complex strings such as:

div#question-title.title-1.h-222_333

I would like it to be parsed as:

div
#question-title
.title-1
.h-222_333 

How can I improve that regex?


Comments (3)

莫多说 2024-10-28 19:33:19
val r = """(?<=\w)\b(?!-)"""

Note that split on a String takes a String representing a regular expression, not a Regex, so keep r as a String here rather than converting it to a Regex with .r.
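
To illustrate the note above, here is a minimal sketch of the two ways the pattern can be applied. The object and method names (SplitForms, viaString, viaRegex) are made up for this example. The second form assumes you would rather hold a scala.util.matching.Regex, in which case split is called on the Regex itself instead of on the String.

```scala
import scala.util.matching.Regex

// Hypothetical helper object, for illustration only.
object SplitForms {
  // The pattern from this answer, kept as a plain String.
  val pattern: String = """(?<=\w)\b(?!-)"""

  // Form 1: String.split expects the pattern as a String.
  def viaString(s: String): List[String] = s.split(pattern).toList

  // Form 2: if a Regex value is preferred, call split on the Regex.
  val regex: Regex = pattern.r
  def viaRegex(s: String): List[String] = regex.split(s).toList

  def main(args: Array[String]): Unit = {
    println(viaString("a.bold#empty.red"))
    println(viaRegex("a.bold#empty.red"))
  }
}
```

Both calls should give the same pieces, because Regex.split delegates to the same underlying Java pattern.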

Brief explanation on the regex:

  • (?<=...) is a look-behind. It states that the match must be preceded by the pattern ..., in your case \w, meaning the match must follow a digit, letter, or underscore.

  • \b means word boundary. It is a zero-length match that happens between a word character (digit, letter, or underscore) and a non-word character, or vice versa. Because it is zero-length, split won't remove any characters when splitting.

  • (?!...) is a negative lookahead. Here I use it to say that I'm not interested in word boundaries that go from a word character to a dash.
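
The effect of the negative lookahead can be seen by running the pattern with and without it on the harder input from the question. This is only a sketch; LookaroundDemo and splitWith are names invented for the example.

```scala
// Hypothetical demo object, for illustration only.
object LookaroundDemo {
  // The pattern from the question's update: splits at every boundary after a word character.
  val withoutLookahead: String = """(?<=\w)\b"""

  // The pattern from this answer: same, but never splits right before a dash.
  val withLookahead: String = """(?<=\w)\b(?!-)"""

  // Split s on the given pattern and return the pieces as a List.
  def splitWith(pattern: String, s: String): List[String] =
    s.split(pattern).toList

  def main(args: Array[String]): Unit = {
    val input = "div#question-title.title-1.h-222_333"
    println(splitWith(withoutLookahead, input))
    println(splitWith(withLookahead, input))
  }
}
```

Without the lookahead, the input is also cut before each dash (div, #question, -title, .title, -1, .h, -222_333). With it, the four pieces asked for in the question come out intact.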

无语# 2024-10-28 19:33:19

Josh M's answer has a good regular expression, but because split takes a regular expression that matches the "delimiter" rather than the items themselves, you need to use findAllIn instead:

val r = """(?:\.|#)?\w+""".r
val items = r findAllIn "a.bold#empty.red"
    // findAllIn returns an iterator, so you may also want a .toList on the end

Then you get these results:

div#title.title.top    -> List(div, #title, .title, .top)
#main.main             -> List(#main, .main)
a.bold#empty.red       -> List(a, .bold, #empty, .red)
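
As a sketch of how this approach might be extended to the harder input in the question's update, the character class can be widened to accept dashes. The withHyphens pattern below is my own variation, not part of the original answer, and FindTerms and terms are hypothetical names. It assumes a dash may appear anywhere inside a name.

```scala
import scala.util.matching.Regex

// Hypothetical helper object, for illustration only.
object FindTerms {
  // The pattern from the answer: optional '.' or '#', then word characters.
  val simple: Regex = """(?:\.|#)?\w+""".r

  // Assumed variation: also allow '-' inside a name.
  val withHyphens: Regex = """[.#]?[\w-]+""".r

  // Collect every match of r in s, in order.
  def terms(r: Regex, s: String): List[String] = r.findAllIn(s).toList

  def main(args: Array[String]): Unit = {
    println(terms(simple, "a.bold#empty.red"))
    println(terms(withHyphens, "div#question-title.title-1.h-222_333"))
  }
}
```

With the simple pattern, the dashes in div#question-title.title-1.h-222_333 are skipped and the names are broken apart at them. The widened class keeps #question-title, .title-1 and .h-222_333 whole.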

独行侠 2024-10-28 19:33:19

I'm not completely sure what you need here, but this should help:

(?:\.|#)?\w+

It defines a "term" as an optional dot or hash followed by one or more word characters.

You will end up with:

div
#title
.title
.top
#main
.main
a
.bold
#empty
.red