Regex - get the word between characters

Posted 2024-11-09 14:50:10

Given the following example string: "[ One].[Two ].[ Three ].[Four]"
I want to match "One", "Two", "Three" and "Four".

In other words: I need to get the word between the brackets, regardless of how much whitespace surrounds it.

I've tried it with the following expression:

(?<=\[)(?s)(.*?)(?=\s*\])

That results in " One", "Two", " Three" and "Four".
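A minimal C# sketch (assuming .NET's System.Text.RegularExpressions, which the answers below also use) that reproduces this result and contrasts it with a consuming pattern. The lookbehind accounts only for the "[" itself, so leading spaces fall inside the match, while the lazy .*? stops as soon as \s*\] can match and therefore never picks up trailing spaces.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string input = "[ One].[Two ].[ Three ].[Four]";

// The question's pattern: the lookbehind consumes nothing but "[", so leading
// spaces end up in the match, while the lazy .*? stops before trailing spaces.
string[] original = Regex.Matches(input, @"(?<=\[)(?s)(.*?)(?=\s*\])")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Consume the leading whitespace outside the group and read group 1 instead.
string[] fixedWords = Regex.Matches(input, @"\[\s*(.*?)\s*\]")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

Console.WriteLine(string.Join("|", original));   // " One|Two| Three|Four"
Console.WriteLine(string.Join("|", fixedWords)); // "One|Two|Three|Four"
```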

EDIT:
It's a little bit more complicated than I first thought it would be:

There are many (at least one) words enclosed in brackets, which might be separated by an arbitrary character (e.g. "[one]" or "[one] [two][three].[four]").
Each pair of brackets contains a single word with any amount of whitespace, possibly none (e.g. "[one]", "[two ]" or "[ three ]").
These blocks of words and their enclosing brackets are surrounded by a known sequence of characters:
"These words [word-1] .. [word-n] are well known" or
"These words [word-1] .. [word-n] are well known".

Please note that "[word-1] .. [word-n]" just stands for an arbitrary count of the blocks described above.

I want to match just the single words between the brackets and eliminate the surrounding sequence ("These words" and "are well known"), as well as any whitespace within the brackets and between the blocks. In addition, the character that may appear between the blocks (there is never more than one) should be eliminated, too.
Hope that wasn't too weird ;)
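These requirements suggest a two-step approach: first isolate the known sequence, then pull the words out of the bracket blocks inside it. The sketch below is one possible reading, assuming C#/.NET, that the prefix and suffix are exactly the literal texts "These words" and "are well known", and that each sequence sits on a single line. ExtractWords is a hypothetical helper name, and the inner pattern uses the character class [^\]\s]+ (a substitute for the \S+? used in the answers) so a word can never run past a closing bracket.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the bracketed words found between the assumed
// prefix "These words" and suffix "are well known". Text outside that sequence,
// whitespace inside the brackets and the separator between blocks are dropped.
static List<string> ExtractWords(string text)
{
    var words = new List<string>();
    // Step 1: isolate each "These words ... are well known" sequence.
    foreach (Match outer in Regex.Matches(text, @"These words\s*(.*?)\s*are well known"))
    {
        string blocks = outer.Groups[1].Value;
        // Step 2: take the word out of every [ ... ] block in that sequence.
        foreach (Match block in Regex.Matches(blocks, @"\[\s*([^\]\s]+)\s*\]"))
        {
            words.Add(block.Groups[1].Value);
        }
    }
    return words;
}

string sample = "These words [one] [two][ three ].[four ] are well known, but [five] is not.";
Console.WriteLine(string.Join(", ", ExtractWords(sample))); // one, two, three, four
```

Because "[five]" lies outside the known sequence, it is not returned.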

烟燃烟灭 2024-11-16 14:50:10

You can use this, with the "global" flag enabled

\[\s*(\S+?)\s*\]

Explanation

\[      # a literal "["
\s*     # any number of white space
(\S+?)  # at least one non white-space character, non-greedily (group 1)
\s*     # any number of white space
\]      # a literal "]"

EDIT:

@Kobi noted that \S+? can actually match the ] in targets like "[ One]". So for a moment, group 1 would contain "One]".

But then there still is the \] at the end of the regex, at which point the regex engine would backtrack and give the "]" to \], so the expression can succeed.

It is vitally important to use non-greedy matching here (\S+?, as opposed to \S+). I got that wrong in the first version of my answer as well.

Further, \S is very unspecific. If you have a more specific definition of what "a word" is for you, by all means use it.
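For .NET, a sketch of the same pattern written with RegexOptions.IgnorePatternWhitespace, which lets the explanation above live inside the pattern as # comments; Words is a hypothetical helper name. .NET has no separate "global" flag, since Regex.Matches already returns every match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The answer's pattern; IgnorePatternWhitespace ignores the layout whitespace
// and treats everything after # on a line as a comment.
const string pattern = @"
    \[       # a literal opening bracket
    \s*      # any number of whitespace characters
    (\S+?)   # at least one non-whitespace character, lazily (group 1)
    \s*      # any number of whitespace characters
    \]       # a literal closing bracket
";

// Regex.Matches scans the whole input, so every block is found.
string[] Words(string input) =>
    Regex.Matches(input, pattern, RegexOptions.IgnorePatternWhitespace)
        .Cast<Match>()
        .Select(m => m.Groups[1].Value)
        .ToArray();

Console.WriteLine(string.Join("|", Words("[ One].[Two ].[ Three ].[Four]"))); // One|Two|Three|Four
Console.WriteLine(string.Join("|", Words("[one] [two][three].[four]")));      // one|two|three|four
```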

柏拉图鍀咏恒 2024-11-16 14:50:10

Non-greedy matching is the key. Try the following:

\[\s*(.+?)\s*\]

It will match anything within brackets and capture it without the whitespace before or after. If the string within the brackets cannot have spaces, I recommend the following as it's a better expression.

\[\s*(\S+)\s*\]
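A quick C# check (a sketch; JoinGroups is a hypothetical helper) suggests the greedy \S+ variant is not safe on the question's own input. When a "]" is followed directly by a non-whitespace character such as ".", \S+ runs across the block boundary and only backtracks as far as the next "]" that lets the pattern finish.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: group 1 of every match, joined with "|" for comparison.
static string JoinGroups(string text, string pattern) =>
    string.Join("|", Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Groups[1].Value));

string input = "[ One].[Two ].[ Three ].[Four]";

// Greedy \S+ swallows "One].[Two" because no whitespace follows the first "]".
Console.WriteLine(JoinGroups(input, @"\[\s*(\S+)\s*\]"));      // One].[Two|Three|Four

// Lazy \S+? stops at the first "]" that lets the rest of the pattern match.
Console.WriteLine(JoinGroups(input, @"\[\s*(\S+?)\s*\]"));     // One|Two|Three|Four

// Excluding "]" from the class avoids the problem without relying on laziness.
Console.WriteLine(JoinGroups(input, @"\[\s*([^\]\s]+)\s*\]")); // One|Two|Three|Four
```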

梦回梦里 2024-11-16 14:50:10

A simple solution is to use capturing groups to get the part of the match you really want:

\[\s*(.*?)\s*\]

Example:

using System.Linq;
using System.Text.RegularExpressions;
MatchCollection matches = Regex.Matches(s, @"\[\s*(.*?)\s*\]");
string[] words = matches.Cast<Match>().Select(m => m.Groups[1].Value).ToArray();

A similar option is to use trim:

MatchCollection matches = Regex.Matches(s, @"\[([^\]]*)\]");
string[] words = matches.Cast<Match>().Select(m => m.Groups[1].Value.Trim()).ToArray();

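If you really want, you can use look-arounds. This relies on .NET supporting a variable-length lookbehind such as (?<=\[\s*); many other engines, Python's re module for example, reject it: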

(?<=\[\s*)\S.*?(?=\s*\])

Example:

MatchCollection matches = Regex.Matches(s, @"(?<=\[\s*)\S.*?(?=\s*\])");
string[] words = matches.Cast<Match>().Select(m => m.Value).ToArray();

静谧 2024-11-16 14:50:10

Is regex absolutely necessary? If not, I believe you could just Trim to get rid of the brackets, dots, and spaces.

char[] chars = new char[] {'[', ']', '.', ' '};
inputString = inputString.Trim(chars);
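A small C# sketch of why Trim may not be enough on its own: String.Trim removes the listed characters only from the two ends of the string, so the separators between blocks survive. Splitting on the same characters is one non-regex alternative, assuming the input contains only the bracket blocks; with the surrounding "These words ... are well known" text, those words would be returned as well.

```csharp
using System;
using System.Linq;

char[] chars = new char[] { '[', ']', '.', ' ' };
string input = "[ One].[Two ].[ Three ].[Four]";

// Trim strips the characters only at the start and the end of the whole string.
Console.WriteLine(input.Trim(chars)); // One].[Two ].[ Three ].[Four

// Split breaks the string at every listed character; RemoveEmptyEntries drops
// the empty pieces produced by consecutive separators.
string[] words = input.Split(chars, StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine(string.Join("|", words)); // One|Two|Three|Four
```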

About the author

長街聽風
