Using regular expressions - repeating patterns

Posted on 2024-12-12 11:32:23

I am trying to use regular expressions to match some text.

The following pattern is what I am trying to gather.

@Identifier('VariableA', 'VariableB', 'VariableX', ..., 'VariableZ')

I would like to grab a dynamic number of variables rather than a fixed set of two or three.
Is there any way to do this? I have an existing Regular Expression:

\@(\w+)\W+(\w+)\W+(\w+)\W+(\w+)

This captures the Identifier and the first three variables.

Edit: Is it just me, or are regular expressions not as powerful as I'm making them out to be?

Comments (4)

遗弃M 2024-12-19 11:32:23

You want to use scan for this sort of thing. The basic pattern would be this:

s.scan(/\w+/)

That would give you an array of all the contiguous sequences of word characters:

>> "@Identifier('VariableA', 'VariableB', 'VariableX', 'VariableZ')".scan(/\w+/)
=> ["Identifier", "VariableA", "VariableB", "VariableX", "VariableZ"]

You say you might have multiple instances of your pattern with arbitrary stuff surrounding them. You can deal with that with nested scans:

s.scan(/@(\w+)\(([^)]+?)\)/).map { |m| [ m.first, m.last.scan(/\w+/) ] }

That will give you an array of arrays. Each inner array has the "Identifier" part as its first element and the "Variable" parts as an array in its second element. For example:

>> s = "pancakes @Identifier('VariableA', 'VariableB', 'VariableX', 'VariableZ') pancakes @Pancakes('one','two','three') eggs"
>> s.scan(/@(\w+)\(([^)]+?)\)/).map { |m| [ m.first, m.last.scan(/\w+/) ] }
=> [["Identifier", ["VariableA", "VariableB", "VariableX", "VariableZ"]], ["Pancakes", ["one", "two", "three"]]]

If you might be facing escaped quotes inside your "Variable" bits then you'll need something more complex.
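As a rough sketch of what "something more complex" could look like: the patterns below assume single-quoted values in which a backslash escapes the next character (so \' is allowed inside a value), and they also let a ")" appear inside a quoted value. The names QUOTED, CALL and parse_calls are made up for this illustration and are not part of the answer above.

```ruby
# One single-quoted value; inside it, either a plain character or a
# backslash followed by any character (assumed escape convention).
QUOTED = /'((?:[^'\\]|\\.)*)'/

# "@Name(" then any mix of quoted values and other characters that are
# neither a quote nor ")", then the closing ")".
CALL = /@(\w+)\(((?:'(?:[^'\\]|\\.)*'|[^')])*)\)/

# Returns [[name, [value, ...]], ...]; values keep their backslashes.
def parse_calls(text)
  text.scan(CALL).map { |name, args| [name, args.scan(QUOTED).flatten] }
end

p parse_calls(%q{x @Id('a)b', 'it\\'s') y})
```

The values come back with their escape backslashes still in place, so unescaping them would be a separate step.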


Some notes on the expression:

@            # A literal "@".
(            # Open a group.
  \w+        # One or more ("+") word characters ("\w").
)            # Close the group.
\(           # A literal "("; parentheses are used for grouping, so we escape it.
(            # Open a group.
  [          # Open a character class.
    ^)       # The "^" at the beginning of a [] means "not"; the ")" isn't escaped because it has no special meaning inside a character class.
  ]          # Close the character class.
  +?         # One or more of the preceding pattern, but don't be greedy.
)            # Close the group.
\)           # A literal ")".

You don't really need [^)]+? here; [^)]+ would do, because the negated class already stops at the first ")". I use the non-greedy forms by habit because that's usually what I mean. The grouping is used to separate the @Identifier and Variable parts so that we can easily get the desired nested array output.
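If you like keeping notes like these next to the pattern, Ruby's /x (extended) flag lets you write them inside the regex itself. This is only a sketch: the constant ANNOTATED_CALL and the method nested_calls are names invented here, and the greedy [^)]+ is used since it behaves the same as the non-greedy form in this pattern.

```ruby
# Same pattern as above, written with the /x flag so whitespace and
# "#" comments inside the regex are ignored by the engine.
ANNOTATED_CALL = /
  @          # a literal at-sign
  (\w+)      # group 1: the identifier
  \(         # a literal opening parenthesis
  ([^)]+)    # group 2: everything up to the closing parenthesis
  \)         # a literal closing parenthesis
/x

# Returns [[identifier, [variable, ...]], ...] like the nested scan.
def nested_calls(text)
  text.scan(ANNOTATED_CALL).map { |name, args| [name, args.scan(/\w+/)] }
end

s = "pancakes @Identifier('VariableA', 'VariableB') pancakes @Pancakes('one','two','three') eggs"
p nested_calls(s)
```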

耳根太软 2024-12-19 11:32:23

But alex thinks that you meant you wanted to capture the same thing four times. If you want to capture the same pattern, but different things, then you may want to consider two things:

Iteration. In Perl, you can say

while ($variable =~ /regex/g) { ... }

The 'g' stands for 'global'. In a loop like this, each time the regex is evaluated it continues from where the previous match ended, so it matches the /next/ instance.
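Since the other answers here use Ruby, this is a rough Ruby counterpart of that Perl loop, assuming the @Identifier(...) shape from the question. Ruby has no /g flag, so the sketch passes a start position to Regexp#match and moves it forward after every hit. CALL and each_call are illustrative names only.

```ruby
# Assumed shape of one occurrence: @Name( ... ).
CALL = /@(\w+)\(([^)]+)\)/

# Walks through the text one match at a time, resuming after each match.
def each_call(text)
  results = []
  pos = 0
  while (m = CALL.match(text, pos))
    results << [m[1], m[2].scan(/\w+/)]
    pos = m.end(0) # continue searching right after this match
  end
  results
end

p each_call("a @Identifier('VariableA', 'VariableB') b @Pancakes('one','two') c")
```

In everyday Ruby, String#scan does this looping for you; the explicit loop is shown only to mirror the Perl idiom.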

The other option is recursion. Write your regex like this:

/(what you want)(.*)/

Then capture group 1 contains the first thing, which you can push onto an array, and capture group 2 holds the rest of the string, which you then recurse over until it no longer matches.
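A minimal Ruby sketch of that recursive idea, assuming "what you want" is a single-quoted word. HEAD and collect_vars are hypothetical names chosen for this example.

```ruby
# Group 1: the first quoted word; group 2: whatever follows it.
HEAD = /'(\w+)'(.*)/m

# Peels off one variable per call and recurses on the remainder.
def collect_vars(rest, found = [])
  m = HEAD.match(rest)
  return found unless m # nothing left to match: stop recursing

  collect_vars(m[2], found + [m[1]])
end

p collect_vars("@Identifier('VariableA', 'VariableB', 'VariableX', 'VariableZ')")
# => ["VariableA", "VariableB", "VariableX", "VariableZ"]
```

The recursion depth grows with the number of variables, so for long lists the iterative form is probably the safer choice.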

一影成城 2024-12-19 11:32:23

You can simply use (\w+) and collect every match.

Given the input string
@Identifier('VariableA', 'VariableB', 'VariableX', 'VariableZ')

The results would be:

  1. Identifier
  2. VariableA
  3. VariableB
  4. VariableX
  5. VariableZ

This would work for an arbitrary number of variables.

For future reference, it's easy and fun to play around with regexp ideas on Rubular.
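In Ruby, one small thing to watch for: String#scan with a capturing group returns an array of one-element arrays, so either drop the parentheses or flatten the result. The sketch below assumes the first word is always the identifier; split_call is a made-up helper name.

```ruby
# Splits one "@Identifier('A', 'B', ...)" string into its parts,
# assuming the first word is the identifier and the rest are variables.
def split_call(input)
  identifier, *variables = input.scan(/\w+/)
  [identifier, variables]
end

input = "@Identifier('VariableA', 'VariableB', 'VariableX', 'VariableZ')"

p input.scan(/(\w+)/).first(2) # => [["Identifier"], ["VariableA"]]
p split_call(input)
# => ["Identifier", ["VariableA", "VariableB", "VariableX", "VariableZ"]]
```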

朱染 2024-12-19 11:32:23

So you are asking if there is a way to capture both the identifier and an arbitrary number of variables. I am afraid that you can only do this in a single match with regex engines that support captures, that is, engines that remember every value a repeated group matched. Note that captures and capturing groups are not the same thing. You want to remember all the "variables", and this can't be done with simple capturing groups.

I am unaware whether Ruby supports this or not, but I am sure that .NET and the new Perl 6 support it.
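To make the difference concrete, here is a small Ruby sketch of what happens when a capturing group is repeated: the match succeeds for any number of variables, but the group only keeps the value from its last repetition. As far as I know, Ruby's MatchData exposes just one value per group. REPEATED and last_capture_only are names invented for this demonstration.

```ruby
# Group 2 sits inside a repeated non-capturing group, so it is matched
# once per variable but only its last value is kept.
REPEATED = /@(\w+)\((?:'(\w+)'(?:,\s*)?)+\)/

# Returns [identifier, value kept in group 2], or nil without a match.
def last_capture_only(text)
  m = REPEATED.match(text)
  m && [m[1], m[2]]
end

p last_capture_only("@Identifier('VariableA', 'VariableB', 'VariableZ')")
# => ["Identifier", "VariableZ"]
```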

In your case you could use two regexes: one to capture the identifier, e.g. ^\s*@(\w+), and another one to capture all variables, e.g. result = subject.scan(/'[^']+'/)
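Put together in Ruby, the two-regex approach might look like the sketch below. It assumes the subject string starts with the @Identifier call, as the ^ anchor implies. One change from the pattern above is labeled in the code: a capturing group is added so the surrounding quotes are dropped. identifier_and_variables is a hypothetical helper name.

```ruby
# First regex pulls out the identifier, second one collects the variables.
def identifier_and_variables(subject)
  identifier = subject[/^\s*@(\w+)/, 1]          # value of group 1, or nil
  # Assumption: a group is added here so the quotes are not included.
  variables = subject.scan(/'([^']+)'/).flatten
  [identifier, variables]
end

subject = "@Identifier('VariableA', 'VariableB', 'VariableZ')"
p identifier_and_variables(subject)
# => ["Identifier", ["VariableA", "VariableB", "VariableZ"]]

# Without the extra group, the quotes stay in each result:
p subject.scan(/'[^']+'/)
# => ["'VariableA'", "'VariableB'", "'VariableZ'"]
```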
