Regex help: my regex pattern matches invalid strings

Posted 2024-08-16 17:31:37 · 682 characters · 5 views · 0 comments

The text string I want to validate consists of what I call "segments". A single segment might look like this:

 [A-Z,S,3]

So far I have managed to build this regex pattern:

(?:\[(?<segment>[^,\]\[}' ]+?,[S|D],\d{1})\])+?

It works, but it returns matches even when the whole text string contains invalid text. I guess I need to use ^ and $ somewhere in my pattern, but I can't figure out how.

I would like my pattern to produce the following results:

  • [A-Z,S,3][A-Za-z0-9åäöÅÄÖ,D,4] OK (two segments)
  • [A-Z,S,3]aaaa[A-Za-z0-9åäöÅÄÖ,D,4] No match
  • crap[A-Z,S,3][A-Za-z0-9åäöÅÄÖ,D,4] No match
  • [A-Z,S,3][] No match
  • [A-Z,S,3][klm,D,4][0-9,S,1] OK (three segments)
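
The behaviour described in the question can be reproduced with a short sketch. It is written in Python rather than .NET, so the named group is spelled (?P<segment>...) instead of (?<segment>...). The names UNANCHORED, samples and first_hit are made up for this illustration. Without anchors, a search is free to pick out any valid segment inside an otherwise invalid string:

```python
import re

# Python port of the pattern from the question (assumption: only the
# named-group syntax differs from the .NET original).
UNANCHORED = re.compile(r"(?:\[(?P<segment>[^,\]\[}' ]+?,[S|D],\d{1})\])+?")

samples = [
    "[A-Z,S,3][A-Za-z0-9åäöÅÄÖ,D,4]",
    "[A-Z,S,3]aaaa[A-Za-z0-9åäöÅÄÖ,D,4]",
    "crap[A-Z,S,3][A-Za-z0-9åäöÅÄÖ,D,4]",
    "[A-Z,S,3][]",
    "[A-Z,S,3][klm,D,4][0-9,S,1]",
]

def first_hit(text):
    """Return the first substring the unanchored pattern finds, or None."""
    m = UNANCHORED.search(text)
    return m.group(0) if m else None

for s in samples:
    # Every sample, valid or not, yields a hit on its first well-formed segment.
    print(repr(s), "->", first_hit(s))
```

All five strings produce a hit, which is why the pattern appears to accept the invalid ones.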

Comments (2)

不知在何时 2024-08-23 17:31:37

Use ^ to anchor the start and $ to anchor the end. E.g.: ^(abc)*$, this matches zero or more repetitions of the group ("abc" in this example) and that must start at the start of the input string and end at the end of it.
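
As a minimal sketch of the anchoring idea, here is the ^(abc)*$ example in Python. ANCHORED, UNANCHORED and is_valid are names invented for this illustration. One caveat worth hedging: in Python (as in .NET), $ also matches just before a trailing newline, so re.fullmatch is the stricter option when that matters.

```python
import re

ANCHORED = re.compile(r"^(abc)*$")
UNANCHORED = re.compile(r"(abc)*")

def is_valid(text):
    """True when the whole string is zero or more repetitions of 'abc'."""
    return ANCHORED.search(text) is not None

print(is_valid("abcabc"))    # True
print(is_valid("abcXabc"))   # False: the X breaks the repetition
print(is_valid(""))          # True: * allows zero repetitions

# Without anchors, search() is satisfied by a valid prefix.
print(UNANCHORED.search("abcXabc").group(0))   # abc

# $ tolerates one trailing newline; fullmatch does not.
print(is_valid("abc\n"))                           # True
print(re.fullmatch(r"(abc)*", "abc\n") is None)    # True
```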

^(?:\[(?<segment>[^,\]\[}' ]+?,[S|D],\d{1})\])+$ is your pattern with the anchors added. Using an ungreedy +? doesn't matter, as you require it to match until the end anyway. However, your regex has a few issues.

^(?:\[[^,]+,[SD],\d\])+$ seems more like what you want.

  • I couldn't decipher what you meant by the first part, so my regex is more general than required: [^,]+, matches any sequence of non-commas followed by a comma. In fact, you should probably add ] to this negated character class.
  • [S|D] is a character class of three characters, as | doesn't mean alternation here ((S|D) would mean the same as [SD] though).
  • {1} is the default for any atom; you don't need to specify it.
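
The three points above can be checked with a small Python sketch. LOOSE, NO_BRACKET, TIGHT and accepts are hypothetical names used only here, and re.fullmatch stands in for the ^...$ anchors:

```python
import re

# '|' is an ordinary character inside [...], so [S|D] accepts three characters.
print(re.fullmatch(r"[S|D]", "|") is not None)   # True
print(re.fullmatch(r"[SD]", "|") is not None)    # False

# {1} is redundant: both forms accept exactly one digit.
print(re.fullmatch(r"\d{1}", "7") is not None)   # True
print(re.fullmatch(r"\d", "7") is not None)      # True

LOOSE = re.compile(r"(?:\[[^],]+,[S|D],\d{1}\])+")    # keeps the stray '|'
NO_BRACKET = re.compile(r"(?:\[[^,]+,[SD],\d\])+")    # ']' not excluded
TIGHT = re.compile(r"(?:\[[^],]+,[SD],\d\])+")        # ']' excluded

def accepts(pattern, text):
    """True when the pattern matches the entire string."""
    return pattern.fullmatch(text) is not None

print(accepts(LOOSE, "[A-Z,|,3]"))            # True: '|' slips through
print(accepts(TIGHT, "[A-Z,|,3]"))            # False
print(accepts(NO_BRACKET, "[a]b[c,S,1]"))     # True: [^,]+ swallows "a]b[c"
print(accepts(TIGHT, "[a]b[c,S,1]"))          # False
print(accepts(TIGHT, "[A-Z,S,3][klm,D,4]"))   # True
```

The NO_BRACKET case shows why adding ] to the negated class matters: otherwise the first field can run across segment boundaries.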

Pseudocode (run it at codepad.org):

import re
def find_segments(input_string):
  results = []
  regex = re.compile(r"\[([^],]+),([SD]),(\d)\]")
  start = 0
  while True:
    m = regex.match(input_string, start)
    if not m: # no match
      return None # whole string didn't match, do another action as appropriate
    results.append(m.group(1, 2, 3))
    start = m.end(0)
    if start == len(input_string):
      break
  return results

print(find_segments("[A-Z,S,3][klm,D,4][0-9,S,1]"))
# output:
#[('A-Z', 'S', '3'), ('klm', 'D', '4'), ('0-9', 'S', '1')]

The big difference here is that the expression matches only one complete [...] segment, but it is applied in succession, so each match must start where the previous one ended, and the last match must end at the end of the string.
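
A more compact variant of the same idea, offered only as a sketch: validate the whole string first with re.fullmatch, then extract the fields with findall. The names SEGMENT, SEGMENT_RE, WHOLE_RE and parse_segments are invented for this example and are not part of the answer above.

```python
import re

SEGMENT = r"\[([^],]+),([SD]),(\d)\]"
SEGMENT_RE = re.compile(SEGMENT)
# One or more segments and nothing else; fullmatch replaces ^ and $.
WHOLE_RE = re.compile(r"(?:%s)+" % SEGMENT)

def parse_segments(text):
    """Return a list of (chars, kind, digit) tuples, or None if invalid."""
    if WHOLE_RE.fullmatch(text) is None:
        return None
    return SEGMENT_RE.findall(text)

print(parse_segments("[A-Z,S,3][klm,D,4][0-9,S,1]"))
# [('A-Z', 'S', '3'), ('klm', 'D', '4'), ('0-9', 'S', '1')]
print(parse_segments("[A-Z,S,3]aaaa[A-Za-z0-9åäöÅÄÖ,D,4]"))   # None
print(parse_segments("[A-Z,S,3][]"))                          # None
```

This scans the string twice, which is usually acceptable for short inputs; the loop with regex.match(input_string, start) does the same work in a single pass.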

ㄟ。诗瑗 2024-08-23 17:31:37

You want something like this:

/^(\[[^],]+,[SD],\d\])+$/

Here is an example of how you could use this regular expression in C#:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string[] tests = {
            "[A-Z,S,3][A-Za-z0-9,D,4]",
            "[A-Z,S,3]aaaa[A-Za-z0-9,D,4]",
            "crap[A-Z,S,3][A-Za-z0-9,D,4]",
            "[A-Z,S,3][]",
            "[A-Z,S,3][klm,D,4][0-9,S,1]"
        };

        string segmentRegex = @"\[([^],]+,[SD],\d)\]";
        string lineRegex = "^(" + segmentRegex + ")+$";

        foreach (string test in tests)
        {
            bool isMatch = Regex.Match(test, lineRegex).Success;
            if (isMatch)
            {
                Console.WriteLine("Successful match: " + test);
                foreach (Match match in Regex.Matches(test, segmentRegex))
                {
                    Console.WriteLine(match.Groups[1]);
                }
            }
        }
    }
}

Output:

Successful match: [A-Z,S,3][A-Za-z0-9,D,4]
A-Z,S,3
A-Za-z0-9,D,4
Successful match: [A-Z,S,3][klm,D,4][0-9,S,1]
A-Z,S,3
klm,D,4
0-9,S,1