Regex for matching a custom syntax

Posted 2024-08-03 12:07:41

I am trying to write a regular expression to match and split a custom variable syntax in C#. The idea here is a custom formatting of string values very similar to the .NET String.Format/{0} style of string formatting.

For example, the user would define a format string to be evaluated at runtime like so:

D:\Path\{LanguageId}\{PersonId}\ 

The name 'LanguageId' matches a field on a data object, and that field's current value replaces the placeholder.

Things get tricky when there is a need to pass arguments to the formatting field. For example:

{LanguageId:English|Spanish|French}

This would have the meaning of executing some conditional logic if the value of 'LanguageId' was equal to one of the arguments.

Lastly I would need to support map arguments like this:

{LanguageId:English=>D:\path\english.xml|Spanish=>D:\path\spanish.xml}

Here is an enumeration of all possible values:

Command, no argument (does something special):

{@Date}

Command, single argument:

{@Date:yyyy-mm-dd}

No argument:

{LanguageId}

Single-argument list:

{LanguageId:English}

Multi-argument list:

{LanguageId:English|Spanish}

Single-argument map:

{LanguageId:English=>D:\path\english.xml}

Multi-argument map:

{LanguageId:English=>D:\path\english.xml|Spanish=>D:\path\spanish.xml}

Summary: The syntax can be boiled down to a Key with optional parameter type list or map (not both).
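The sketch below is one possible C# model of a parsed field that follows that summary. The class and member names are hypothetical. The evaluation rules are also assumptions, because the question only says a list triggers "some conditional logic". Here a list is read as "is the current value one of these items?" (case-insensitive), and a map is read as "look up the current value and return its target, or null".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of one parsed field: a key, optionally a command,
// with EITHER a list of arguments OR a map of arguments (never both).
public sealed class FormatField
{
    public string Key { get; }
    public bool IsCommand { get; }
    public IReadOnlyList<string> ListArgs { get; }              // null when the field has no list
    public IReadOnlyDictionary<string, string> MapArgs { get; } // null when the field has no map

    public FormatField(string key, bool isCommand,
                       IReadOnlyList<string> listArgs = null,
                       IReadOnlyDictionary<string, string> mapArgs = null)
    {
        if (listArgs != null && mapArgs != null)
            throw new ArgumentException("A field takes a list or a map, not both.");
        Key = key;
        IsCommand = isCommand;
        ListArgs = listArgs;
        MapArgs = mapArgs;
    }

    // Assumed list semantics: true when the current value is one of the items
    // (case-insensitive). A field without a list imposes no condition.
    public bool Matches(string currentValue) =>
        ListArgs == null || ListArgs.Contains(currentValue, StringComparer.OrdinalIgnoreCase);

    // Assumed map semantics: the target for the current value, or null if unmapped.
    public string Resolve(string currentValue) =>
        MapArgs != null && MapArgs.TryGetValue(currentValue, out var target) ? target : null;
}
```

The constructor enforces the "list or map, not both" rule, so a parser never has to check for that combination later.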

Below is the regex I have so far. It has a few problems: it doesn't handle all whitespace correctly, and in .NET I don't get the splits I expect. For instance, in the first example I get a single match of '{LanguageId}{PersonId}' instead of two distinct matches. I'm also sure it doesn't handle filesystem paths or delimited, quoted strings. Any help getting me over the hump, or any recommendations, would be appreciated.

    // Must be used with RegexOptions.IgnorePatternWhitespace; otherwise the line
    // breaks, indentation and "#" comments become part of the pattern.
    private const string RegexMatch = @"
        \{                              # opening curly brace
        [\s]*                           # whitespace before command
        @?                              # command indicator
        (.[^\}\|])+                     # string characters representing command or metadata
        (                               # begin grouping of params
        :                               # required param separator
        (                               # begin alternation: list or map param type

        (                               # begin group of list param type
        .+[^\}\|]                       # string of characters for the list item
        (\|.+[^\}\|])*                  # optional multiple list items with separator
        )                               # end group of list param type

        |                               # or select map param type

        (                               # begin group of map param type
        .+[^\}\|]=>.+[^\}\|]            # string of characters for map key=>value pair
        (\|.+[^\}\|]=>.+[^\}\|])*       # optional multiple param map items
        )                               # end group of map param type

        )                               # end alternation
        )                               # end grouping of params
        ?                               # allow at most 1 param group
        \s*                             # whitespace before closing brace
        \}                              # closing curly brace
        ";

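This is one likely reason the matches are unreliable, offered as a partial diagnosis. `(.[^\}\|])+` consumes characters two at a time, and its unrestricted `.` half can itself be a `}`. Whether a field's closing brace is found therefore depends on whether its key has an odd or even number of characters. The C# sketch below isolates just the key portion of the pattern. The parameter group is deliberately left out, so the sketch says nothing about that part. It then compares the key portion with a pattern whose body excludes both braces. The class name and the `PersonIds` input are hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Isolates the key portion of the question's pattern (parameter group left out).
public static class PairwiseKeyDemo
{
    // (.[^\}\|])+ consumes TWO characters per repetition, and the "." half
    // may itself be a '}', so a match can run past a closing brace.
    public const string KeyOnly = @"\{\s*@?(.[^\}\|])+\s*\}";

    // Excluding both braces from the body means a match cannot cross a '}'.
    public const string BraceSafe = @"\{\s*([^{}]+?)\s*\}";

    public static string[] MatchValues(string pattern, string input) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();
}
```

In this isolated form, the question's own `D:\Path\{LanguageId}\{PersonId}\` happens to split into two matches. `D:\Path\{LanguageId}\{PersonIds}\`, however, comes back from `KeyOnly` as the single match `{LanguageId}\{PersonIds}`, and `{Langs}` is not matched at all. `BraceSafe` returns each field separately in every one of these cases.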

Answers (3)

漫雪独思 2024-08-10 12:07:41

You're trying to do too much with one regex. I suggest you break the task down into steps, the first being a simple match on something that looks like a variable. That regex could be as simple as:

    \{\s*([^{}]+?)\s*\}

That saves your whole variable/command string in group #1, minus the braces and surrounding whitespace. After that you can split on colons, then pipes, then "=>" sequences as appropriate. Don't compress all the complexity into one monster regex; if you ever manage to get the regex written, you'll find it impossible to maintain when your requirements change later on.
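Below is a minimal C# sketch of that step-by-step approach, assuming .NET's System.Text.RegularExpressions. It uses the regex above to find each field, then splits the body on the first colon, then on pipes, then on "=>". `ParsedField` and `TwoStepParser` are illustrative names. Splitting on the first colon only is a design choice of this sketch, so that drive letters such as `D:\` inside map values are left alone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical result type: a key plus either list or map arguments.
public sealed class ParsedField
{
    public bool IsCommand { get; set; }
    public string Key { get; set; } = "";
    public List<string> ListArgs { get; } = new List<string>();
    public Dictionary<string, string> MapArgs { get; } = new Dictionary<string, string>();
}

public static class TwoStepParser
{
    // Step 1: the simple "looks like a variable" regex; group 1 is the trimmed body.
    static readonly Regex Field = new Regex(@"\{\s*([^{}]+?)\s*\}");

    public static List<ParsedField> Parse(string format)
    {
        var fields = new List<ParsedField>();
        foreach (Match m in Field.Matches(format))
        {
            string body = m.Groups[1].Value;

            // Step 2: split on the FIRST colon only, so "D:\..." in map values survives.
            int colon = body.IndexOf(':');
            string key = (colon < 0 ? body : body.Substring(0, colon)).Trim();
            var field = new ParsedField
            {
                IsCommand = key.StartsWith("@", StringComparison.Ordinal),
                Key = key.TrimStart('@')
            };

            if (colon >= 0)
            {
                // Step 3: split the arguments on '|', then decide list vs. map by "=>".
                string[] items = body.Substring(colon + 1).Split('|');
                bool isMap = items.Any(s => s.Contains("=>"));
                foreach (string item in items)
                {
                    int arrow = item.IndexOf("=>", StringComparison.Ordinal);
                    if (!isMap)
                        field.ListArgs.Add(item.Trim());
                    else if (arrow < 0)
                        throw new FormatException($"'{m.Value}' mixes list and map arguments.");
                    else
                        field.MapArgs[item.Substring(0, arrow).Trim()] = item.Substring(arrow + 2).Trim();
                }
            }
            fields.Add(field);
        }
        return fields;
    }
}
```

Each step is ordinary string code, so a new case can be added without touching the regex. This sketch does not support a `|` or `}` inside an argument.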

And another thing: right now, you're focused on getting the code to work when the input is correct, but what about when the users get it wrong? Wouldn't you like to give them helpful feedback? Regexes suck at that; they're strictly pass/fail. Regexes can be amazingly useful, but like any other tool, you have to learn their limitations before you can harness their full power.
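The following C# sketch illustrates reporting problems instead of a bare pass/fail. It reuses the same simple regex and collects one message per problem, including the position of the offending field. The checks it performs and the `FormatChecker` name are assumptions, not rules taken from the question. The checks are: a missing key, an empty argument, and list items mixed with map items in one field.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sketch: collect readable problems for each field instead of a single pass/fail.
public static class FormatChecker
{
    static readonly Regex Field = new Regex(@"\{\s*([^{}]+?)\s*\}");

    public static List<string> Check(string format)
    {
        var problems = new List<string>();
        foreach (Match m in Field.Matches(format))
        {
            string location = $"'{m.Value}' at position {m.Index}";
            string body = m.Groups[1].Value;
            int colon = body.IndexOf(':');
            string key = (colon < 0 ? body : body.Substring(0, colon)).Trim().TrimStart('@');
            if (key.Length == 0)
                problems.Add($"{location} has no key.");
            if (colon < 0)
                continue;

            // Count non-empty items and how many of them look like map entries.
            int nonEmpty = 0, maps = 0;
            foreach (string item in body.Substring(colon + 1).Split('|'))
            {
                if (item.Trim().Length == 0) { problems.Add($"{location} has an empty argument."); continue; }
                nonEmpty++;
                if (item.Contains("=>")) maps++;
            }
            if (maps > 0 && maps < nonEmpty)
                problems.Add($"{location} mixes list and map arguments.");
        }
        return problems;
    }
}
```

The checker cannot report an unclosed `{`, because the regex never matches one. Catching that kind of error needs a scan of the raw string.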

流年里的时光 2024-08-10 12:07:41

You may want to look into implementing this as a finite-state machine instead of a regex, mainly for speed purposes. http://en.wikipedia.org/wiki/Finite-state_machine

Edit: Actually, to be precise, you want to look at Deterministic Finite State machines: http://en.wikipedia.org/wiki/Deterministic_finite-state_machine
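A deterministic state machine for this syntax can be very small. The C# sketch below has two states, Literal and InField, and reads the input one character at a time. It splits a format string into literal tokens and field tokens. The names are hypothetical. Treating a `{` inside a field, or a stray `}`, as an error is an assumption of the sketch. Because the machine tracks position as it goes, it can also say where an unclosed field started, which a single regex match does not tell you.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Sketch of a two-state deterministic scanner (Literal / InField) for the format strings.
public static class FormatScanner
{
    enum State { Literal, InField }

    // Returns tokens in order; field tokens are returned without braces, trimmed.
    public static List<(bool IsField, string Text)> Scan(string input)
    {
        var tokens = new List<(bool IsField, string Text)>();
        var buffer = new StringBuilder();
        var state = State.Literal;
        int fieldStart = -1;

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            switch (state)
            {
                case State.Literal when c == '{':
                    if (buffer.Length > 0) tokens.Add((false, buffer.ToString()));
                    buffer.Clear();
                    fieldStart = i;
                    state = State.InField;
                    break;
                case State.Literal when c == '}':
                    throw new FormatException($"Unexpected '}}' at position {i}.");
                case State.InField when c == '{':
                    throw new FormatException($"Nested '{{' at position {i}; field opened at {fieldStart}.");
                case State.InField when c == '}':
                    tokens.Add((true, buffer.ToString().Trim()));
                    buffer.Clear();
                    state = State.Literal;
                    break;
                default:
                    buffer.Append(c); // any other character stays in the current state
                    break;
            }
        }
        if (state == State.InField)
            throw new FormatException($"Field opened at position {fieldStart} is never closed.");
        if (buffer.Length > 0) tokens.Add((false, buffer.ToString()));
        return tokens;
    }
}
```

Each field token can then be split on ':', '|' and '=>' as described in the first answer.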

哭了丶谁疼 2024-08-10 12:07:41

This really calls for a proper parser.

As an example, here is how I would parse it with Regexp::Grammars.

Please excuse the length.

    #! /opt/perl/bin/perl
    use strict;
    use warnings;
    use 5.10.1;

    use Regexp::Grammars;

    my $grammar = qr{
      ^<Path>$

      <objtoken: My::Path>
        <drive=([a-zA-Z])>:\\ <[elements=PathElement]> ** (\\) \\?

      <rule: PathElement>
        (?:
          <MATCH=BlockPathElement>
        |
          <MATCH=SimplePathElement>
        )

      <token: SimplePathElement>
        (?<= \\ ) <MATCH=([^\\]+)>

      <rule: My::BlockPathElement>
        (?<=\\){ \s*
        (?|
          <MATCH=Command>
        |
          <MATCH=Variable>
        )
        \s* }

      <objrule: My::Variable>
        <name=(\w++)> <options=VariableOptionList>?

      <rule: VariableOptionList>
          :
          <[MATCH=VariableOptionItem]> ** ([|])

      <token: VariableOptionItem>
        (?:
          <MATCH=VariableOptionMap>
        |
          <MATCH=( [^{}|]+? )>
        )

      <objrule: My::VariableOptionMap>
        \s*
        <name=(\w++)> => <value=([^{}|]+?)>
        \s*

      <objrule: My::Command>
        @ <name=(\w++)>
        (?:
          : <[arg=CommandArg]> ** ([|])
        )?

      <token: CommandArg>
        <MATCH=([^{}|]+?)> \s*

    }x;

Testing with:

    use YAML;
    while( my $line = <> ){
      chomp $line;
      local %/;

      if( $line =~ $grammar ){
        say Dump \%/;
      }else{
        die "Error: $line\n";
      }
    }

With sample data:

    D:\Path\{LanguageId}\{PersonId}
    E:\{ LanguageId : English | Spanish | French }
    F:\Some Thing\{ LanguageId : English => D:\path\english.xml | Spanish => D:\path\spanish.xml }
    C:\{@command}
    c:\{@command :arg}
    c:\{ @command : arg1 | arg2 }

Results in:

    ---
    '': 'D:\Path\{LanguageId}\{PersonId}'
    Path: !!perl/hash:My::Path
      '': 'D:\Path\{LanguageId}\{PersonId}'
      drive: D
      elements:
        - Path
        - !!perl/hash:My::Variable
          '': LanguageId
          name: LanguageId
        - !!perl/hash:My::Variable
          '': PersonId
          name: PersonId

    ---
    '': 'E:\{ LanguageId : English | Spanish | French }'
    Path: !!perl/hash:My::Path
      '': 'E:\{ LanguageId : English | Spanish | French }'
      drive: E
      elements:
        - !!perl/hash:My::Variable
          '': 'LanguageId : English | Spanish | French'
          name: LanguageId
          options:
            - English
            - Spanish
            - French

    ---
    '': 'F:\Some Thing\{ LanguageId : English => D:\path\english.xml | Spanish => D:\path\spanish.xml }'
    Path: !!perl/hash:My::Path
      '': 'F:\Some Thing\{ LanguageId : English => D:\path\english.xml | Spanish => D:\path\spanish.xml }'
      drive: F
      elements:
        - Some Thing
        - !!perl/hash:My::Variable
          '': 'LanguageId : English => D:\path\english.xml | Spanish => D:\path\spanish.xml '
          name: LanguageId
          options:
            - !!perl/hash:My::VariableOptionMap
              '': 'English => D:\path\english.xml '
              name: English
              value: D:\path\english.xml
            - !!perl/hash:My::VariableOptionMap
              '': 'Spanish => D:\path\spanish.xml '
              name: Spanish
              value: D:\path\spanish.xml

    ---
    '': 'C:\{@command}'
    Path: !!perl/hash:My::Path
      '': 'C:\{@command}'
      drive: C
      elements:
        - !!perl/hash:My::Command
          '': '@command'
          name: command

    ---
    '': 'c:\{@command :arg}'
    Path: !!perl/hash:My::Path
      '': 'c:\{@command :arg}'
      drive: c
      elements:
        - !!perl/hash:My::Command
          '': '@command :arg'
          arg:
            - arg
          name: command

    ---
    '': 'c:\{ @command : arg1 | arg2 }'
    Path: !!perl/hash:My::Path
      '': 'c:\{ @command : arg1 | arg2 }'
      drive: c
      elements:
        - !!perl/hash:My::Command
          '': '@command : arg1 | arg2 '
          arg:
            - arg1
            - arg2
          name: command

Sample program:

    # Assumes $grammar from the grammar above is in scope (same file).
    my %ARGS = qw'
      LanguageId  English
      PersonId    someone
    ';

    while( my $line = <> ){
      chomp $line;
      local %/;

      if( $line =~ $grammar ){
        say $/{Path}->fill( %ARGS );
      }else{
        say 'Error: ', $line;
      }
    }

    {
      package My::Path;

      sub fill{
        my($self,%args) = @_;

        my $out = $self->{drive}.':';

        for my $element ( @{ $self->{elements} } ){
          if( ref $element ){
            $out .= '\\' . $element->fill(%args);
          }else{
            $out .= "\\$element";
          }
        }

        return $out;
      }
    }
    {
      package My::Variable;

      sub fill{
        my($self,%args) = @_;

        my $name = $self->{name};

        if( exists $args{$name} ){
          return $self->_fill( $args{$name} );
        }else{
          # Fall back to a case-insensitive match on the argument name.
          my $lc_name = lc $name;

          my @possible = grep {
            lc $_ eq $lc_name
          } keys %args;

          die qq'Cannot find argument for variable "$name"\n' unless @possible;
          if( @possible > 1 ){
            my $die = qq'Cannot determine which argument matches "$name" closer:\n';
            for my $possible( @possible ){
              $die .= qq'  "$possible"\n';
            }
            die $die;
          }

          # Exactly one candidate is left, so it is element 0.
          return $self->_fill( $args{$possible[0]} );
        }
      }
      sub _fill{
        my($self,$opt) = @_;

        # This is just an example.
        unless( exists $self->{options} ){
          return $opt;
        }

        for my $element ( @{$self->{options}} ){
          if( ref $element ){
            return '['.$element->value.']' if lc $element->name eq lc $opt;
          }elsif( lc $element eq lc $opt ){
            return $opt;
          }
        }

        my $name = $self->{name};
        my $die = qq'Invalid argument "$opt" for "$name" :\n';
        for my $valid ( @{$self->{options}} ){
          # Map options are objects; show their name rather than a HASH(0x...) string.
          my $shown = ref $valid ? $valid->name : $valid;
          $die .= qq'  "$shown"\n';
        }
        die $die;
      }
    }
    {
      package My::Command;

      sub fill{
        my($self,%args) = @_;

        return '['.$self->{''}.']';
      }
    }
    {
      package My::VariableOptionMap;

      sub name{
        my($self) = @_;
        return $self->{name};
      }

      sub value{
        my($self) = @_;
        return $self->{value};
      }
    }

Output using the example data:

    D:\Path\English\someone
    E:\English
    F:\Some Thing\[D:\path\english.xml]
    C:\[@command]
    c:\[@command :arg]
    c:\[@command : arg1 | arg2 ]