Abort regex execution when a pattern is found in a negative lookahead

Posted on 2025-01-13 08:45:57 | Length: 1814 characters | Views: 3 | Comments: 0


    While struggling trying to validate SQL Server's connection string pattern using regex I've achieved the following result:

    ^(?!.*?(?<=^|\;)[a-zA-Z]+( [a-zA-Z]+)*(\=[^\;]+?\=[^\;]*)?(\;|$))+([a-zA-Z]+( [a-zA-Z]+)*\=[^\;]+\;?)+$
    

    Sample string used was:

    option=value;missingvalue;multiple assignment=123=456
    

    * (hosted and tested in regex101)

    And, as expected, the string didn't match. The issue is that I think this may not be a standard, recommended, or optimal regex implementation, especially in the negative lookahead part, considering it goes through the whole string even after a successful match.

    I'll try to break down how it works below:


    Negative Lookahead

    1. ^(?!.*?(?<=^|;)

    Negative lookahead pattern, starting either at the beginning of the string or, repeatedly, just after each semicolon character

    2. [a-zA-Z]+( [a-zA-Z]+)*(=[^;]+?=[^;]*)?(;|$))+

    Matching the simple or composite option names — that is, just [a-zA-Z]+ (mandatory) or, additionally, ( [a-zA-Z]+)* any number of times; afterwards there's an optional group that tries to match when there's more than one consecutive value assignment for any given option; finally it ends with either ; or $ (end of string) — in case of the first one, the lookahead pattern restarts from the beginning (recursion)

    Regular Pattern Matching

    ([a-zA-Z]+( [a-zA-Z]+)*=[^;]+;?)+$

    Not much new to say here other than that this is the pattern which should actually match the string after the initial Negative Lookahead thorough scan/validation.


    I can't deny that it's kinda working for what I intended, but I can't hold back the feeling that I'm misunderstanding something about regex's workings.

    Is there an easier way to do this while avoiding having to recursively look ahead using the pattern described above multiple times?

    EDIT: As requested, some closer to real life examples would be the following — for both valid and invalid formatting:

    • VALID
    Database=somedb;Username=admin;Password=P@ssword!23;Port=1433
    
    • INVALID
    1. missing delimiter between Username and Password options
    Database=somedb;Username=adminPassword=P@ssword!23;Port=1433
    
    2. missing value for Port option
    Database=somedb;Port;Username=admin;Password=P@ssword!23
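
For reference, the rules these examples imply can also be written as a plain split-and-check, with no lookaround at all. This is only a sketch under assumptions: the function name `is_valid_connection_string` is hypothetical, the name pattern is the one from the question (`[a-zA-Z]+( [a-zA-Z]+)*`), and a single trailing semicolon is assumed to be acceptable.

```python
import re

# Option name as in the question's pattern: letter-only words separated by single spaces.
NAME = re.compile(r"[a-zA-Z]+( [a-zA-Z]+)*")


def is_valid_connection_string(s: str) -> bool:
    """Check 'name=value' pairs separated by ';' (hypothetical helper)."""
    parts = s.split(";")
    if parts and parts[-1] == "":
        parts.pop()  # assumption: one trailing semicolon is tolerated
    if not parts:
        return False
    for part in parts:
        name, sep, value = part.partition("=")
        if not sep or not value or "=" in value:
            return False  # missing '=', empty value, or a second '='
        if NAME.fullmatch(name) is None:
            return False  # name is not made of letter-only words
    return True


for sample in (
    "Database=somedb;Username=admin;Password=P@ssword!23;Port=1433",
    "Database=somedb;Username=adminPassword=P@ssword!23;Port=1433",
    "Database=somedb;Port;Username=admin;Password=P@ssword!23",
):
    print(is_valid_connection_string(sample), sample)
```

The first sample is accepted and the two invalid ones are rejected, each for the reason listed in the examples.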
    


    Comments (1)

    伴随着你 2025-01-20 08:45:57

    The following pattern accepts only letters in the names. For testing purposes it accepts any character except the equals sign and the semicolon in the values; this would need tightening, since characters such as line endings and tabs should also be excluded.
    The negated class [^=;] forbids a second equals sign in a value, and every name=value pair, including the last one, must end with a semicolon. Please note that your "correct" example is therefore rejected, because there is no semicolon at the end.
    If we try to block it the other way round, it becomes impossible to match the regex.
    I've added an optional single space in the name to match "Connection Timeout" and similar.

    /^(\s*[a-zA-Z]+ ?[a-zA-Z]+=[^=;]+;)+$/gm
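
To see how this pattern treats the strings from the question, here is a small Python sketch. It assumes Python's `re` module as the engine (the question used regex101), with `re.MULTILINE` standing in for the `/m` flag; the helper name `accepts` is made up for illustration, and a trailing semicolon has been added to some samples so that only one rule is tested at a time.

```python
import re

# The pattern from the answer, unchanged; re.MULTILINE mirrors the /m flag.
PAIR_LIST = re.compile(r"^(\s*[a-zA-Z]+ ?[a-zA-Z]+=[^=;]+;)+$", re.MULTILINE)


def accepts(s: str) -> bool:
    """Return True if the pattern matches s (hypothetical helper)."""
    return PAIR_LIST.search(s) is not None


samples = [
    # valid example with a trailing semicolon added
    "Database=somedb;Username=admin;Password=P@ssword!23;Port=1433;",
    # valid example exactly as written in the question (no trailing semicolon)
    "Database=somedb;Username=admin;Password=P@ssword!23;Port=1433",
    # missing delimiter between Username and Password
    "Database=somedb;Username=adminPassword=P@ssword!23;Port=1433;",
    # missing value for Port
    "Database=somedb;Port;Username=admin;Password=P@ssword!23;",
]
for s in samples:
    print(accepts(s), s)
```

Only the first sample is accepted. The second one fails purely because of the missing final semicolon.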
    

    I have also allowed whitespace before the name.
    Our pattern is made up of
    ^ beginning of line
    ( start group
    \s* optional whitespace before the name
    [a-zA-Z]+ ?[a-zA-Z]+ a name containing at least one letter before and after an optional space; this means at least two letters
    = an equals sign
    [^=;]+ any character except equals and semicolon, at least once
    ; a literal semicolon
    )+ close the group and repeat it at least once
    $ end of line
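
The same breakdown can be kept next to the pattern itself by writing it in verbose mode. The sketch below assumes Python's `re.VERBOSE` and follows the `/^(\s*[a-zA-Z]+ ?[a-zA-Z]+=[^=;]+;)+$/gm` pattern shown earlier; the constant name `PAIR_LIST_VERBOSE` is illustrative only.

```python
import re

PAIR_LIST_VERBOSE = re.compile(
    r"""
    ^                 # beginning of line
    (                 # start group: one name=value; pair
      \s*             # optional whitespace before the name
      [a-zA-Z]+       # first letters of the name
      [ ]?            # optional single space (a class, because VERBOSE ignores bare spaces)
      [a-zA-Z]+       # remaining letters, so a name has at least two letters
      =               # an equals sign
      [^=;]+          # value: anything except equals and semicolon, at least once
      ;               # a literal semicolon
    )+                # close the group and repeat it at least once
    $                 # end of line
    """,
    re.VERBOSE | re.MULTILINE,
)

print(bool(PAIR_LIST_VERBOSE.search("Connection Timeout=30;")))  # two-word name
print(bool(PAIR_LIST_VERBOSE.search("a=1;")))                    # one-letter name
```

The one-letter name is rejected, which is the "at least two letters" consequence noted in the breakdown.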

    Thank you, Casimir et Hippolyte, for the improvement. I was using lookaheads and lookbehinds, following the question, but your syntax is much cleaner.
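
If a missing final semicolon should be tolerated, as in the question's valid example, one possible variant is to treat the semicolon as a separator between pairs rather than a terminator. This is a sketch and not part of the answer above: it assumes Python's `re`, borrows the multi-word name pattern from the question, and uses made-up names (`NAME`, `PAIR`, `CONN`, `is_valid`).

```python
import re

NAME = r"[a-zA-Z]+(?: [a-zA-Z]+)*"   # name pattern taken from the question
PAIR = rf"{NAME}=[^=;]+"             # one name=value pair, no second '=' in the value
# Pairs separated by ';', with one optional ';' at the very end.
CONN = re.compile(rf"{PAIR}(?:;{PAIR})*;?")


def is_valid(s: str) -> bool:
    """Return True if the whole string is a list of name=value pairs."""
    return CONN.fullmatch(s) is not None


print(is_valid("Database=somedb;Username=admin;Password=P@ssword!23;Port=1433"))
print(is_valid("Database=somedb;Username=adminPassword=P@ssword!23;Port=1433"))
print(is_valid("Database=somedb;Port;Username=admin;Password=P@ssword!23"))
```

Because `[^=;]+` cannot cross an equals sign or a semicolon, both invalid examples fail without any lookahead, while the valid example passes with or without a trailing semicolon.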
