While trying to validate the SQL Server connection string format with a regex, I came up with the following pattern:
^(?!.*?(?<=^|\;)[a-zA-Z]+( [a-zA-Z]+)*(\=[^\;]+?\=[^\;]*)?(\;|$))+([a-zA-Z]+( [a-zA-Z]+)*\=[^\;]+\;?)+$
Sample string used was:
option=value;missingvalue;multiple assignment=123=456
(Hosted and tested on regex101.)
As expected, the string doesn't match. My concern is that this may not be a standard, recommended, or optimal regex, especially the negative lookahead part, since it keeps going through the whole string even after a successful match.
I'll try to break down how it works below:
Negative Lookahead
1. ^(?!.*?(?<=^|;)
Negative lookahead that starts either at the beginning of the string or, recursively, right after each semicolon.
2. [a-zA-Z]+( [a-zA-Z]+)*(=[^;]+?=[^;]*)?(;|$))+
Matches simple or composite option names: either just [a-zA-Z]+ (mandatory) or, additionally, ( [a-zA-Z]+)* any number of times. After that, an optional group tries to match when an option is assigned more than one value in a row. Finally, the segment must end with either ; or $ (end of string); in the first case, the lookahead pattern restarts from the beginning (recursion).
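To see what the negative lookahead is actually hunting for, its body can be run on its own with `re.finditer`, which lists every offending segment. This is a minimal Python sketch, not the original regex101 setup: Python's `re` only accepts fixed-width lookbehinds, so `(?<=^|;)` is rewritten as `(?:^|(?<=;))`, and `BAD_SEGMENT` and `bad_segments` are hypothetical names.

```python
import re

# Option name: one or more words of ASCII letters separated by single spaces.
NAME = r"[a-zA-Z]+(?: [a-zA-Z]+)*"

# Body of the question's negative lookahead. Assumption: Python's re rejects
# the variable-width lookbehind (?<=^|;), so it is rewritten as (?:^|(?<=;)).
# A segment is "bad" if its name is followed directly by ';' or end of string
# (missing value), or by two '=' signs (multiple assignment).
BAD_SEGMENT = re.compile(rf"(?:^|(?<=;)){NAME}(?:=[^;]+?=[^;]*)?(?:;|$)")


def bad_segments(conn: str) -> list[str]:
    """Return every segment the negative lookahead would object to."""
    return [m.group() for m in BAD_SEGMENT.finditer(conn)]


print(bad_segments("option=value;missingvalue;multiple assignment=123=456"))
```

Seen this way, the lookahead is essentially `not bad_segments(conn)`; inside the real pattern it is the lazy `.*?` that moves the check from one segment start to the next.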
Regular Pattern Matching
([a-zA-Z]+( [a-zA-Z]+)*=[^;]+;?)+$
Not much to add here, other than that this is the pattern that should actually match the string once the initial negative lookahead has thoroughly scanned and validated it.
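Running the regular pattern by itself shows why the lookahead is needed at all: `[^;]+` also consumes `=`, so double assignments and missing delimiters slip through, while a name with no value is already rejected. A small Python sketch; `MAIN` is just an illustrative name.

```python
import re

# Option name: words of ASCII letters separated by single spaces.
NAME = r"[a-zA-Z]+(?: [a-zA-Z]+)*"

# The question's "regular pattern" alone, without the negative lookahead.
MAIN = re.compile(rf"^(?:{NAME}=[^;]+;?)+$")

cases = [
    ("double assignment", "multiple assignment=123=456"),
    ("missing delimiter", "Database=somedb;Username=adminPassword=P@ssword!23;Port=1433"),
    ("missing value", "Database=somedb;Port;Username=admin;Password=P@ssword!23"),
]
for problem, text in cases:
    # [^;]+ also matches '=', so only the missing-value case is rejected here.
    print(problem, MAIN.match(text) is not None)
```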
It does more or less what I intended, but I can't shake the feeling that I'm misunderstanding something about how regexes work.
Is there a simpler way to do this that avoids recursively applying the lookahead pattern described above multiple times?
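One alternative, a different technique from the single-regex approach above, is to split on `;` and check each `name=value` pair separately, so no lookahead is needed. This minimal Python sketch mirrors only the rules stated in the question (letter-and-space names, exactly one `=`, a non-empty value, an optional trailing `;`); `is_valid_connection_string` is a hypothetical helper and it ignores any quoting rules real connection strings may allow.

```python
import re

# Same name rule as in the question: words of ASCII letters separated by single spaces.
NAME_RE = re.compile(r"[a-zA-Z]+(?: [a-zA-Z]+)*")


def is_valid_connection_string(conn: str) -> bool:
    """Hypothetical validator that checks the question's rules pair by pair."""
    segments = conn.split(";")
    if segments[-1] == "":  # tolerate a single trailing semicolon
        segments.pop()
    if not segments:
        return False
    for segment in segments:
        name, sep, value = segment.partition("=")
        # Reject a missing '=', an empty value, or a second '=' in the value.
        if not sep or not value or "=" in value:
            return False
        if NAME_RE.fullmatch(name) is None:
            return False
    return True
```

Each rule becomes a plain `if`, which is easier to read and extend than the lookahead, and the loop could report the failing segment if that is useful.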
EDIT: As requested, here are some more realistic examples, both valid and invalid:
- valid
Database=somedb;Username=admin;Password=P@ssword!23;Port=1433
- missing delimiter between Username and Password options
Database=somedb;Username=adminPassword=P@ssword!23;Port=1433
- missing value for Port option
Database=somedb;Port;Username=admin;Password=P@ssword!23
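A minimal Python sketch of the complete pattern from the question, run against these examples and the original sample. Two assumed adaptations for Python's `re`: `(?<=^|;)` becomes `(?:^|(?<=;))` because lookbehinds must be fixed-width, and the `+` after the lookahead is dropped because repeating a zero-width assertion at the same position does not change its result. `CONN_RE` and `EXAMPLES` are illustrative names.

```python
import re

# Option name: words of ASCII letters separated by single spaces.
NAME = r"[a-zA-Z]+(?: [a-zA-Z]+)*"

# Body of the negative lookahead: a segment with a missing value or a second '='.
# Assumption: (?<=^|;) rewritten as (?:^|(?<=;)) for Python's fixed-width lookbehind.
BAD_SEGMENT = rf"(?:^|(?<=;)){NAME}(?:=[^;]+?=[^;]*)?(?:;|$)"

# Negative lookahead (without the trailing '+') followed by the regular pattern.
CONN_RE = re.compile(rf"^(?!.*?{BAD_SEGMENT})(?:{NAME}=[^;]+;?)+$")

EXAMPLES = [
    ("Database=somedb;Username=admin;Password=P@ssword!23;Port=1433", True),
    ("Database=somedb;Username=adminPassword=P@ssword!23;Port=1433", False),  # missing delimiter
    ("Database=somedb;Port;Username=admin;Password=P@ssword!23", False),  # missing value
    ("option=value;missingvalue;multiple assignment=123=456", False),  # original sample
]

for text, expected in EXAMPLES:
    matched = CONN_RE.match(text) is not None
    print(f"matched={matched} expected={expected}: {text}")
```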
Answer
The following pattern accepts only letters in the names. For testing purposes, it accepts any character except equals signs and semicolons in the values. In practice, the allowed value characters would need to be defined more precisely, since characters such as line endings and tabs would need to be excluded.
We have a negative lookahead to forbid a second equals sign in the values and a negative lookbehind to forbid a semicolon before the end. Please note that your "correct" example is found to be invalid, because there is no semicolon at the end.
If we try to block it the other way round, it becomes impossible to match the regex.
I've added an optional single space in the name to match "Connection Timeout" and similar names.
I have also allowed spaces before the name.
Our pattern is made up of:
^
beginning of line
(
start group
\s*
optional whitespace before the name
[a-zA-Z]+ ?[a-zA-Z]+
name containing at least one letter before and after an optional space; this means at least two letters
=
an equals sign
(
start inner group
(?!\=)
negative lookahead for an equals sign
[^=;]
any character except equals and semicolon, at least once
;
a literal semicolon
){4,}
close the outer group and repeat it at least 4 times
$
end of line
Thank you, Casimir et Hippolyte, for the improvement. I was using lookaheads and lookbehinds following the question, but your syntax is much cleaner.
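The token-by-token breakdown above maps naturally onto Python's `re.VERBOSE` flag, where each piece can carry its own comment. The answer's final regex is not fully recoverable from the breakdown (the inner group's closing parenthesis is missing), so this sketch assumes the inner group with `(?!\=)` reduces to `[^=;]+`, since that character class already excludes `=`. In verbose mode a literal space must be written as `[ ]`; `VERBOSE_RE` is an illustrative name.

```python
import re

# Assumed reconstruction of the answer's pattern, one token per line.
VERBOSE_RE = re.compile(
    r"""
    ^                           # beginning of line
    (?:                         # start group: one name=value; pair
        \s*                     # optional whitespace before the name
        [a-zA-Z]+[ ]?[a-zA-Z]+  # name: letters, optional single space, letters
        =                       # an equals sign
        [^=;]+                  # value: no '=' or ';', at least one character
        ;                       # a literal semicolon (also required after the last pair)
    ){4,}                       # close the group and repeat it at least 4 times
    $                           # end of line
    """,
    re.VERBOSE,
)

example = "Database=somedb;Username=admin;Password=P@ssword!23;Port=1433"
print(VERBOSE_RE.match(example) is not None)
print(VERBOSE_RE.match(example + ";") is not None)
```

With only one optional space, a name can have at most two words, so a three-word option such as `Trust Server Certificate` is rejected, and strings with fewer than four pairs never match.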