Regular expression negative lookahead
I'm doing some regular expression gymnastics. I set myself the task of trying to search for C# code where there is a usage of the as-operator not followed by a null-check within a reasonable amount of space. Now I don't want to parse the C# code. E.g. I want to capture code snippets such as
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
however, not capture
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1 == null)
nor for that matter
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(somethingunrelated == null) {...}
if(x1.a == y1.a)
Thus any random null-check counts as a "good check", and such code should not be matched.
The question is: how do I match something while ensuring something else is not found in its surroundings?
I've tried the naive approach: looking for 'as', then doing a negative lookahead within 150 characters.
\bas\b.{1,150}(?!\b==\s*null\b)
Unfortunately, the above regular expression matches all of the above examples. My gut tells me the problem is that consuming up to 150 characters and then doing the negative lookahead leaves many positions where the lookahead does not find the '== null'.
If I try negating the whole expression, that doesn't help either, as that would match most C# code around.
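To make the symptom concrete, here is a small Python sketch (Python's `re` module is an assumption here; the question itself targets C# source, and the sample strings are copied from the examples above). It runs the naive pattern against a snippet with a null-check and one without, and both match, because the lookahead is only tested at whatever position '.{1,150}' happens to stop.

```python
import re

# The naive pattern from the question, unchanged.
NAIVE = re.compile(r"\bas\b.{1,150}(?!\b==\s*null\b)")

# Sample snippets taken from the question.
UNCHECKED = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1.a == y1.a)"
CHECKED = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1 == null)"

for name, snippet in (("unchecked", UNCHECKED), ("checked", CHECKED)):
    match = NAIVE.search(snippet)
    # Both snippets produce a match: the lookahead is evaluated only at the
    # position where '.{1,150}' ends, and '== null' does not start there.
    print(name, "matched" if match else "no match")
```

Since the negative lookahead only has to succeed at one of up to 150 candidate end positions, it almost never prevents a match.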
Comments (6)
I love regex gymnastics! Here is a commented PHP regex:
And here it is in Javascript style:
This one did make my head hurt a little...
Here is the test data I am using:
Put the '.{1,150}' inside the lookahead, and replace '.' with '[\s\S]' (in general, '.' doesn't match newlines). Also, the '\b' might be misleading near the '==': there is no word boundary between a space and '='.
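A minimal Python sketch of that suggestion follows. It is an illustration under assumptions, not the answerer's own code: the pattern name is made up, the window is made lazy, and the '\b' in front of '==' is dropped for the reason given above.

```python
import re

# The 150-character window now lives inside the negative lookahead, and
# [\s\S] is used instead of '.' so the window can span line breaks.
UNCHECKED_AS = re.compile(r"\bas\b(?![\s\S]{0,150}?==\s*null\b)")

UNCHECKED = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1.a == y1.a)"
CHECKED = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1 == null)"
UNRELATED_CHECK = (
    "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\n"
    "if(somethingunrelated == null) {...}\nif(x1.a == y1.a)"
)

print(bool(UNCHECKED_AS.search(UNCHECKED)))        # an 'as' with no null-check nearby
print(bool(UNCHECKED_AS.search(CHECKED)))          # a null-check follows every 'as'
print(bool(UNCHECKED_AS.search(UNRELATED_CHECK)))  # any null-check counts as a check
```

With the window inside the lookahead, the engine has no freedom to pick a convenient end position: the lookahead fails as soon as a '== null' exists anywhere in the next 150 characters.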
I think it would help to put the variable name into () so you can use it as a back reference. Something like the following,
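The snippet that followed this answer is not preserved here, so the following is only a hedged Python sketch of the back-reference idea; the pattern and its name are hypothetical. Note that it is stricter than what the question asks for: it accepts only a null-check on the same variable, whereas the question treats any null-check as good enough.

```python
import re

TIED_CHECK = re.compile(
    r"\b(\w+)\s*=\s*[^;]*?\bas\b[^;]*;"      # group 1: variable assigned from an 'as' cast
    r"(?![\s\S]{0,150}?\b\1\s*==\s*null\b)"  # no '<that variable> == null' within 150 chars
)

CHECKED_X1_ONLY = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1 == null)"

# Collect the variables whose 'as' result is never compared with null.
unchecked_names = [m.group(1) for m in TIED_CHECK.finditer(CHECKED_X1_ONLY)]
print(unchecked_names)
```

In this sample only 'x1' is compared with null, so 'y1' is the variable reported as unchecked.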
The question isn't clear. What exactly do you want? I'm sorry, but I still don't understand it after reading the question and comments several times.
Must the code be in C#? In Python? Something else? Nothing indicates this.
Do you want a match only if an 'if(... == ...)' line directly follows a block of 'var ... = ...' lines? Or may an unrelated line sit between the block and the 'if(... == ...)' line without stopping the match? My code assumes the second option.
Does an 'if(... == null)' line after an 'if(... == ...)' line stop the match or not? Since I couldn't tell, I defined two regexes to cover both options.
I hope my code is clear enough and addresses your concern.
It is in Python
Result
Let me try to redefine your problem:
If an 'as' assignment is followed by an 'if (... == null)' within 150 characters, don't match.
If an 'as' assignment is not followed by an 'if (... == null)' within 150 characters, match.
Your expression
\bas\b.{1,150}(?!\b==\s*null\b)
won't work because of the negative look-ahead. The regex engine can always stop one character earlier or later to dodge the negative look-ahead, so you end up matching even when there is an 'if (... == null)' there.
Regexes are really not good at not matching something. In this case, you're better off trying to match an "as" assignment with an "if == null" check within 150 characters, and then negating the check:
if (!regex.match(text)) ...
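The "match the positive case, then negate in code" idea can be sketched in Python as below. This is an assumption-laden illustration rather than the answerer's regex: the helper name 'unchecked_as_offsets' and its 'window' parameter are hypothetical, and the check is applied per 'as' occurrence instead of once for the whole text.

```python
import re

AS_OPERATOR = re.compile(r"\bas\b")
NULL_CHECK = re.compile(r"==\s*null\b")

def unchecked_as_offsets(text, window=150):
    """Return the offsets of every 'as' with no '== null' in the next `window` chars."""
    offsets = []
    for match in AS_OPERATOR.finditer(text):
        following = text[match.end():match.end() + window]
        # Positive match for the null-check, negated in ordinary code.
        if NULL_CHECK.search(following) is None:
            offsets.append(match.start())
    return offsets

UNCHECKED = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1.a == y1.a)"
CHECKED = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1 == null)"

print(unchecked_as_offsets(UNCHECKED))
print(unchecked_as_offsets(CHECKED))
```

Keeping the negation outside the regex makes each pattern trivial to read, at the cost of a few lines of surrounding code.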
I'm activating the SingleLine option with '?s:'. You can put it in the options of your Regex if you want. I'm putting '\s' around 'as' because I think only whitespace is "legal" around the 'as'; you can probably use '\b' there instead.
Be aware that '\s' will probably catch whitespace characters that aren't "valid spaces". It's defined as '[\f\n\r\t\v\x85\p{Z}]', where '\p{Z}' covers the Unicode characters in the 'Separator, Space', 'Separator, Line' and 'Separator, Paragraph' categories.
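The regex this answer refers to is not preserved here. As a rough Python counterpart (an assumption: the answer describes .NET, while this sketch relies on the scoped inline flag '(?s:...)' that Python's `re` accepts since version 3.6), single-line mode can be switched on for just the window inside the lookahead, with '\s' around 'as':

```python
import re

# (?s:...) makes '.' match newlines only inside that group, so no
# re.DOTALL flag is needed on the whole pattern.
SPACED_AS = re.compile(r"\sas\s(?!(?s:.{0,150}?)==\s*null\b)")

UNCHECKED = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1.a == y1.a)"
CHECKED = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1 == null)"

print(bool(SPACED_AS.search(UNCHECKED)))
print(bool(SPACED_AS.search(CHECKED)))
```

The character set that '\s' covers differs between regex engines, so the .NET definition quoted above should not be assumed to hold for Python.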