Regular expression negative lookahead

Posted on 2024-10-16 08:34:21

I'm doing some regular expression gymnastics. I set myself the task of searching C# code for uses of the as operator that are not followed by a null check within a reasonable distance. I don't want to actually parse the C# code. For example, I want to capture code snippets such as

    var x1 = x as SimpleRes;
    var y1 = y as SimpleRes;
    if(x1.a == y1.a)

however, not capture

    var x1 = x as SimpleRes;
    var y1 = y as SimpleRes;
    if(x1 == null)

nor for that matter

    var x1 = x as SimpleRes;
    var y1 = y as SimpleRes;
    if(somethingunrelated == null) {...}
    if(x1.a == y1.a)

Thus any null check, even an unrelated one, counts as a "good check", and such code should not be found.

The question is: how do I match something while ensuring that something else is not found in its surroundings?

I've tried the naive approach: look for 'as', then do a negative lookahead within 150 characters.

    \bas\b.{1,150}(?!\b==\s*null\b)

Unfortunately, the above regular expression matches all of the examples above. My gut tells me the problem is that matching ahead and then doing the negative lookahead can find many positions where the lookahead does not find the '== null'.
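A minimal Python sketch makes the failure visible. It assumes Python's re module as a stand-in for the .NET engine, and the lookahead behaves the same way here. The negative lookahead is only tested at the single position where .{1,150} stops. Since . does not cross a newline, the engine simply stops at the end of the first line, which is never followed by == null. The snippet variable names are made up for illustration.

```python
import re

# The question's three snippets (hypothetical names, indentation dropped).
SHOULD_MATCH = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1.a == y1.a)"
CHECKED = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1 == null)"
UNRELATED_CHECK = ("var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\n"
                   "if(somethingunrelated == null) {...}\nif(x1.a == y1.a)")

# The naive pattern: the negative lookahead only inspects the one position
# where .{1,150} ends, so the engine picks an end that is not followed by "== null".
NAIVE = re.compile(r"\bas\b.{1,150}(?!\b==\s*null\b)")

for name, snippet in [("should match", SHOULD_MATCH),
                      ("checked", CHECKED),
                      ("unrelated check", UNRELATED_CHECK)]:
    m = NAIVE.search(snippet)
    # Every snippet matches; the match stops at the end of the first line.
    print(f"{name}: {m.group()!r}")
```

All three lines print 'as SimpleRes;', so the lookahead never gets to see the null check on the third line.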

Negating the whole expression doesn't help either, as that would match most of the C# code around it.

Comments (6)

月下凄凉 2024-10-23 08:34:21

I love regex gymnastics! Here is a commented PHP regex:

$re = '/# Find all AS, (but not preceding a XX == null).
    \bas\b               # Match "as"
    (?=                  # But only if...
      (?:                # there exist from 1-150
        [\S\s]           # chars, each of which
        (?!==\s*null)    # are NOT preceding "== null"
      ){1,150}?          # (and do this lazily)
      (?:                # We are done when either
        (?=              # we have reached
          ==\s*(?!null)  # a non NULL conditional
        )                #
      | $                # or the end of string.
      )
    )/ix';

And here it is in JavaScript style:

re = /\bas\b(?=(?:[\S\s](?!==\s*null)){1,150}?(?:(?===\s*(?!null))|$))/ig;

This one did make my head hurt a little...

Here is the test data I am using:

text = r"""    var x1 = x as SimpleRes;
    var y1 = y as SimpleRes;
    if(x1.a == y1.a)

however, not capture
    var x1 = x as SimpleRes;
    var y1 = y as SimpleRes;
    if(x1 == null)

nor for that matter
    var x1 = x as SimpleRes;
    var y1 = y as SimpleRes;
    if(somethingunrelated == null) {...}
    if(x1.a == y1.a)"""
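Here is a sketch that transcribes the JavaScript version into Python's re and runs it on the question's three snippets. The transcription is an assumption, since the answer itself only gives PHP and JavaScript forms, and the snippet names are hypothetical. Only the first snippet should report a match.

```python
import re

# The answer's JavaScript regex transcribed to Python (the "i" flag becomes re.IGNORECASE).
AS_WITHOUT_NULL_CHECK = re.compile(
    r"\bas\b(?=(?:[\S\s](?!==\s*null)){1,150}?(?:(?===\s*(?!null))|$))",
    re.IGNORECASE,
)

# The question's three snippets (hypothetical names, indentation dropped).
SHOULD_MATCH = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1.a == y1.a)"
CHECKED = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1 == null)"
UNRELATED_CHECK = ("var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\n"
                   "if(somethingunrelated == null) {...}\nif(x1.a == y1.a)")

for name, snippet in [("should match", SHOULD_MATCH),
                      ("checked", CHECKED),
                      ("unrelated check", UNRELATED_CHECK)]:
    # True only when some "as" reaches a non-null "==" (or the end of the
    # text) within 150 characters without passing an "== null".
    print(name, AS_WITHOUT_NULL_CHECK.search(snippet) is not None)
```

This prints True for the first snippet and False for the other two.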
自控 2024-10-23 08:34:21

Put the .{1,150} inside the lookahead, and replace . with [\s\S] (by default, . doesn't match newlines). Also, the \b in front of the == is misleading: a space before == is not a word boundary, so \b== never matches "x1 == null".

\bas\b(?![\s\S]{1,150}==\s*null\b)
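Here is a quick Python check of this pattern. Python's re stands in for the .NET engine, and the snippet names are hypothetical. The 1-150 character window now sits inside the negative lookahead, so every end position within the window is tried. Any == null in range therefore rejects the match, while a null check further away than 150 characters does not.

```python
import re

# Answer's pattern: the 1-150 character window lives inside the negative
# lookahead, so every end position in the window is checked for "== null".
FIXED = re.compile(r"\bas\b(?![\s\S]{1,150}==\s*null\b)")

# The question's three snippets (hypothetical names, indentation dropped).
SHOULD_MATCH = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1.a == y1.a)"
CHECKED = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1 == null)"
UNRELATED_CHECK = ("var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\n"
                   "if(somethingunrelated == null) {...}\nif(x1.a == y1.a)")
# Hypothetical extra case: the null check is more than 150 characters away.
FAR_AWAY_CHECK = "var x1 = x as SimpleRes;\n" + "// filler\n" * 20 + "if(x1 == null)"

for name, snippet in [("should match", SHOULD_MATCH),
                      ("checked", CHECKED),
                      ("unrelated check", UNRELATED_CHECK),
                      ("far away check", FAR_AWAY_CHECK)]:
    print(name, FIXED.search(snippet) is not None)
```

This prints True, False, False, True. The last case shows that the 150-character limit is a hard cutoff.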
蹲墙角沉默 2024-10-23 08:34:21

I think it would help to put the variable name into a capturing group () so you can use it as a backreference. Something like the following:

\b(\w+)\b\W*=\W*\w*\W*\bas\b[\s\S]{1,150}(?!\b\1\b\W*==\W*\bnull\b)
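As written, this pattern keeps [\s\S]{1,150} outside the lookahead, so it runs into the same problem as the question's regex. Below is a hypothetical Python variant that moves the window inside the negative lookahead, as suggested in the previous answer. This changes the semantics: a null check only counts if it tests the same variable, so the question's second snippet now matches because y1 is never checked. The variant also assumes that the right-hand side of the assignment is a single identifier (\w+).

```python
import re

# Hypothetical per-variable variant: capture the assigned variable and reject
# the match if that same variable is compared with null within 150 characters.
PER_VARIABLE = re.compile(
    r"\b(\w+)\s*=\s*\w+\s+as\b(?![\s\S]{1,150}\b\1\s*==\s*null\b)"
)

# Hypothetical test snippets.
BOTH_CHECKED = ("var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\n"
                "if(x1 == null || y1 == null) return;")
ONLY_X1_CHECKED = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1 == null)"

for snippet in (BOTH_CHECKED, ONLY_X1_CHECKED):
    m = PER_VARIABLE.search(snippet)
    # Print the first variable that lacks a null check, or None.
    print(m.group(1) if m else None)
```

This prints None for the first snippet and y1 for the second.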
-黛色若梦 2024-10-23 08:34:21

The question isn't clear. What do you want EXACTLY? I'm sorry, but I still don't understand, even after reading the question and comments numerous times.

Must the code be in C#? In Python? Something else? There is no indication on this point.

Do you want a match only if an if(... == ...) line directly follows a block of var ... = ... lines?

Or may an unrelated line appear BETWEEN the block and the if(... == ...) line without stopping the match?

My code assumes the second option.

Does an if(... == null) line AFTER an if(... == ...) line stop the match or not?

Since I couldn't tell whether the answer is yes or no, I defined two regexes, one for each option.

I hope my code is clear enough and addresses your concern.

It is in Python:

import re

ch1 ='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
1618987987849891
'''

ch2 ='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
uydtdrdutdutrr
if(x1.a == y1.a)
3213546878'''

ch3='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1 == null)
165478964654456454'''

ch4='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
hgyrtdduihudgug
if(x1 == null)
165489746+54646544'''

ch5='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(somethingunrelated == null ) {...}
if(x1.a == y1.a)
1354687897'''

ch6='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
ifughobviudyhogiuvyhoiuhoiv
if(somethingunrelated == null ) {...}
if(x1.a == y1.a)
2468748874897498749874897'''

ch7 = '''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
iufxresguygo
liygcygfuihoiuguyg
if(somethingunrelated == null ) {...}
oufxsyrtuy
'''

ch8 = '''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
tfsezfuytfyfy
if(x1.a == y1.a)
iufxresguygo
liygcygfuihoiuguyg
if(somethingunrelated == null ) {...}
oufxsyrtuy
'''

ch9 = '''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
tfsezfuytfyfy
if(x1.a == y1.a)
if(somethingunrelated == null ) {...}
oufxsyrtuy
'''

# pat1: a block of "var ... as ..." lines, then (possibly after other lines,
# but never past an "== null") an if(... == ...) line that is not a null check.
pat1 = re.compile((r'('
                   r'(^var +\S+ *= *\S+ +as .+[\r\n]+)+?'
                   r'([\s\S](?!==\s*null\b))*?'
                   r'^if *\( *[^\s=]+ *==(?!\s*null).+$'
                   r')'
                   ),
                  re.MULTILINE)

# pat2: same as pat1, but additionally no "==" may follow within 150 characters.
pat2 = re.compile((r'('
                   r'(^var +\S+ *= *\S+ +as .+[\r\n]+)+?'
                   r'([\s\S](?!==\s*null\b))*?'
                   r'^if *\( *[^\s=]+ *==(?!\s*null).+$'
                   r')'
                   r'(?![\s\S]{0,150}==)'
                   ),
                  re.MULTILINE)


for ch in (ch1, ch2, ch3, ch4, ch5, ch6, ch7, ch8, ch9):
    m1 = pat1.search(ch)
    print(m1.group() if m1 else None)
    print()
    m2 = pat2.search(ch)
    print(m2.group() if m2 else None)
    print('-----------------------------------------')

Result

>>> 
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)

var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
-----------------------------------------
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
uydtdrdutdutrr
if(x1.a == y1.a)

var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
uydtdrdutdutrr
if(x1.a == y1.a)
-----------------------------------------
None

None
-----------------------------------------
None

None
-----------------------------------------
None

None
-----------------------------------------
None

None
-----------------------------------------
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)

None
-----------------------------------------
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
tfsezfuytfyfy
if(x1.a == y1.a)

None
-----------------------------------------
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
tfsezfuytfyfy
if(x1.a == y1.a)

None
-----------------------------------------
>>> 

甩你一脸翔 2024-10-23 08:34:21

Let me try to redefine your problem:

  1. Look for an "as" assignment -- you probably need a better regex to look for actual assignments and may want to store the expression assigned, but let's use "\bas\b" for now
  2. If you see an if (... == null) within 150 characters, don't match
  3. If you don't see an if (... == null) within 150 characters, match

Your expression \bas\b.{1,150}(?!\b==\s*null\b) won't work because of where the negative look-ahead sits. The look-ahead is only tested at the position where .{1,150} stops, so the regex can always end one character earlier or later to dodge it, and you end up matching even when there is an if (... == null) there.

Regexes are really not good at not matching something. In this case, you're better off trying to match an "as" assignment followed by an "if == null" check within 150 characters:

\bas\b.{1,150}\b==\s*null\b

and then negating the check: if (!regex.match(text)) ...
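Here is a Python sketch of this "match the good case, then negate" approach. It makes two adjustments, which are assumptions about the intended behavior. First, [\s\S] replaces . so that the window can cross line breaks. Second, the \b in front of == is dropped, because a space before == is not a word boundary. The sketch also shows that the pattern as written finds nothing in the question's null-checked snippet. lacks_null_check and the snippet names are hypothetical.

```python
import re

# Adjusted "positive" pattern: an "as" followed by "== null" within 150
# characters. [\s\S] crosses line breaks; no \b before "==".
NULL_CHECK_NEAR_AS = re.compile(r"\bas\b[\s\S]{1,150}==\s*null\b")

# The answer's pattern as written, for comparison.
LITERAL = re.compile(r"\bas\b.{1,150}\b==\s*null\b")


def lacks_null_check(snippet: str) -> bool:
    """Hypothetical helper: True when no "as" has an "== null" within range."""
    # re.search scans the whole string; re.match would only try position 0.
    return NULL_CHECK_NEAR_AS.search(snippet) is None


# The question's three snippets (hypothetical names, indentation dropped).
SHOULD_MATCH = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1.a == y1.a)"
CHECKED = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1 == null)"
UNRELATED_CHECK = ("var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\n"
                   "if(somethingunrelated == null) {...}\nif(x1.a == y1.a)")

print(lacks_null_check(SHOULD_MATCH))     # True
print(lacks_null_check(CHECKED))          # False
print(lacks_null_check(UNRELATED_CHECK))  # False
print(LITERAL.search(CHECKED))            # None: the literal pattern misses the check
```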

德意的啸 2024-10-23 08:34:21
(?s:\s+as\s+(?!.{0,150}==\s*null\b))

I'm activating the SingleLine option with ?s: (so that . also matches newlines). You can set it in the options of your Regex instead if you prefer. I'm putting \s around as because I think only whitespace is "legal" around the as. You could probably use \b instead, like this:

(?s:\bas\b(?!.{0,150}==\s*null\b))

Be aware that \s will probably match whitespace characters that aren't "valid" C# whitespace. In .NET it is defined as [\f\n\r\t\v\x85\p{Z}], where \p{Z} covers the Unicode categories 'Separator, Space', 'Separator, Line' and 'Separator, Paragraph'.
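Python's re (3.6 and later) also accepts the scoped (?s:...) syntax, so the pattern can be tried there unchanged as a sketch. Python's \s is not defined exactly like the .NET class quoted above, so whitespace edge cases may differ. For comparison, the sketch also runs the pattern without the s flag: . then stops at the first line break, and the null-checked snippet is wrongly reported. The snippet names are hypothetical.

```python
import re

# The answer's pattern, unchanged: (?s:...) turns on DOTALL just for this group.
SCOPED = re.compile(r"(?s:\s+as\s+(?!.{0,150}==\s*null\b))")
# Same pattern without the flag, for comparison: '.' stops at line breaks.
NO_S = re.compile(r"\s+as\s+(?!.{0,150}==\s*null\b)")

# The question's three snippets (hypothetical names, indentation dropped).
SHOULD_MATCH = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1.a == y1.a)"
CHECKED = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1 == null)"
UNRELATED_CHECK = ("var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\n"
                   "if(somethingunrelated == null) {...}\nif(x1.a == y1.a)")

print(SCOPED.search(SHOULD_MATCH) is not None)     # True
print(SCOPED.search(CHECKED) is not None)          # False
print(SCOPED.search(UNRELATED_CHECK) is not None)  # False
print(NO_S.search(CHECKED) is not None)            # True: the null check is missed
```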
