Unclear RegExp interface

Posted on 2024-10-29 20:54:04

Something very strange.

var body="Received: from  ([195.000.000.0])\r\nReceived: from  ([77.000.000.000]) by   (6.0.000.000)"
var lastMath="";
var subExp = "[\\[\\(](\\d+\\.\\d+\\.\\d+\\.\\d+)[\\]\\)]"
var re = new RegExp("Received\\: from.*?"+subExp +".*", "mg");
var re1 = new RegExp(subExp , "mg");
while(ares= re.exec(body))
{
        print(ares[0])
        while( ares1 = re1.exec(ares[0]))
        {
            if(!IsLocalIP(ares1[1]))
            {
                 print(ares1[1]) 
                 lastMath=ares1[1];
                 break ;
            }
        }

}
print(lastMath)

It outputs:

Received: from ([195.000.000.0])
195.000.000.0
Received: from ([77.000.000.000]) by (6.0.000.000)
6.0.000.000
6.0.000.000

But I think it should be:

Received: from ([195.000.000.0])
195.000.000.0
Received: from ([77.000.000.000]) by (6.0.000.000)
77.000.000.000
77.000.000.000

Because "77.000.000.000" obviously comes first in that line. If I comment out the "break", the output order is correct.
What's wrong with my code?

Comments (1)

赏烟花じ飞满天 2024-11-05 20:54:04

Note that capture groups in JavaScript (and most languages) do not behave in an obvious way when combined with the * or + quantifiers. For example:

js>r = /^(ab[0-9])+$/
/^(ab[0-9])+$/
js>"ab1ab2ab3ab4".match(r)
ab1ab2ab3ab4,ab4    

In this case, the group holds only its last matching repetition (ab4) and nothing else. I'm not sure where this behavior is specified, but it can vary from language to language.
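
If every repetition is needed rather than just the last one, one option is to match the inner pattern globally instead of relying on the quantified group. The following is only a sketch; the helper name `lastAndAllRepeats` is hypothetical and not part of the original answer.

```javascript
// Hypothetical helper for illustration only.
function lastAndAllRepeats(input) {
  // A quantified capture group keeps only its final repetition.
  const whole = /^(ab[0-9])+$/.exec(input);
  const lastOnly = whole ? whole[1] : null;

  // Matching the inner pattern with the g flag returns every repetition.
  const all = input.match(/ab[0-9]/g) || [];
  return { lastOnly, all };
}

const repeats = lastAndAllRepeats("ab1ab2ab3ab4");
console.log(repeats.lastOnly); // the final repetition of the group
console.log(repeats.all);      // every repetition, in order
```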

edit: What does IsLocalIP() do?


OK, I think the problem has to do with the statefulness of exec (which may be why I don't use it; I use String.match()). If you're going to use exec with a global regex, you need to manually reset the regex's lastIndex property to 0, otherwise you get this behavior:

function weird(dobreak)
{
  var s = "Received: from  ([77.000.000.000]) by   (6.0.000.000)"
  var re1 = /[\[\(](\d+\.\d+\.\d+\.\d+)[\]\)]/mg
  while (s2 = re1.exec(s))
  {
    writeln("s2="+s2);
    if (dobreak)
      break;
  }
}

produces this result:

js>weird(true)
js>weird(true)
s2=[77.000.000.000],77.000.000.000
js>weird(true)
s2=(6.0.000.000),6.0.000.000
js>weird(true)
js>

You'll note that the same function gives three different results across calls, which implies that state is being carried over between calls for some bizarre reason (is JavaScript caching/interning the regex literal somehow? I'm using JSDB, which uses SpiderMonkey, Firefox's JavaScript engine).
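
The cross-call effect above may not reproduce in engines that follow ECMAScript 5 or later, where a regex literal yields a new object each time it is evaluated. The question's code, however, builds re1 once and reuses it on a different string after a break, and that carries lastIndex over in any engine. A minimal sketch of that situation, with the variable names chosen here for illustration:

```javascript
// One global regex object, reused on two different strings.
const ipPattern = /[\[\(](\d+\.\d+\.\d+\.\d+)[\]\)]/g;

// First string: the match succeeds and lastIndex moves past it.
const firstHit = ipPattern.exec("Received: from  ([195.000.000.0])");
const indexAfterFirst = ipPattern.lastIndex;

// Second string: the search starts at the stale lastIndex, not at 0,
// so the bracketed address near the start of the string is skipped.
const secondHit = ipPattern.exec(
  "Received: from  ([77.000.000.000]) by   (6.0.000.000)"
);

console.log(firstHit[1]);
console.log(indexAfterFirst);
console.log(secondHit[1]);
```

This mirrors what happens when the inner loop exits through break: exec never reaches the failed match that would have reset lastIndex to 0.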

So if I change the code to the following:

function notweird(dobreak)
{
  var s = "Received: from  ([77.000.000.000]) by   (6.0.000.000)"
  var re1 = /[\[\(](\d+\.\d+\.\d+\.\d+)[\]\)]/mg
  re1.lastIndex = 0;
  while (s2 = re1.exec(s))
  {
    writeln("s2="+s2);
    if (dobreak)
      break;
  }
}

Then I get the expected behavior:

js>notweird(true)
s2=[77.000.000.000],77.000.000.000
js>notweird(true)
s2=[77.000.000.000],77.000.000.000
js>notweird(true)
s2=[77.000.000.000],77.000.000.000
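
Applied to the loop in the question, the same reset would go just before the inner loop. This is a sketch under assumptions: `isLocalIP` below is a hypothetical stand-in, since the post never shows what IsLocalIP() does, and `firstPublicIPs` is a name invented here to wrap the logic.

```javascript
// Hypothetical stand-in; the real IsLocalIP() is not shown in the post.
function isLocalIP(ip) {
  return ip.startsWith("127.") || ip.startsWith("10.") || ip.startsWith("192.168.");
}

// Returns the first non-local address found in each "Received: from" line.
function firstPublicIPs(body) {
  const subExp = "[\\[\\(](\\d+\\.\\d+\\.\\d+\\.\\d+)[\\]\\)]";
  const lineRe = new RegExp("Received: from.*?" + subExp + ".*", "mg");
  const ipRe = new RegExp(subExp, "g");
  const found = [];
  let lineMatch;
  while ((lineMatch = lineRe.exec(body)) !== null) {
    // Reset the shared regex so each line is scanned from its start,
    // even if the previous inner loop ended with break.
    ipRe.lastIndex = 0;
    let ipMatch;
    while ((ipMatch = ipRe.exec(lineMatch[0])) !== null) {
      if (!isLocalIP(ipMatch[1])) {
        found.push(ipMatch[1]);
        break;
      }
    }
  }
  return found;
}

const sampleBody =
  "Received: from  ([195.000.000.0])\r\nReceived: from  ([77.000.000.000]) by   (6.0.000.000)";
const publicIPs = firstPublicIPs(sampleBody);
console.log(publicIPs);
```

Creating ipRe inside the outer loop, or dropping the g flag when only the first match matters, would avoid the shared state altogether; the explicit reset is shown because it is the change the answer describes.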