Negative lookbehind equivalent in JavaScript

Posted on 2024-07-15 09:50:31

Is there a way to achieve the equivalent of a negative lookbehind in JavaScript regular expressions? I need to match a string that does not start with a specific set of characters.

I can't seem to find a regex that does this without failing when the matched part is at the beginning of the string. Negative lookbehind seems to be the only answer, but JavaScript doesn't have one.

This is the regex that I would like to work, but it doesn't:

(?<!([abcdefg]))m

So it would match the 'm' in 'jim' or 'm', but not 'jam'

Comments (11)

甲如呢乙后呢 2024-07-22 09:50:31

Since 2018, Lookbehind Assertions are part of the ECMAScript language specification.

// positive lookbehind
(?<=...)
// negative lookbehind
(?<!...)
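
As a quick check against the question's example, here is a minimal sketch that assumes an engine with ES2018 lookbehind support (for example a recent Node.js or browser). The helper name findM is made up for illustration, and the character class is written as [a-g], which is equivalent to [abcdefg].

```javascript
// Hypothetical helper: index of the first 'm' not preceded by a letter a-g,
// or -1 when there is no such 'm'.
function findM(input) {
  const match = /(?<![a-g])m/.exec(input);
  return match ? match.index : -1;
}

console.log(findM('jim')); // 2
console.log(findM('m'));   // 0
console.log(findM('jam')); // -1
```

Unlike the workarounds discussed for older engines, the lookbehind does not consume the preceding character, so the match is exactly the 'm'.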

Answer pre-2018

As JavaScript supports negative lookahead, one way to do it is:

  1. reverse the input string

  2. match with a reversed regex

  3. reverse and reformat the matches


const reverse = s => s.split('').reverse().join('');

const test = (stringToTests, reversedRegexp) => stringToTests
  .map(reverse)
  .forEach((s,i) => {
    const match = reversedRegexp.test(s);
    console.log(stringToTests[i], match, 'token:', match ? reverse(reversedRegexp.exec(s)[0]) : 'Ø');
  });

Example 1:

Following @andrew-ensley's question:

test(['jim', 'm', 'jam'], /m(?!([abcdefg]))/)

Outputs:

jim true token: m
m true token: m
jam false token: Ø

Example 2:

Following @neaumusic's comment (match max-height but not line-height, the token being height):

test(['max-height', 'line-height'], /thgieh(?!(-enil))/)

Outputs:

max-height true token: height
line-height false token: Ø

季末如歌 2024-07-22 09:50:31

Lookbehind Assertions got accepted into the ECMAScript specification in 2018.

Positive lookbehind usage:

console.log(
  "$9.99  €8.47".match(/(?<=\$)\d+\.\d*/) // Matches "9.99"
);

Negative lookbehind usage:

console.log(
  "$9.99  €8.47".match(/(?<!\$)\d+\.\d*/) // Matches "8.47"
);

Platform support

丶情人眼里出诗心の 2024-07-22 09:50:31

Let's suppose you want to find every "int" not preceded by "unsigned ":

With support for negative look-behind:

(?<!unsigned )int

Without support for negative look-behind:

((?!unsigned ).{9}|^.{0,8})int

Basically, the idea is to grab the n preceding characters and exclude unwanted matches with a negative lookahead, while also matching the cases where there are fewer than n preceding characters (where n is the length of the lookbehind).

So the regex in question:

(?<!([abcdefg]))m

would translate to:

((?!([abcdefg])).|^)m

You might need to play with capturing groups to find the exact spot in the string that interests you, or to replace a specific part with something else.

不奢求什么 2024-07-22 09:50:31

Mijoja's strategy works for your specific case but not in general:

js>newString = "Fall ball bill balll llama".replace(/(ba)?ll/g,
   function($0,$1){ return $1?$0:"[match]";});
Fa[match] ball bi[match] balll [match]ama

Here's an example where the goal is to match a double-l but not if it is preceded by "ba". Note the word "balll" -- true lookbehind should have suppressed the first 2 l's but matched the 2nd pair. But by matching the first 2 l's and then ignoring that match as a false positive, the regexp engine proceeds from the end of that match, and ignores any characters within the false positive.
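
For comparison, this is what a true negative lookbehind produces on the same input. It is a minimal sketch that assumes an engine with ES2018 lookbehind support; the variable names are illustrative.

```javascript
const input = 'Fall ball bill balll llama';

// A real lookbehind consumes nothing before the match, so the second
// pair of l's in "balll" (preceded by "al", not "ba") is still found.
const marked = input.replace(/(?<!ba)ll/g, '[match]');

console.log(marked); // Fa[match] ball bi[match] bal[match] [match]ama
```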

心在旅行 2024-07-22 09:50:31

Use an optional capturing group and decide in the replacement callback:

newString = string.replace(/([abcdefg])?m/, function($0,$1){ return $1?$0:'m';});

一个人练习一个人 2024-07-22 09:50:31

You could define a non-capturing group by negating your character set:

(?:[^a-g])m

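...which would match every m NOT preceded by any of those letters. Note that it requires some character before the m, so an m at the very start of the string is not matched, and the preceding character becomes part of the match.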
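
A possible variation, offered as an assumption rather than part of the answer above, is to add a start-of-string alternative so that a leading m is accepted too. The name notAfterAToG is made up for this sketch.

```javascript
// Variation on the negated set: also accept an 'm' at the start of the string.
const notAfterAToG = /(?:^|[^a-g])m/;

console.log(notAfterAToG.test('jim')); // true
console.log(notAfterAToG.test('m'));   // true
console.log(notAfterAToG.test('jam')); // false

// The original pattern rejects a leading 'm' because it needs a character first.
console.log(/(?:[^a-g])m/.test('m'));  // false

// The preceding character is still part of the match, unlike a real lookbehind.
console.log('jim'.match(notAfterAToG)[0]); // im
```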

无尽的现实 2024-07-22 09:50:31

This is how I achieved str.split(/(?<!^)@/) for Node.js 8 (which doesn't support lookbehind):

str.split('').reverse().join('').split(/@(?!$)/).map(s => s.split('').reverse().join('')).reverse()

Works? Yes (unicode untested). Unpleasant? Yes.
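
The one-liner above can be wrapped in a small function to make the three steps easier to follow. This is only a sketch; the names reverse and splitOnAtExceptLeading are made up, and like the original it splits on UTF-16 code units, so strings with surrogate pairs are not handled.

```javascript
const reverse = s => s.split('').reverse().join('');

// Emulates str.split(/(?<!^)@/): split on every '@' except one at the very
// start of the string. In the reversed string, a leading '@' becomes a
// trailing one, which the negative lookahead (?!$) skips.
function splitOnAtExceptLeading(str) {
  return reverse(str)
    .split(/@(?!$)/)
    .map(reverse)   // restore each piece
    .reverse();     // restore the order of the pieces
}

console.log(splitOnAtExceptLeading('@a@b'));  // [ '@a', 'b' ]
console.log(splitOnAtExceptLeading('a@b@c')); // [ 'a', 'b', 'c' ]
```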

神回复 2024-07-22 09:50:31

Following Mijoja's idea, and drawing on the problems exposed by JasonS, I had this idea. I checked it a bit but am not sure of myself, so verification by someone more expert than me in JS regex would be great :)

var re = /(?=(..|^.?)(ll))/g
         // matches empty string position
         // whenever this position is followed by
         // a string of length equal or inferior (in case of "^")
         // to "lookbehind" value
         // + actual value we would want to match

,   str = "Fall ball bill balll llama"

,   str_done = str
,   len_difference = 0
,   doer = function (where_in_str, to_replace)
    {
        str_done = str_done.slice(0, where_in_str + len_difference)
        +   "[match]"
        +   str_done.slice(where_in_str + len_difference + to_replace.length)

        len_difference = str_done.length - str.length
            /*  if str smaller:
                    len_difference will be positive
                else will be negative
            */

    }   /*  the actual function that would do whatever we want to do
            with the matches;
            this above is only an example from Jason's */



        /*  function input of .replace(),
            only there to test the value of $behind
            and if negative, call doer() with interesting parameters */
,   checker = function ($match, $behind, $after, $where, $str)
    {
        if ($behind !== "ba")
            doer
            (
                $where + $behind.length
            ,   $after
                /*  one will choose the interesting arguments
                    to give to the doer, it's only an example */
            )
        return $match // empty string anyhow, but well
    }
str.replace(re, checker)
console.log(str_done)

My output:

Fa[match] ball bi[match] bal[match] [match]ama

The principle is to call checker at every position in the string (between any two characters) where that position is the starting point of:

--- any substring the size of what is not wanted (here 'ba', thus ..), if that size is known (otherwise it is probably harder to do),

--- --- or something shorter than that if it is the beginning of the string: ^.?

and, following this,

--- what is actually sought (here 'll').

At each call of checker, a test checks whether the value before ll is not the one we want to exclude (!== 'ba'). If so, we call another function (doer), which makes the changes to the string if that is the purpose or, more generally, receives the data needed to process the scan results manually.

Here we change the string, so we need to track the difference in length in order to offset the positions given by replace, which are all calculated on str, which itself never changes.

Since primitive strings are immutable, we could have used the variable str to store the result of the whole operation, but I thought the example, already complicated by the replacements, would be clearer with another variable (str_done).

I guess that performance must be pretty poor: all those pointless replacements of '' with '', up to str.length-1 times, plus the manual replacement by doer, which means a lot of slicing...
In this specific case, that could probably be consolidated by cutting the string only once into pieces around the places where we want to insert [match] and .join()ing them with [match] itself.

The other thing is that I don't know how it would handle more complex cases, that is, complex values for the fake lookbehind, the length being perhaps the most problematic piece of data to get.

And in checker, if there are several unwanted values for $behind, we would have to test it with yet another regex (best created once outside checker and cached, to avoid creating the same regex object at each call) to know whether it is what we seek to avoid.

Hope I've been clear; if not, don't hesitate to ask and I'll try to do better. :)
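
The "cut into pieces and join" variant mentioned above might look roughly like this. It is a sketch under the same assumptions as the answer (excluded prefix 'ba', target 'll'); the function name markMatches and the extra overlap check are additions for illustration.

```javascript
// Marks every 'll' not preceded by 'ba', collecting the pieces between
// accepted matches and joining them with the marker at the end.
function markMatches(str) {
  const pieces = [];
  let last = 0; // end of the previous accepted match
  str.replace(/(?=(..|^.?)(ll))/g, (match, behind, wanted, offset) => {
    const start = offset + behind.length; // where 'll' actually begins
    // Skip excluded prefixes and anything overlapping an accepted match.
    if (behind !== 'ba' && start >= last) {
      pieces.push(str.slice(last, start));
      last = start + wanted.length;
    }
    return match; // empty string; the result of replace() is discarded
  });
  pieces.push(str.slice(last));
  return pieces.join('[match]');
}

console.log(markMatches('Fall ball bill balll llama'));
// Fa[match] ball bi[match] bal[match] [match]ama
```

This avoids tracking a running length difference, since every slice is taken from the unchanged input string.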

小帐篷 2024-07-22 09:50:31

Using your case, if you want to replace m with something, e.g. convert it to uppercase M, you can negate the set in a capturing group.

Match ([^a-g])m, replace with $1M:

"jim jam".replace(/([^a-g])m/g, "$1M")
// jiM jam

([^a-g]) will match any character not (^) in the a-g range and store it in the first capturing group, so you can access it with $1.

So we find im in jim and replace it with iM, which results in jiM.

知足的幸福 2024-07-22 09:50:31

As mentioned before, JavaScript allows lookbehinds now. In older browsers you still need a workaround.

I would bet there is no way to write a regex without lookbehind that delivers exactly the same result. All you can do is work with groups. Suppose you have a regex (?<!Before)Wanted, where Wanted is the regex you want to match and Before is the regex describing what must not precede the match. The best you can do is negate the regex Before and use the regex NotBefore(Wanted). The desired result is the first group, $1.

In your case Before=[abcdefg], which is easy to negate: NotBefore=[^abcdefg]. So the regex would be [^abcdefg](m). If you need the position of Wanted, you must group NotBefore too, so that the desired result is the second group.

If matches of the Before pattern have a fixed length n, that is, if the pattern contains no repetitive tokens, you can avoid negating the Before pattern and use the regular expression (?!Before).{n}(Wanted), but still have to use the first group or use the regular expression (?!Before)(.{n})(Wanted) and use the second group. In this example, the pattern Before actually has a fixed length, namely 1, so use the regex (?![abcdefg]).(m) or (?![abcdefg])(.)(m). If you are interested in all matches, add the g flag, see my code snippet:

function TestSORegEx() {
  var s = "Donald Trump doesn't like jam, but Homer Simpson does.";
  var reg = /(?![abcdefg])(.{1})(m)/gm;
  var out = "Matches and groups of the regex " + 
            "/(?![abcdefg])(.{1})(m)/gm in \ns = \"" + s + "\"";
  var match = reg.exec(s);
  while(match) {
    var start = match.index + match[1].length;
    out += "\nWhole match: " + match[0] + ", starts at: " + match.index
        +  ". Desired match: " + match[2] + ", starts at: " + start + ".";   
    match = reg.exec(s);
  }
  out += "\nResulting string after statement s.replace(reg, \"$1*$2*\")\n"
         + s.replace(reg, "$1*$2*");
  alert(out);
}

思念绕指尖 2024-07-22 09:50:31

This effectively does it

"jim".match(/[^a-g]m/)
> ["im"]
"jam".match(/[^a-g]m/)
> null

Search and replace example

"jim jam".replace(/([^a-g])m/g, "$1M")
> "jiM jam"

Note that the negative look-behind string must be 1 character long for this to work.
