Return positions of a regex match() in Javascript?

Posted 2024-08-22 07:57:11

Is there a way to retrieve the (starting) character positions inside a string of the results of a regex match() in Javascript?

Answers (12)

养猫人 2024-08-29 07:57:12
var str = 'my string here';

var index = str.match(/here/).index;

alert(index); // <- 10
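One caveat: if the regex does not match, str.match() returns null and reading .index throws a TypeError. A minimal defensive sketch, assuming a non-global regex (with the g flag, .index is not set). The helper name matchIndex is hypothetical; it returns -1 for "no match", like indexOf():

```javascript
// Hypothetical helper: start of the first match of a non-global regex, or -1.
function matchIndex(str, re) {
  const m = str.match(re); // null when nothing matches
  return m === null ? -1 : m.index;
}

console.log(matchIndex("my string here", /here/)); // 10
console.log(matchIndex("my string here", /hre/));  // -1
```

For a non-global regex this gives the same result as str.search(re).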

明月松间行 2024-08-29 07:57:11

exec returns an object with an index property:

var match = /bar/.exec("foobar");
if (match) {
    console.log("match found at " + match.index);
}

And for multiple matches:

var re = /bar/g,
    str = "foobarfoobar",
    match;
while ((match = re.exec(str)) != null) {
    console.log("match found at " + match.index);
}
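If you want the positions collected rather than logged, the same loop can record both ends of every match (the end is match.index + match[0].length). A sketch; the helper name matchRanges is hypothetical, and it assumes the pattern never matches an empty string, since lastIndex would not advance on a zero-width match:

```javascript
// Hypothetical helper: [start, end) offsets of every match of a global regex.
// Assumes the pattern never matches the empty string.
function matchRanges(str, re) {
  if (!re.global) {
    // without g, exec() always returns the first match and the loop never ends
    throw new TypeError("matchRanges needs a regex with the g flag");
  }
  const ranges = [];
  re.lastIndex = 0; // start from the beginning even if `re` was used before
  let match;
  while ((match = re.exec(str)) !== null) {
    ranges.push([match.index, match.index + match[0].length]);
  }
  return ranges;
}

console.log(matchRanges("foobarfoobar", /bar/g)); // value: [[3, 6], [9, 12]]
```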

深空失忆 2024-08-29 07:57:11

Here's what I came up with:

// Finds starting and ending positions of quoted text
// in double or single quotes with escape char support like \" \'
var str = "this is a \"quoted\" string as you can 'read'";

var patt = /'((?:\\.|[^'])*)'|"((?:\\.|[^"])*)"/igm;

var match;
while (match = patt.exec(str)) {
  console.log(match.index + ' ' + patt.lastIndex); // logs "10 18", then "37 43"
}
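If you need the position of the quoted text itself (without the quote characters), newer engines can report offsets for each capture group via the d (hasIndices) flag, which adds an indices array to every match. A sketch assuming an ES2022-capable runtime; it reuses the pattern above with the d and g flags (the i and m flags are not needed for this pattern):

```javascript
// Needs ES2022+ for the d (hasIndices) flag.
const str = "this is a \"quoted\" string as you can 'read'";
const patt = /'((?:\\.|[^'])*)'|"((?:\\.|[^"])*)"/dg;

const spans = [];
for (const match of str.matchAll(patt)) {
  // Group 1 is the single-quoted text, group 2 the double-quoted text;
  // the group that did not participate has undefined indices.
  const [start, end] = match.indices[1] ?? match.indices[2];
  spans.push({ text: str.slice(start, end), start, end });
}
console.log(spans);
// value: [{ text: "quoted", start: 11, end: 17 }, { text: "read", start: 38, end: 42 }]
```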

咋地 2024-08-29 07:57:11

In modern browsers, you can accomplish this with String.prototype.matchAll().

The benefit of this approach over RegExp.exec() is that it does not rely on the regex being stateful (advancing its lastIndex on every call), as in @Gumbo's answer.

let regexp = /bar/g;
let str = 'foobarfoobar';

let matches = [...str.matchAll(regexp)];
matches.forEach((match) => {
    console.log("match found at " + match.index);
});
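To illustrate both points: matchAll() iterates over an internal copy of the regex, so the regex you pass in keeps its lastIndex, and it refuses a regex without the g flag by throwing a TypeError. A sketch assuming an ES2020+ runtime:

```javascript
const regexp = /bar/g;
const str = "foobarfoobar";

const indices = [...str.matchAll(regexp)].map((match) => match.index);
console.log(indices);          // value: [3, 9]
console.log(regexp.lastIndex); // 0: the regex passed in was not advanced

// A regex without the g flag is rejected when matchAll() is called.
let threw = false;
try {
  str.matchAll(/bar/);
} catch (e) {
  threw = e instanceof TypeError;
}
console.log(threw); // true
```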

空城仅有旧梦在 2024-08-29 07:57:11

From the developer.mozilla.org docs on the String .match() method:

The returned Array has an extra input property, which contains the original string that was parsed. In addition, it has an index property, which represents the zero-based index of the match in the string.

When dealing with a non-global regex (i.e., no g flag on your regex), the value returned by .match() has an index property; all you have to do is access it.

var index = str.match(/regex/).index;

Here is an example showing it working as well:

var str = 'my string here';

var index = str.match(/here/).index;

console.log(index); // <- 10

I have successfully tested this all the way back to IE5.
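Note that this only applies to the non-global form. With the g flag, .match() returns a plain array of the matched substrings with no index property, so the positions are lost. A small sketch contrasting the two (the variable names are illustrative):

```javascript
const str = "my string here";

// Without g: an array with extra index and input properties.
const first = str.match(/e/);
console.log(first.index); // 11
console.log(first.input); // my string here

// With g: just the matched substrings, no index property.
const all = str.match(/e/g);
console.log(all.length, all.index); // 2 undefined
```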

她比我温柔 2024-08-29 07:57:11

You can use the search method of the String object. It only reports the first match (and returns -1 if there is none), but otherwise does what you describe. For example:

"How are you?".search(/are/);
// 4
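search() returns -1 when nothing matches, and it always scans from the start of the string: it ignores the g flag and the regex's lastIndex, and restores lastIndex afterwards. A quick sketch of those behaviours:

```javascript
const text = "How are you?";

console.log(text.search(/are/)); // 4
console.log(text.search(/xyz/)); // -1

// The g flag and lastIndex are ignored: the scan starts at index 0 ...
const re = /o/g;
re.lastIndex = 5;
console.log(text.search(re)); // 1
// ... and lastIndex is restored afterwards.
console.log(re.lastIndex); // 5
```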
↙温凉少女 2024-08-29 07:57:11

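Here is a cool feature I discovered recently: a replacement function passed to replace() receives the offset of each match as an argument (the second one when the regex has no capture groups). I tried this in the console and it seems to work: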

var text = "border-bottom-left-radius";

var newText = text.replace(/-/g,function(match, index){
    return " " + index + " ";
});

Which returned: "border 6 bottom 13 left 18 radius"

So this seems to be what you are looking for.
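One caveat: the offset is only the second argument when the regex has no capture groups. Each capture group adds an argument before it, and named groups append a groups object at the end. A sketch of a helper that locates the offset regardless (the name replaceOffsets is hypothetical):

```javascript
// Hypothetical helper: offsets reported to a replace() callback for every match.
function replaceOffsets(str, re) {
  const offsets = [];
  str.replace(re, (...args) => {
    // The arguments end with (offset, string) or, with named groups,
    // (offset, string, groups); the groups object is the only object there.
    const hasGroups = typeof args[args.length - 1] === "object";
    offsets.push(args[args.length - (hasGroups ? 3 : 2)]);
    return args[0]; // put the match back unchanged
  });
  return offsets;
}

const text = "border-bottom-left-radius";
console.log(replaceOffsets(text, /-/g));          // value: [6, 13, 18]
console.log(replaceOffsets(text, /(-)/g));        // value: [6, 13, 18]
console.log(replaceOffsets(text, /(?<dash>-)/g)); // value: [6, 13, 18]
```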

彼岸花似海 2024-08-29 07:57:11

I'm afraid the previous answers (based on exec) don't work if your regex can match a zero-width string. For instance (note: /\b/g is the regex that should find all word boundaries):

var re = /\b/g,
    str = "hello world",
    match;
var guard = 10;
while ((match = re.exec(str)) != null) {
    console.log("match found at " + match.index);
    if (guard-- < 0) {
      console.error("Infinite loop detected");
      break;
    }
}

One can try to fix this by having the regex match at least 1 character, but this is far from ideal (and means you have to add the index at the end of the string manually, since no character follows the final word boundary):

var re = /\b./g,
    str = "hello world",
    match;
var guard = 10;
while ((match = re.exec(str)) != null) {
    console.log("match found at " + match.index);
    if (guard-- < 0) {
      console.error("Infinite loop detected");
      break;
    }
}

A better solution (which only works in newer browsers; older/IE versions need polyfills) is to use String.prototype.matchAll():

var re = /\b/g,
    str = "hello world";
console.log(Array.from(str.matchAll(re)).map(match => match.index))

Explanation:

String.prototype.matchAll() expects a global regex (one with the g flag set) and throws a TypeError otherwise. It returns an iterator; to map() over it, the iterator has to be turned into an array first (which is exactly what Array.from() does). Like the result of RegExp.prototype.exec(), each resulting element has an .index field according to the specification.

See the String.prototype.matchAll() and the Array.from() MDN pages for browser support and polyfill options.


Edit: digging a little deeper in search for a solution supported on all browsers

The problem with RegExp.prototype.exec() is that, for a global regex, it updates the lastIndex pointer on the regex, and the next call starts searching from that lastIndex.

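var re = /l/g,
    str = "hello world";
console.log(re.lastIndex); // 0
re.exec(str);              // matches the "l" at index 2
console.log(re.lastIndex); // 3
re.exec(str);              // matches the "l" at index 3
console.log(re.lastIndex); // 4
re.exec(str);              // matches the "l" at index 9
console.log(re.lastIndex); // 10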

This works great as long as the regex match actually has a width. With a zero-width match this pointer does not advance, so you get an infinite loop. (Note: /(?=l)/g is a lookahead for l; it matches the zero-width string before an l.) It correctly goes to index 2 on the first call of exec(), and then stays there:

var re = /(?=l)/g,
    str = "hello world";
console.log(re.lastIndex); // 0
re.exec(str);              // empty match at index 2
console.log(re.lastIndex); // 2
re.exec(str);              // the same empty match at index 2 again
console.log(re.lastIndex); // 2
re.exec(str);
console.log(re.lastIndex); // 2

The solution (less nice than matchAll(), but it should work in all browsers) is therefore to manually increase lastIndex whenever the match width is 0 (which can be checked in different ways):

var re = /\b/g,
    str = "hello world",
    match;
while ((match = re.exec(str)) != null) {
    console.log("match found at " + match.index);

    // alternative: if (match.index == re.lastIndex) {
    if (match[0].length == 0) {
      // we need to increase lastIndex -- this location was already matched,
      // we don't want to match it again (and get into an infinite loop)
      re.lastIndex++;
    }
}
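The same idea wrapped in a reusable function. This is a sketch: the name matchIndices is hypothetical, it runs on a copy of the regex so the caller's lastIndex is left alone, it adds the g flag if missing, and with the u or v flag it steps a whole code point after an empty match (as matchAll() does internally) instead of one UTF-16 unit:

```javascript
// Hypothetical helper: start index of every match, including zero-width ones.
function matchIndices(str, re) {
  // Work on a copy (with g added if missing) so the caller's regex is untouched.
  const flags = re.flags.includes("g") ? re.flags : re.flags + "g";
  const copy = new RegExp(re.source, flags);
  const unicode = /[uv]/.test(copy.flags);
  const indices = [];
  let match;
  while ((match = copy.exec(str)) !== null) {
    indices.push(match.index);
    if (match[0].length === 0) {
      // Empty match: lastIndex did not move, so step past this position.
      // With u/v, step over a whole surrogate pair (one code point).
      const cp = str.codePointAt(copy.lastIndex);
      copy.lastIndex += unicode && cp > 0xffff ? 2 : 1;
    }
  }
  return indices;
}

console.log(matchIndices("hello world", /\b/g));    // value: [0, 5, 6, 11]
console.log(matchIndices("hello world", /(?=l)/g)); // value: [2, 3, 9]
```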

情独悲 2024-08-29 07:57:11

I had luck using this single-line solution based on matchAll (my use case needs an array of string positions)

let regexp = /bar/g;
let str = 'foobarfoobar';

let matchIndices = Array.from(str.matchAll(regexp)).map(x => x.index);

console.log(matchIndices)

output: [3, 9]
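matchAll() reports non-overlapping matches, because the search resumes after the end of each match. If overlapping start positions are wanted, a common trick is to wrap the pattern in a zero-width lookahead, which consumes nothing; matchAll() then advances one position at a time on its own. A sketch:

```javascript
const str = "aaaa";

// A consuming pattern resumes after each match, so matches never overlap.
const nonOverlapping = Array.from(str.matchAll(/aa/g), (m) => m.index);
console.log(nonOverlapping); // value: [0, 2]

// A lookahead consumes nothing, so every start position of "aa" is reported.
const overlapping = Array.from(str.matchAll(/(?=aa)/g), (m) => m.index);
console.log(overlapping); // value: [0, 1, 2]
```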

送君千里 2024-08-29 07:57:11

This member function returns an array of the 0-based positions, if any, of the input word inside the String object:

String.prototype.matching_positions = function( _word, _case_sensitive, _whole_words, _multiline )
{
   /*besides '_word' param, others are flags (0|1)*/
   // "i" (ignore case) is only added when the search is NOT case-sensitive
   var _match_pattern = "g"+(_case_sensitive?"":"i")+(_multiline?"m":"") ;
   var _bound = _whole_words ? "\\b" : "" ;
   var _re = new RegExp( _bound+_word+_bound, _match_pattern );
   var _pos = [], _chunk ;

   while( true )
   {
      _chunk = _re.exec( this ) ;
      if ( _chunk == null ) break ;
      _pos.push( _chunk['index'] ) ;
      // resume one character after this match's start (allows overlapping matches)
      _re.lastIndex = _chunk['index']+1 ;
   }

   return _pos ;
}

Now try

var _sentence = "What do doers want ? What do doers need ?" ;
var _word = "do" ;
console.log( _sentence.matching_positions( _word, 1, 0, 0 ) ); // [5, 8, 26, 29]
console.log( _sentence.matching_positions( _word, 1, 1, 0 ) ); // [5, 26]

You can also input regular expressions:

var _second = "z^2+2z-1" ;
console.log( _second.matching_positions( "[0-9]z+", 0, 0, 0 ) ); // [4]

Here one gets the position index of the linear term.
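Because the word is used as regex source, characters such as ^, + or . in it are treated as regex syntax. If you want the word matched literally, one option is to escape it first. A sketch with a hypothetical escapeRegExp helper (a commonly used escaping pattern) and a hypothetical literalPositions function that avoids extending String.prototype; note that matchAll() returns non-overlapping matches, unlike the lastIndex = index + 1 loop above:

```javascript
// Hypothetical helper: escape regex metacharacters so `s` is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical alternative to matching_positions that treats `word` literally.
function literalPositions(str, word, { caseSensitive = true, wholeWords = false } = {}) {
  const bound = wholeWords ? "\\b" : "";
  const re = new RegExp(bound + escapeRegExp(word) + bound, caseSensitive ? "g" : "gi");
  return Array.from(str.matchAll(re), (m) => m.index);
}

console.log(literalPositions("z^2+2z-1", "z^2")); // value: [0]
console.log(literalPositions("What do doers want ? What do doers need ?", "do", { wholeWords: true })); // value: [5, 26]
console.log(literalPositions("Do do", "do", { caseSensitive: false })); // value: [0, 3]
```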

风吹过旳痕迹 2024-08-29 07:57:11
var str = "The rain in SPAIN stays mainly in the plain";

function searchIndex(str, searchValue, isCaseSensitive) {
  // only ignore case ("i") when the search is NOT case-sensitive
  var modifiers = isCaseSensitive ? 'g' : 'gi';
  var regExpValue = new RegExp(searchValue, modifiers);
  var matches = [];
  var startIndex = 0;
  // match() returns null when nothing matches
  var arr = str.match(regExpValue) || [];

  [].forEach.call(arr, function(element) {
    startIndex = str.indexOf(element, startIndex);
    matches.push(startIndex++);
  });

  return matches;
}

console.log(searchIndex(str, 'ain', true)); // [5, 25, 40]
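A limitation of the indexOf() approach: match() only returns the matched text, so if the pattern contains a lookaround, the same text may occur earlier in the string without being a match. A small sketch of the problem, compared with matchAll(), which reports where each match actually starts:

```javascript
const str = "ac ab";
const re = /a(?=b)/g; // an "a" that is directly followed by "b"

// match() only returns the matched text ("a")...
const found = str.match(re);
// ...and indexOf() finds the first "a", which is not the one that matched.
console.log(str.indexOf(found[0])); // 0

// matchAll() reports where the match really starts.
const positions = Array.from(str.matchAll(re), (m) => m.index);
console.log(positions); // value: [3]
```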
尤怨 2024-08-29 07:57:11
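function trimRegex(str, regex){
    // position of the first character matching `regex` (e.g. the first non-'|')
    let first = str.match(regex);
    if (!first) return '';
    // position of the last such character, found by matching the reversed string
    let reversed = str.split('').reverse().join('');
    let end = str.length - reversed.match(regex).index;
    return str.substring(first.index, end);
}

let test = '||ab||cd||';
test = trimRegex(test, /[^|]/); // strings are immutable, so keep the result
console.log(test); //output: ab||cd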

or

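function trimChar(str, trim){
    // escape \ ] ^ - so `trim` is taken literally inside the character class
    let regex = new RegExp('[^' + trim.replace(/[\\\]^-]/g, '\\$&') + ']');
    let first = str.match(regex);
    if (!first) return '';
    let reversed = str.split('').reverse().join('');
    return str.substring(first.index, str.length - reversed.match(regex).index);
}

let test = '||ab||cd||';
test = trimChar(test, '|');
console.log(test); //output: ab||cd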
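The same trimming can be done without reversing the string: take the start of the first match and the end of the last match. A sketch using matchAll(); the function name trimToMatches is hypothetical, and it assumes a global regex matching the characters to keep:

```javascript
// Hypothetical helper: the part of `str` between the first and the last match of `re`.
function trimToMatches(str, re) {
  const matches = Array.from(str.matchAll(re)); // re must have the g flag
  if (matches.length === 0) return "";
  const first = matches[0];
  const last = matches[matches.length - 1];
  return str.slice(first.index, last.index + last[0].length);
}

console.log(trimToMatches("||ab||cd||", /[^|]/g)); // ab||cd
```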