How to properly escape characters in a regular expression

Posted on 2024-11-01 11:02:10

I want to do a string search inside a string. Simply put: MySTR.search(Needle).

The problem occurs when the needle string contains special regex characters such as * and +. The call fails with the error "invalid quantifier".

I have browsed the web and found that a string can supposedly be escaped with \Q some string \E.

However, this does not always produce the desired behavior. For example:

var sNeedle = '*Stars!*';
var sMySTR = 'The contents of this string have no importance';
sMySTR.search('\Q' + sNeedle + '\E');

Result is -1. OK.

var sNeedle = '**Stars!**';
var sMySTR = 'The contents of this string have no importance';
sMySTR.search('\Q' + sNeedle + '\E');

The result is an "invalid quantifier" error. This seems to happen because two or more special characters are 'touching' each other, since:

var sNeedle = '*Dont touch me*Stars!*Dont touch me*';
var sMySTR = 'The contents of this string have no importance';
sMySTR.search('\Q' + sNeedle + '\E');

works OK.

I know I could make a function escapeAllBadChars(sInStr) and just add double slashes before every possible special regex character, but I'm wondering if there is a simpler way to do it?

Comments (4)

笑着哭最痛 2024-11-08 11:02:10

\Q...\E doesn't work in JavaScript (at least, it doesn't escape anything), as you can see:

var s = "*";
print(s.search(/\Q*\E/));
print(s.search(/\*/));

produces:

-1
0

as you can see on Ideone.
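
To illustrate why the question's examples behave the way they do, here is a minimal sketch (the names single, doubled and compiles are made up for this illustration). Inside a JavaScript string literal, \Q is not a recognized escape sequence, so the backslash is simply dropped and the regex engine sees a plain Q followed by a quantifier:

```javascript
// "\Q" and "\E" are not string escapes, so the backslashes vanish.
const single = '\Q' + '*Stars!*' + '\E';
const doubled = '\Q' + '**Stars!**' + '\E';
console.log(single);  // Q*Stars!*E
console.log(doubled); // Q**Stars!**E

// Hypothetical helper: reports whether a pattern string compiles at all.
function compiles(pattern) {
  try {
    new RegExp(pattern);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(compiles(single));  // true: "Q*" means "zero or more Q"
console.log(compiles(doubled)); // false: the second "*" has nothing to repeat
console.log(compiles('\\*\\*Stars!\\*\\*')); // true: each "*" is escaped
```

So the first needle only appears to work: it compiles into a pattern that means something different from the literal text.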

The following chars need to be escaped:

  • (
  • )
  • [
  • {
  • *
  • +
  • .
  • $
  • ^
  • \
  • |
  • ?

So, something like this would do:

function quote(regex) {
  return regex.replace(/([()[{*+.$^\\|?])/g, '\\$1');
}

No, ] and } don't need to be escaped: they have no special meaning on their own, only their opening counterparts do.

Note that when using a literal regex, /.../, you also need to escape the / char. However, / is not a regex meta character: when using it in a RegExp object, it doesn't need an escape.
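
As a rough usage sketch, the quote function above can be combined with the RegExp constructor to search for the needles from the question. The helper name findLiteral and the first sample string are assumptions made for this example:

```javascript
// quote() is copied from the answer above.
function quote(regex) {
  return regex.replace(/([()[{*+.$^\\|?])/g, '\\$1');
}

// Hypothetical helper: position of a literal needle, or -1 if absent.
function findLiteral(haystack, needle) {
  // The pattern is built from a string, so "/" needs no escaping here.
  return haystack.search(new RegExp(quote(needle)));
}

console.log(quote('**Stars!**')); // \*\*Stars!\*\*
console.log(findLiteral('Look: **Stars!** here', '**Stars!**')); // 6
console.log(findLiteral('The contents of this string have no importance', '**Stars!**')); // -1
```

One caveat worth hedging: this relies on the default (non-Unicode) regex mode. With the u flag, a lone ] or } is a syntax error, so an escaper meant for that mode would need to cover them as well.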

撞了怀 2024-11-08 11:02:10

I'm just dipping my toes into JavaScript, but is there a reason you need to use the regex engine at all? How about:

var sNeedle = '*Stars!*';
var sMySTR = 'The contents of this string have no importance';
if ( sMySTR.indexOf(sNeedle) > -1 ) {
   //found it
}
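
For comparison, a small sketch of the same idea with a needle that would break a regex. The sample haystack is invented for this example; indexOf and includes treat every character literally, so nothing needs escaping:

```javascript
const haystack = 'Rated **Stars!** by readers';
const needle = '**Stars!**';

// No regex engine involved: "*" is just a character here.
const position = haystack.indexOf(needle);
const found = haystack.includes(needle);

console.log(position); // 6
console.log(found);    // true
console.log('The contents of this string have no importance'.indexOf(needle)); // -1
```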

尤怨 2024-11-08 11:02:10

I did a quick Google search to see what's out there, and it appears you have a few options for escaping regular expression characters. According to one page, you can define and call a function like the one below to escape problematic characters:

RegExp.escape = function(text) {
    return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&");
};

Alternatively, you can try and use a separate library such as XRegExp, which already handles nuances you're trying to re-solve.
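
One thing to keep in mind: ECMAScript 2025 adds a built-in RegExp.escape, so assigning to that name can overwrite a native implementation in newer engines. A hedged sketch that avoids this, using a hypothetical standalone helper called escapeForRegExp that prefers the native function and otherwise falls back to the replacement shown above:

```javascript
// Hypothetical helper: use the native RegExp.escape when the engine has it,
// otherwise fall back to the character-class replacement from this answer.
const escapeForRegExp =
  typeof RegExp.escape === 'function'
    ? (text) => RegExp.escape(text)
    : (text) => text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, '\\$&');

// Hypothetical helper: true when haystack contains needle as literal text.
function containsLiteral(haystack, needle) {
  return new RegExp(escapeForRegExp(needle)).test(haystack);
}

console.log(containsLiteral('Look: **Stars!** here', '**Stars!**')); // true
console.log(containsLiteral('axb', 'a.b')); // false: "." is no longer a wildcard
```

The exact escaped text differs between the native function and the fallback (the native one uses \x.. escapes for some characters), so it is safer to rely on what the resulting pattern matches than on the escaped string itself.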

菊凝晚露 2024-11-08 11:02:10

Duplicate of https://stackoverflow.com/a/6969486/151312

This is the correct approach according to MDN (see the explanation in the post linked above):

function escapeRegExp(str) {
  return str.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
}
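
A common use for such a function is replacing every occurrence of a literal string. The sketch below assumes a hypothetical helper named replaceAllLiteral; escapeRegExp is the function from this answer:

```javascript
function escapeRegExp(str) {
  return str.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, '\\$&');
}

// Hypothetical helper: replace every literal occurrence of needle.
function replaceAllLiteral(text, needle, replacement) {
  // A function replacement keeps "$" in the replacement text literal too.
  return text.replace(new RegExp(escapeRegExp(needle), 'g'), () => replacement);
}

console.log(replaceAllLiteral('2*3 + 4*5', '*', ' x ')); // 2 x 3 + 4 x 5
console.log(replaceAllLiteral('a.b.c', '.', '/'));       // a/b/c
```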