Check whether a string is a prefix of a JavaScript RegExp

Posted on 2024-07-11 11:36:09

In JavaScript I have defined a regular expression, and a user is now typing in a string. I want to tell the user whether the string could still match the RegExp if they keep typing, or whether they are already on the wrong track. For instance:

var re = /a*b/;

"a".isPrefixOf( re ); // true
"x".isPrefixOf( re ); // false

What could an implementation of isPrefixOf look like?

Update: Thanks for your answers. Making the regex prefix-proof, as brad suggested, seems to be a good workaround, but I'm still trying to find a general solution.

Maybe this way: we create a new regex consisting of the user input followed by .*. This regex describes all words that the user may still enter. If the intersection of this new regex and the original regex is empty, the user is already on the wrong track. If it is not empty, the user is doing fine. For instance:

var re = /a*b/;
var sInput = "a";
var reInput = new RegExp( sInput + ".*" );

var reIntersection = re.intersect( reInput );
reIntersection.isEmpty(); // false

intersect() returns a new regex that accepts only the words that both re and reInput would accept. The function doesn't exist yet, but we can implement it using look-ahead:

RegExp.prototype.intersect = function( pattern2 ) { 
    return new RegExp( '(?=' + this.source  + ')' + pattern2.source );
}
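
A caveat on the look-ahead idea, as a rough sketch under a few assumptions: neither pattern above is anchored, so the combined regex only requires both to match somewhere, not to accept the same whole word, and the raw user input is not escaped. The helper names escapeRegExp, intersectWhole and inputAsPrefixRegex below are hypothetical, not existing APIs, and the sketch does nothing to solve isEmpty().

```javascript
// Escape characters that have a special meaning in a regex, so that
// user input is treated as literal text.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: the result accepts only strings that BOTH patterns
// match in full. The look-ahead checks re1 against the whole string, then
// re2 has to consume the whole string as well.
function intersectWhole(re1, re2) {
  return new RegExp('^(?=(?:' + re1.source + ')$)(?:' + re2.source + ')$');
}

// Hypothetical helper: "the user input followed by .*".
function inputAsPrefixRegex(input) {
  return new RegExp(escapeRegExp(input) + '.*');
}

var re = /a*b/;
var reIntersection = intersectWhole(re, inputAsPrefixRegex('a'));

console.log(reIntersection.test('aab')); // true: accepted by both patterns
console.log(reIntersection.test('b'));   // false: does not start with "a"
```

Testing individual words against the intersection is easy; deciding whether any word at all is accepted is the part that remains open.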

What remains open is the isEmpty() function. How could we check whether a JavaScript regex matches any word at all, or whether it is empty?

逆光飞翔i 2024-07-18 11:36:09

People seem to be splitting evenly on how they interpret this question, so I'll demonstrate the concept with a Java example.

import java.util.regex.*;

public class Test
{

  public static void main(String[] args) throws Exception
  {
    tryMatch("^a*b+$", "a", "ab", "abc");
  }

  public static void tryMatch(String regex, String... targets)
  {
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher("");
    System.out.printf("%nregex: %s%n", regex);
    System.out.printf("target | matches() | hitEnd()%n");
    for (String str : targets)
    {
      m.reset(str);
      System.out.printf("%-6s | %-9B | %-9B%n",
          str, m.matches(), m.hitEnd());
    }
  }
}

output:

regex: ^a*b+$
target | matches() | hitEnd()
a      | FALSE     | TRUE
ab     | TRUE      | TRUE
abc    | FALSE     | FALSE

Target string "a" doesn't match because the regex requires at least one b, but it could be the prefix of a successful match, so hitEnd() returns true. String "ab" has all that's required for a match, but it would also match if we added more b's to the end, so hitEnd() still returns true. With "abc" the match attempt fails before it reaches the end of the target string, so the regex couldn't match any string that starts with "abc".

As far as I know, JavaScript doesn't have anything like Java's hitEnd() method, but it might be possible to fake it. If anyone knows how, it'll be that Flagrant Badass, Steven Levithan.

长发绾君心 2024-07-18 11:36:09

I think your best bet here is to make your regex prefix-proof. For the example you gave, /a*b/, you could probably use /^a*b?$/.test(userinput). The anchors matter: unanchored, a*b? also matches the empty string and therefore accepts any input. For more complex patterns this gets increasingly difficult, but I still think it can be done by nesting each subexpression in a series of optional groups (?). For instance:

/a*bcd*e/

The prefix regex could be:

/^a*(b(c(d*e?)?)?)?$/

It's a little messy, but I think it will solve your problem rather well.
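
The nesting can be generated mechanically, at least in the simple case. The sketch below assumes the pattern has already been split by hand into a sequence of subexpressions; buildPrefixRegex is a hypothetical helper, and splitting an arbitrary regex source automatically (the hard part) is not attempted.

```javascript
// Hypothetical helper: tokens are regex fragments that must appear in sequence.
// Working from the last token backwards, each token plus everything after it
// is wrapped in an optional non-capturing group.
function buildPrefixRegex(tokens) {
  var body = '';
  for (var i = tokens.length - 1; i >= 0; i--) {
    // The inner (?:...) keeps a token such as "x|y" from leaking into the rest.
    body = '(?:(?:' + tokens[i] + ')' + body + ')?';
  }
  // Anchored so that the whole input has to be a valid prefix.
  return new RegExp('^' + body + '$');
}

// /a*bcd*e/ split into its sequential parts.
var prefixRe = buildPrefixRegex(['a*', 'b', 'c', 'd*', 'e']);

console.log(prefixRe.test('aab'));  // true
console.log(prefixRe.test('abcd')); // true
console.log(prefixRe.test('abd'));  // false: "c" is missing before "d"
```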

南薇 2024-07-18 11:36:09

Very interesting question. In my quick search, I didn't find anything predefined (not even in Perl) that solves this problem.

EDIT: Ouch, it seems Java has something similar called hitEnd(); see Alan M's answer. hitEnd() tells you whether the result of the match (either true or false) might be changed by additional input. The book "Mastering Regular Expressions" says it's not very reliable, though (I'm not sure why; page 392 is not available in Google Books).

Depending on which features of regular expressions you use, a quick hack might work: write out the successive prefixes of your regexp,

e.g. for a+a*b+c your prefixes will be:

a+
a+a*
a+a*b+
a+a*b+c

and try to match your string against any of them. This quick hack becomes difficult if you use the choice operator, the range operator {n,m}, or back-references.
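
As a rough illustration of the quick hack, assuming the list of prefix patterns has been written out by hand as above (matchesSomePrefix is a hypothetical helper name):

```javascript
// Returns true if the input is fully matched by at least one of the
// hand-written prefix patterns.
function matchesSomePrefix(input, prefixSources) {
  // Nothing typed yet: every word is still reachable.
  if (input === '') return true;
  for (var i = 0; i < prefixSources.length; i++) {
    // Anchor each prefix pattern so it has to cover the whole input.
    var anchored = new RegExp('^(?:' + prefixSources[i] + ')$');
    if (anchored.test(input)) return true;
  }
  return false;
}

var prefixes = ['a+', 'a+a*', 'a+a*b+', 'a+a*b+c'];

console.log(matchesSomePrefix('aab', prefixes)); // true
console.log(matchesSomePrefix('ac', prefixes));  // false: "c" before any "b"
```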

That being said, I think the good solution is to slightly modify the matching algorithm.

The matching algorithm normally employed is a backtracking algorithm (which works well in practice, even if worst case behavior is exponential). This algorithm successfully terminates whenever it has reached the end of the regexp (even if not the entire string was consumed). What you need to do is to modify the termination condition such that it also terminates successfully when it has consumed all of the input.

That being said, you'd probably have to actually implement the algorithm in JavaScript. Hopefully this will become part of libraries such as jQuery.
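
To make the modified termination condition concrete, here is a minimal backtracking sketch. It is not a real regex engine: it assumes a toy pattern syntax (literal characters, "." for any character, "x*" for zero or more of x) and treats the pattern as anchored at both ends. couldStillMatch is a hypothetical name. In this toy syntax every leftover pattern can be satisfied by some further input, which is why running out of input can simply count as success.

```javascript
// Toy backtracking matcher that answers: can `text` still be extended
// into a string that matches `pattern` completely?
function couldStillMatch(pattern, text) {
  // Modified termination condition: all input consumed counts as success,
  // because further typing may still satisfy the rest of the pattern.
  if (text === '') return true;
  // Pattern exhausted but input remains: this path is a dead end.
  if (pattern === '') return false;

  var first = pattern.charAt(0);
  var firstMatches = first === '.' || first === text.charAt(0);

  if (pattern.charAt(1) === '*') {
    // Either skip "x*" entirely, or let it consume one character and retry.
    return couldStillMatch(pattern.slice(2), text) ||
           (firstMatches && couldStillMatch(pattern, text.slice(1)));
  }
  return firstMatches && couldStillMatch(pattern.slice(1), text.slice(1));
}

console.log(couldStillMatch('a*b', 'a'));   // true
console.log(couldStillMatch('a*b', 'x'));   // false
console.log(couldStillMatch('a*b', 'abc')); // false
```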

For more references and theory on the algorithm, check this article out:

http://swtch.com/~rsc/regexp/regexp1.html

(Note that the article argues against the backtracking algorithm and suggests a finite-automaton (FA) based algorithm instead, although an FA cannot handle back-references.)

但可醉心 2024-07-18 11:36:09

One way of doing this is to hook the onkeyup event of a text box and .test() the text against the regular expression.
My assumption is, of course, that you want to do regular expression matching.
I'm not sure this is exactly what you need; in fact, your code:

"a".isPrefixOf( re ); // true

will never match since it's required to also have a subsequent "b" character (you may want to modify the regular expression).
For instance, this code will test against any string matching this format:

a-n(n)-b

Here is the code, save it as a page and load it in your browser:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<body>
    <input type="text" size="20" id="txtData" onkeyup="showResult()" />
    <div id="dvResult"></div>
    <script type="text/javascript">
    //<![CDATA[

    // Accepts "a-", one or two digits, then "-b", e.g. "a-12-b".
    var theRegExp = /^a\-\d{1,2}\-b$/;

    // Tests the complete input against the pattern.
    function isPrefixOf( aText, aRegExp )
    {
        return aRegExp.test( aText );
    }

    // Called on every key release in the text box.
    function showResult()
    {
        var res = document.getElementById( "dvResult" );
        res.innerHTML = isPrefixOf( document.getElementById( "txtData" ).value, theRegExp ) ? "Correct" : "Bad input";
    }

    //]]>
    </script>
</body>
</html>
