A function that creates a regular expression matching a numeric range

Posted on 2024-11-24 08:22:33

I am working with the Amazon Mechanical Turk API and it will only allow me to use regular expressions to filter a field of data.

I would like to input an integer range to a function, such as 256-311 or 45-1233, and return a regex that would match only that range.

A regex matching 256-321 would be:

\b((25[6-9])|(2[6-9][0-9])|(3[0-1][0-9])|(32[0-1]))\b
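One way to sanity-check a hand-written pattern like this is to test it against every integer in a window around the range. The following is only a sketch in JavaScript; the helper name `mismatches` and the window bounds are illustrative assumptions, not part of the question.

```javascript
// Returns the integers in [windowLo, windowHi] that the regex classifies
// wrongly for the intended range [lo, hi].
function mismatches(regex, lo, hi, windowLo, windowHi) {
  const wrong = [];
  for (let n = windowLo; n <= windowHi; n++) {
    const expected = n >= lo && n <= hi;
    if (regex.test(String(n)) !== expected) {
      wrong.push(n);
    }
  }
  return wrong;
}

// The pattern from the question, for the range 256-321.
const pattern = /\b((25[6-9])|(2[6-9][0-9])|(3[0-1][0-9])|(32[0-1]))\b/;
console.log(mismatches(pattern, 256, 321, 0, 2000)); // []
```

An empty array means the pattern agrees with the intended range for every integer in the tested window.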

That part is fairly easy, but I am having trouble with the loop to create this regex.

I am trying to build a function defined like this:

function getRangeRegex( int fromInt, int toInt)
{

      return regexString;
}

I looked all over the web and I am surprised that it doesn't look like anyone has solved this in the past. It is a difficult problem...

Thanks for your time.

Comments (9)

心头的小情儿 2024-12-01 08:22:33

Here's a quick hack:

<?php

function regex_range($from, $to) {

  if($from < 0 || $to < 0) {
    throw new Exception("Negative values not supported"); 
  }

  if($from > $to) {
    throw new Exception("Invalid range $from..$to, from > to"); 
  }

  $ranges = array($from);
  $increment = 1;
  $next = $from;
  $higher = true;

  while(true) {

    $next += $increment;

    if($next + $increment > $to) {
      if($next <= $to) {
        $ranges[] = $next;
      }
      $increment /= 10;
      $higher = false;
    }
    else if($next % ($increment*10) === 0) {
      $ranges[] = $next;
      $increment = $higher ? $increment*10 : $increment/10;
    }

    if(!$higher && $increment < 10) {
      break;
    }
  }

  $ranges[] = $to + 1;

  $regex = '/^(?:';

  for($i = 0; $i < sizeof($ranges) - 1; $i++) {
    $str_from = (string)($ranges[$i]);
    $str_to = (string)($ranges[$i + 1] - 1);

    for($j = 0; $j < strlen($str_from); $j++) {
      if($str_from[$j] == $str_to[$j]) {
        $regex .= $str_from[$j];
      }
      else {
        $regex .= "[" . $str_from[$j] . "-" . $str_to[$j] . "]";
      }
    }
    $regex .= "|";
  }

  return substr($regex, 0, strlen($regex)-1) . ')$/';
}

function test($from, $to) {
  try {
    printf("%-10s %s\n", $from . '-' . $to, regex_range($from, $to));
  } catch (Exception $e) {
    echo $e->getMessage() . "\n";
  }
}

test(2, 8);
test(5, 35);
test(5, 100);
test(12, 1234);
test(123, 123);
test(256, 321);
test(256, 257);
test(180, 195);
test(2,1);
test(-2,4);

?>

which produces:

2-8        /^(?:[2-7]|8)$/
5-35       /^(?:[5-9]|[1-2][0-9]|3[0-5])$/
5-100      /^(?:[5-9]|[1-9][0-9]|100)$/
12-1234    /^(?:1[2-9]|[2-9][0-9]|[1-9][0-9][0-9]|1[0-2][0-3][0-4])$/
123-123    /^(?:123)$/
256-321    /^(?:25[6-9]|2[6-9][0-9]|3[0-2][0-1])$/
256-257    /^(?:256|257)$/
180-195    /^(?:18[0-9]|19[0-5])$/
Invalid range 2..1, from > to
Negative values not supported

Not properly tested, use at your own risk!

And yes, the generated regex could be written more compactly in many cases, but I leave that as an exercise for the reader :)
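As a small step toward that exercise, here is a hedged JavaScript sketch of two local compactions: choosing the shortest character class for a digit range, and collapsing repeated pieces into a `{n}` quantifier. The names `digitClass` and `joinWithRepeats` are made up for illustration, and this does not merge neighbouring alternatives such as `[2-7]|8`.

```javascript
// Shortest character class for the digits lo..hi (0 <= lo <= hi <= 9).
function digitClass(lo, hi) {
  if (lo === hi) return String(lo);          // single digit: "8"
  if (hi === lo + 1) return `[${lo}${hi}]`;   // two digits: "[12]" instead of "[1-2]"
  return `[${lo}-${hi}]`;                     // three or more: "[2-8]"
}

// Joins pattern pieces, collapsing runs of identical pieces:
// ["[0-9]", "[0-9]"] becomes "[0-9]{2}".
function joinWithRepeats(pieces) {
  let out = "";
  for (let i = 0; i < pieces.length; ) {
    let run = 1;
    while (i + run < pieces.length && pieces[i + run] === pieces[i]) run++;
    out += run > 1 ? `${pieces[i]}{${run}}` : pieces[i];
    i += run;
  }
  return out;
}

console.log(digitClass(1, 2));                              // [12]
console.log(joinWithRepeats(["[1-9]", "[0-9]", "[0-9]"])); // [1-9][0-9]{2}
```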

偏爱自由 2024-12-01 08:22:33

For anyone else who, like me, was looking for a JavaScript version of @Bart Kiers's great answer above:

//Credit: Bart Kiers 2011
function regex_range(from, to) {
    if (from < 0 || to < 0) {
        // Negative values are not supported
        return null;
    }
    if (from > to) {
        // Invalid range: from > to
        return null;
    }

    var ranges = [from];
    var increment = 1;
    var next = from;
    var higher = true;

    while (true) {
        next += increment;
        if (next + increment > to) {
            if (next <= to) {
                ranges.push(next);
            }
            increment /= 10;
            higher = false;
        } else if (next % (increment * 10) == 0) {
            ranges.push(next);
            increment = higher ? increment * 10 : increment / 10;
        }

        if (!higher && increment < 10) {
            break;
        }
    }

    ranges.push(to + 1);
    var regex = '/^(?:';

    for (var i = 0; i < ranges.length - 1; i++) {
        var str_from = ranges[i].toString();
        var str_to = (ranges[i + 1] - 1).toString();
        for (var j = 0; j < str_from.length; j++) {
            if (str_from[j] == str_to[j]) {
                regex += str_from[j];
            } else {
                regex += "[" + str_from[j] + "-" + str_to[j] + "]";
            }
        }
        regex += "|";
    }

    // Drop the trailing "|" and close the group
    return regex.slice(0, -1) + ')$/';
}
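
Note that the function above returns a string that still carries the PHP-style `/` delimiters. Passed straight to `new RegExp`, those slashes are treated as literal characters. A possible workaround, sketched here with a hypothetical helper `toRegExp`, is to strip them first:

```javascript
// Strips surrounding "/" delimiters, if present, and compiles the pattern.
function toRegExp(delimited) {
  if (delimited.length >= 2 && delimited.startsWith("/") && delimited.endsWith("/")) {
    return new RegExp(delimited.slice(1, -1));
  }
  return new RegExp(delimited);
}

// The 180-195 output quoted earlier in the thread.
const re = toRegExp("/^(?:18[0-9]|19[0-5])$/");
console.log(re.test("187")); // true
console.log(re.test("196")); // false

// Without stripping, the slashes are literal and nothing matches.
console.log(new RegExp("/^(?:18[0-9]|19[0-5])$/").test("187")); // false
```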
鯉魚旗 2024-12-01 08:22:33

PHP Port of RegexNumericRangeGenerator

class RegexRangeNumberGenerator {

    static function parse($min, $max, $MatchWholeWord = FALSE, $MatchWholeLine = FALSE, $MatchLeadingZero = FALSE) {
        if (!is_int($min) || !is_int($max) || $min > $max || $min < 0 || $max < 0) {
            return FALSE;
        }
        if ($min == $max) {
            return self::parseIntoPattern($min, $MatchWholeWord, $MatchWholeLine, $MatchLeadingZero);
        }
        $s = [];
        $x = self::parseStartRange($min, $max);
        foreach ($x as $o) {
            $s[] = self::parseEndRange($o[0], $o[1]);
        }
        $n = self::reformatArray($s);
        $h = self::parseIntoRegex($n);
        return self::parseIntoPattern($h, $MatchWholeWord, $MatchWholeLine, $MatchLeadingZero);
    }

    static private function parseIntoPattern($t, $MatchWholeWord = FALSE, $MatchWholeLine = FALSE, $MatchLeadingZero = FALSE) {
        $r = ((is_array($t)) ? implode("|", $t) : $t);
        return (($MatchWholeLine && $MatchLeadingZero) ? "^0*(" . $r . ")$" : (($MatchLeadingZero) ? "0*(" . $r . ")" : (($MatchWholeLine) ? "^(" . $r . ")$" : (($MatchWholeWord) ? "\\b(" . $r . ")\\b" : "(" . $r . ")"))));
    }

    static private function parseIntoRegex($t) {
        if (!is_array($t)) {
            throw new Exception("Argument needs to be an array!");
        }
        $r = [];
        for ($i = 0; $i < count($t); $i++) {
            $e = str_split($t[$i][0]);
            $n = str_split($t[$i][1]);
            $s = "";
            $o = 0;
            $h = "";
            for ($a = 0; $a < count($e); $a++) {
                if ($e[$a] === $n[$a]) {
                    $h .= $e[$a];
                } else {
                    if ((intval($e[$a]) + 1) === intval($n[$a])) {
                        $h .= "[" . $e[$a] . $n[$a] . "]";
                    } else {
                        if ($s === ($e[$a] . $n[$a])) {
                            $o++;
                        }
                        $s = $e[$a] . $n[$a];
                        if ($a == (count($e) - 1)) {
                            $h .= (($o > 0) ? "{" . ($o + 1) . "}" : "[" . $e[$a] . "-" . $n[$a] . "]");
                        } else {
                            if ($o === 0) {
                                $h .= "[" . $e[$a] . "-" . $n[$a] . "]";
                            }
                        }
                    }
                }
            }
            $r[] = $h;
        }
        return $r;
    }

    static private function reformatArray($t) {
        $arrReturn = [];
        for ($i = 0; $i < count($t); $i++) {
            $page = count($t[$i]) / 2;
            for ($a = 0; $a < $page; $a++) {
                $arrReturn[] = array_slice($t[$i], (2 * $a), 2);
            }
        }
        return $arrReturn;
    }

    static private function parseStartRange($t, $r) {
        if (strlen($t) === strlen($r)) {
            return [[$t, $r]];
        }
        $break = pow(10, strlen($t)) - 1;
        return array_merge([[$t, $break]], self::parseStartRange($break + 1, $r));
    }

    static private function parseEndRange($t, $r) {
        if (strlen($t) == 1) {
            return [$t, $r];
        }
        if (str_repeat("0", strlen($t)) === "0" . substr($t, 1)) {
            if (str_repeat("0", strlen($r)) == "9" . substr($r, 1)) {
                return [$t, $r];
            }
            if ((int) substr($t, 0, 1) < (int) substr($r, 0, 1)) {
                $e = intval(substr($r, 0, 1) . str_repeat("0", strlen($r) - 1)) - 1;
                return array_merge([$t, self::strBreakPoint($e)], self::parseEndRange(self::strBreakPoint($e + 1), $r));
            }
        }
        if (str_repeat("9", strlen($r)) === "9" . substr($r, 1) && (int) substr($t, 0, 1) < (int) substr($r, 0, 1)) {
            $e = intval(intval((int) substr($t, 0, 1) + 1) . "" . str_repeat("0", strlen($r) - 1)) - 1;
            return array_merge(self::parseEndRange($t, self::strBreakPoint($e)), [self::strBreakPoint($e + 1), $r]);
        }
        if ((int) substr($t, 0, 1) < (int) substr($r, 0, 1)) {
            $e = intval(intval((int) substr($t, 0, 1) + 1) . "" . str_repeat("0", strlen($r) - 1)) - 1;
            return array_merge(self::parseEndRange($t, self::strBreakPoint($e)), self::parseEndRange(self::strBreakPoint($e + 1), $r));
        }
        $a = (int) substr($t, 0, 1);
        $o = self::parseEndRange(substr($t, 1), substr($r, 1));
        $h = [];
        for ($u = 0; $u < count($o); $u++) {
            $h[] = ($a . $o[$u]);
        }
        return $h;
    }

    static private function strBreakPoint($t) {
        return str_pad($t, strlen(($t + 1)), "0", STR_PAD_LEFT);
    }
}

Test Results

2-8         ^([2-8])$
5-35        ^([5-9]|[12][0-9]|3[0-5])$
5-100       ^([5-9]|[1-8][0-9]|9[0-9]|100)$
12-1234     ^(1[2-9]|[2-9][0-9]|[1-8][0-9]{2}|9[0-8][0-9]|99[0-9]|1[01][0-9]{2}|12[0-2][0-9]|123[0-4])$
123-123     ^(123)$
256-321     ^(25[6-9]|2[6-9][0-9]|3[01][0-9]|32[01])$
256-257     ^(25[67])$
180-195     ^(18[0-9]|19[0-5])$
撩动你心 2024-12-01 08:22:33

Is there a reason it has to be a regex? Can't you do something like this:

if ($number >= 256 && $number <= 321){
   // do something 
}

Update:

There is an easy but ugly way to do it using range:

function getRangeRegex($from, $to)
{
    $range = implode('|', range($from, $to));

    // returns: 256|257|...|321
    return $range;
}
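
For comparison, the same brute-force idea in JavaScript might look like the sketch below. The function name `rangeAlternation` is an assumption; the grouping and anchors are added here because a bare `256|257|...|321` alternation would also match inside longer numbers.

```javascript
// Builds /^(?:256|257|...|321)$/ by listing every value in the range.
function rangeAlternation(from, to) {
  const values = [];
  for (let n = from; n <= to; n++) {
    values.push(String(n));
  }
  // Group the alternatives so the anchors apply to all of them,
  // not just to the first and the last one.
  return new RegExp("^(?:" + values.join("|") + ")$");
}

const re = rangeAlternation(256, 321);
console.log(re.test("300"));  // true
console.log(re.test("322"));  // false
console.log(re.test("2567")); // false
```

The pattern grows linearly with the size of the range, which is why this is the easy but ugly option.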
薆情海 2024-12-01 08:22:33

Be careful: the excellent code by @Bart Kiers (and the JS version by Travis J) fails in some cases. For example:

12-1234    /^(?:1[2-9]|[2-9][0-9]|[1-9][0-9][0-9]|1[0-2][0-3][0-4])$/

does not match "1229" or "1115", nor any other number of the form 1[0-2][0-2][5-9].
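
One way to avoid this class of failure is to split the range recursively, digit by digit, so that every alternative covers a block in which all trailing digits are free. The JavaScript sketch below is an independent illustration, not the code from any answer in this thread; the names `digitRange`, `sameLength` and `rangeRegexSource` are made up, and it assumes non-negative safe integers.

```javascript
// Character class for the digits lo..hi (single digits, lo <= hi).
function digitRange(lo, hi) {
  return lo === hi ? String(lo) : `[${lo}-${hi}]`;
}

// Alternatives matching every digit string between a and b inclusive,
// where a and b have the same length and a <= b.
function sameLength(a, b) {
  if (a.length === 0) return [""];
  const restLen = a.length - 1;
  const first = Number(a[0]);
  const last = Number(b[0]);
  if (first === last) {
    // Same leading digit: recurse on the remaining digits.
    return sameLength(a.slice(1), b.slice(1)).map((p) => a[0] + p);
  }
  const out = [];
  let lo = first;
  let hi = last;
  let tail = [];
  if (a.slice(1) !== "0".repeat(restLen)) {
    // Partial block at the bottom: a up to a[0] followed by all nines.
    out.push(...sameLength(a.slice(1), "9".repeat(restLen)).map((p) => a[0] + p));
    lo = first + 1;
  }
  if (b.slice(1) !== "9".repeat(restLen)) {
    // Partial block at the top: b[0] followed by all zeros, up to b.
    tail = sameLength("0".repeat(restLen), b.slice(1)).map((p) => b[0] + p);
    hi = last - 1;
  }
  if (lo <= hi) {
    // Full blocks in between: the remaining digits can be anything.
    out.push(digitRange(lo, hi) + "[0-9]".repeat(restLen));
  }
  return out.concat(tail);
}

// Anchored pattern source matching exactly the integers from..to.
function rangeRegexSource(from, to) {
  if (!Number.isInteger(from) || !Number.isInteger(to) || from < 0 || from > to) {
    throw new RangeError("expected integers with 0 <= from <= to");
  }
  const parts = [];
  const minLen = String(from).length;
  const maxLen = String(to).length;
  // Handle each digit count separately, e.g. 12-99, 100-999, 1000-1234.
  for (let len = minLen; len <= maxLen; len++) {
    const lo = len === minLen ? String(from) : "1" + "0".repeat(len - 1);
    const hi = len === maxLen ? String(to) : "9".repeat(len);
    parts.push(...sameLength(lo, hi));
  }
  return "^(?:" + parts.join("|") + ")$";
}

console.log(rangeRegexSource(256, 321));
// ^(?:25[6-9]|2[6-9][0-9]|3[0-1][0-9]|32[0-1])$

const re = new RegExp(rangeRegexSource(12, 1234));
console.log(re.test("1229"), re.test("1115"), re.test("1235")); // true true false
```

The output is not minimal (for instance `[0-1]` could be written `[01]`), but each alternative is built so that it cannot miss values inside its block.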

长不大的小祸害 2024-12-01 08:22:33

That actually has been done already.

Have a look at this site. It contains a link to a Python script that generates these regexes for you automagically.

月隐月明月朦胧 2024-12-01 08:22:33

This answer is duplicated from this question. I've also made it into a blog post


Using regular expressions to validate a numeric range

To be clear: When a simple if statement will suffice

if(num < -2055  ||  num > 2055)  {
   throw  new IllegalArgumentException("num (" + num + ") must be between -2055 and 2055");
}

using regular expressions for validating numeric ranges is not recommended.

In addition, since regular expressions analyze strings, numbers must first be translated to a string before they can be tested (an exception is when the number happens to already be a string, such as when getting user input from the console).

(To ensure the string is a number to begin with, you could use org.apache.commons.lang3.math.NumberUtils#isNumber(s))

Despite this, figuring out how to validate number ranges with regular expressions is interesting and instructive.

A one number range

Rule: A number must be exactly 15.

The simplest range there is. A regex to match this is

\b15\b

Word boundaries are necessary to avoid matching the 15 inside of 8215242.
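
A quick way to see the difference, sketched in JavaScript (the variable names are illustrative):

```javascript
const loose = /15/;        // no word boundaries
const bounded = /\b15\b/;  // with word boundaries

console.log(loose.test("8215242"));         // true: finds the 15 inside the longer number
console.log(bounded.test("8215242"));       // false
console.log(bounded.test("row 15, col 3")); // true
```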

A two number range

The rule: The number must be between 15 and 16. Three possible regexes:

\b(15|16)\b
\b1(5|6)\b
\b1[5-6]\b

A number range "mirrored" around zero

The rule: The number must be between -12 and 12.

Here is a regex for 0 through 12, positive-only:

\b(\d|1[0-2])\b

Free-spaced:

\b(         //The beginning of a word (or number), followed by either
   \d       //   Any digit 0 through 9
|           //Or
   1[0-2]   //   A 1 followed by any digit between 0 and 2.
)\b         //The end of a word

Making this work for both negative and positive is as simple as adding an optional dash at the start:

-?\b(\d|1[0-2])\b

(This assumes no inappropriate characters precede the dash.)

To forbid negative numbers, a negative lookbehind is necessary:

(?<!-)\b(\d|1[0-2])\b

Leaving the lookbehind out would cause the 11 in -11 to match. (The first example in this post should have this added.)
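
The effect of the lookbehind can be checked with a short JavaScript sketch. This assumes an engine with lookbehind support (ES2018 or later); the variable names are illustrative.

```javascript
const withoutLookbehind = /\b(\d|1[0-2])\b/;
const withLookbehind = /(?<!-)\b(\d|1[0-2])\b/;

console.log("-11".match(withoutLookbehind)[0]); // 11
console.log(withLookbehind.test("-11"));        // false
console.log(withLookbehind.test("11"));         // true
```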

Note: \d versus [0-9]

To be compatible with all regex flavors, every \d should be changed to [0-9]. For example, .NET treats non-ASCII digits, such as those from other scripts, as legal matches for \d. For brevity, the examples here keep \d, except for the last one.

(With thanks to TimPietzcker at stackoverflow)

Three digits, with all but the first digit equal to zero

Rule: Must be between 0 and 400.

A possible regex:

(?<!-)\b([1-3]?\d{1,2}|400)\b

Free spaced:

   (?<!-)          //Something not preceded by a dash
   \b(             //Word-start, followed by either
      [1-3]?       //   No digit, or the digit 1, 2, or 3
         \d{1,2}   //   Followed by one or two digits (between 0 and 9)
   |               //Or
      400          //   The number 400
   )\b             //Word-end

Another possibility that should never be used:

\b(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39|40|41|42|43|44|45|46|47|48|49|50|51|52|53|54|55|56|57|58|59|60|61|62|63|64|65|66|67|68|69|70|71|72|73|74|75|76|77|78|79|80|81|82|83|84|85|86|87|88|89|90|91|92|93|94|95|96|97|98|99|100|101|102|103|104|105|106|107|108|109|110|111|112|113|114|115|116|117|118|119|120|121|122|123|124|125|126|127|128|129|130|131|132|133|134|135|136|137|138|139|140|141|142|143|144|145|146|147|148|149|150|151|152|153|154|155|156|157|158|159|160|161|162|163|164|165|166|167|168|169|170|171|172|173|174|175|176|177|178|179|180|181|182|183|184|185|186|187|188|189|190|191|192|193|194|195|196|197|198|199|200|201|202|203|204|205|206|207|208|209|210|211|212|213|214|215|216|217|218|219|220|221|222|223|224|225|226|227|228|229|230|231|232|233|234|235|236|237|238|239|240|241|242|243|244|245|246|247|248|249|250|251|252|253|254|255|256|257|258|259|260|261|262|263|264|265|266|267|268|269|270|271|272|273|274|275|276|277|278|279|280|281|282|283|284|285|286|287|288|289|290|291|292|293|294|295|296|297|298|299|300|301|302|303|304|305|306|307|308|309|310|311|312|313|314|315|316|317|318|319|320|321|322|323|324|325|326|327|328|329|330|331|332|333|334|335|336|337|338|339|340|341|342|343|344|345|346|347|348|349|350|351|352|353|354|355|356|357|358|359|360|361|362|363|364|365|366|367|368|369|370|371|372|373|374|375|376|377|378|379|380|381|382|383|384|385|386|387|388|389|390|391|392|393|394|395|396|397|398|399|400)\b

Final example: Four digits, mirrored around zero, that does not end with zeros.

Rule: Must be between -2055 and 2055

This is from a question on stackoverflow.

Regex:

-?\b(20(5[0-5]|[0-4][0-9])|1?[0-9]{1,3})\b

Free-spaced:

   -?                 //Optional dash
   \b(                //Followed by word boundary, followed by either of the following
      20(             //   "20", followed by either
         5[0-5]       //      A "5" followed by a digit 0-5
      |               //   or
         [0-4][0-9]   //      A digit 0-4, followed by any digit
      )
   |                  //OR
      1?[0-9]{1,3}    //   An optional "1", followed by one through three digits (0-9)
   )\b                //Followed by a word boundary.
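
As a rough illustration of how this pattern behaves on free text, the JavaScript sketch below pulls every in-range number out of a sample string. The sample text and variable names are made up, and `String.prototype.matchAll` needs a reasonably recent engine.

```javascript
// Same regex as above, with the global flag so matchAll can be used.
const inRange = /-?\b(20(5[0-5]|[0-4][0-9])|1?[0-9]{1,3})\b/g;

const text = "-2055 2056 17 -300 2000 99999";
const found = Array.from(text.matchAll(inRange), (m) => m[0]);

console.log(found); // [ '-2055', '17', '-300', '2000' ]
```

The out-of-range values 2056 and 99999 are skipped because the trailing word boundary prevents a match on only part of a longer digit run.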

And here you can try it out yourself: Debuggex demonstration

(With thanks to PlasmaPower on stackoverflow for the debugging assistance.)

Final note

Depending on what you are capturing, it is likely that all sub-groups should be made into non-capture groups. For example, this:

(-?\b(?:20(?:5[0-5]|[0-4][0-9])|1?[0-9]{1,3})\b)

Instead of this:

-?\b(20(5[0-5]|[0-4][0-9])|1?[0-9]{1,3})\b
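
The practical difference shows up in what the match result contains. A small JavaScript sketch (variable names are illustrative):

```javascript
const capturing = /-?\b(20(5[0-5]|[0-4][0-9])|1?[0-9]{1,3})\b/;
const nonCapturing = /(-?\b(?:20(?:5[0-5]|[0-4][0-9])|1?[0-9]{1,3})\b)/;

// slice(1) drops the full match and keeps only the captured groups.
console.log("-2031".match(capturing).slice(1));    // [ '2031', '31' ]
console.log("-2031".match(nonCapturing).slice(1)); // [ '-2031' ]
```

With the non-capturing version, the single remaining group holds the whole number, including its sign.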

Example Java implementation

import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.commons.lang3.math.NumberUtils;

/**
  <P>Confirm a user-input number is valid by reading a string and testing that it is numeric and in range before converting it to an int. This loops until a valid number is provided.</P>

  <P>{@code java UserInputNumInRangeWRegex}</P>
 **/
public class UserInputNumInRangeWRegex  {
   public static void main(String[] ignored)  {

      int num = -1;
      boolean isNum = false;

      int iRangeMax = 2055;

      //"": Dummy string, to reuse matcher
      Matcher mtchrNumNegThrPos = Pattern.compile("-?\\b(20(5[0-5]|[0-4][0-9])|1?[0-9]{1,3})\\b").matcher("");

      //Create the scanner once, rather than once per loop iteration
      Scanner scanner = new Scanner(System.in);

      do  {
         System.out.print("Enter a number between -" + iRangeMax + " and " + iRangeMax + ": ");
         String strInput = scanner.next();
         if(!NumberUtils.isNumber(strInput))  {
            System.out.println("Not a number. Try again.");
         }  else if(!mtchrNumNegThrPos.reset(strInput).matches())  {
            System.out.println("Not in range. Try again.");
         }  else  {
            //Safe to convert
            num = Integer.parseInt(strInput);
            isNum = true;
         }
      }  while(!isNum);

      System.out.println("Number: " + num);
   }
}

Output

[C:\java_code\]java UserInputNumInRangeWRegex
Enter a number between -2055 and 2055: tuhet
Not a number. Try again.
Enter a number between -2055 and 2055: 283837483
Not in range. Try again.
Enter a number between -2055 and 2055: -200000
Not in range. Try again.
Enter a number between -2055 and 2055: -300
Number: -300
相权↑美人 2024-12-01 08:22:33

I've converted Bart Kiers's answer into C++. The function takes two integers as an input and generates the regular expression for the number range.

#include <stdio.h>
#include <iostream>
#include <vector>
#include <string>

std::string regex_range(int from, int to);

int main(int argc, char **argv)
{
    std::string regex = regex_range(1,100);

    std::cout << regex << std::endl;

    return 0;
}

std::string regex_range(int from, int to) //Credit: Bart Kiers 2011
{
    if(from < 0 || to < 0)
    {
        std::cout << "Negative values not supported. Exiting." << std::endl;
        return std::string(); // empty result; "return 0" would build a std::string from a null pointer
    }

    if(from > to)
    {
        std::cout << "Invalid range, from > to. Exiting." << std::endl;
        return std::string(); // empty result; "return 0" would build a std::string from a null pointer
    }

    std::vector<int> ranges;
    ranges.push_back(from);
    int increment = 1;
    int next = from;
    bool higher = true;

    while(true)
    {

        next += increment;

        if(next + increment > to)
        {
            if(next <= to)
            {
                ranges.push_back(next);
            }
            increment /= 10;
            higher = false;
        }
        else if(next % (increment*10) == 0)
        {
            ranges.push_back(next);
            increment = higher ? increment*10 : increment/10;
        }

        if(!higher && (increment < 10))
        {
            break;
        }
    }

    ranges.push_back(to + 1);
    std::string regex("^(?:");

    for(std::vector<int>::size_type i = 0; i + 1 < ranges.size(); i++)
    {
        std::string str_from = std::to_string(ranges.at(i));
        std::string str_to = std::to_string(ranges.at(i + 1) - 1);
        for(std::string::size_type j = 0; j < str_from.length(); j++)
        {
            if(str_from.at(j) == str_to.at(j))
            {
                // Append only this digit. Building a std::string from
                // &str_from.at(j) would copy the rest of the string too.
                regex.push_back(str_from.at(j));
            }
            else
            {
                regex.push_back('[');
                regex.push_back(str_from.at(j));
                regex.push_back('-');
                regex.push_back(str_to.at(j));
                regex.push_back(']');
            }
        }
        regex.append("|");
    }
    regex = regex.substr(0, regex.length() - 1);
    regex.append(")$");
    return regex;
}
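
A generated pattern is easy to sanity-check by brute force. The sketch below is not part of the port above: `checkRange` is a hypothetical helper, and the first pattern is only what one would expect `regex_range(1, 100)` to produce. It lists every integer up to a limit on which the pattern and the intended range disagree.

```javascript
// Hypothetical helper: returns every n in [0, limit] where the regex and the
// intended range [from, to] disagree. An empty array means the pattern is
// correct for all tested values.
function checkRange(pattern, from, to, limit) {
  const re = new RegExp(pattern)
  const mismatches = []
  for (let n = 0; n <= limit; n++) {
    const expected = n >= from && n <= to
    if (re.test(String(n)) !== expected) {
      mismatches.push(n)
    }
  }
  return mismatches
}

// Assumed output of regex_range(1, 100)
console.log(checkRange('^(?:[1-9]|[1-9][0-9]|100)$', 1, 100, 2000)) // []

// The hand-written regex from the question (range 256-321)
console.log(checkRange('\\b((25[6-9])|(2[6-9][0-9])|(3[0-1][0-9])|(32[0-1]))\\b', 256, 321, 5000)) // []
```

The check only covers the values up to `limit`, so it is a smoke test rather than a proof.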
感性不性感 2024-12-01 08:22:33

I ran into the same issue that @EmilianoT already reported. I tried to fix it, but in the end I opted to port the PHP port of RegexNumericRangeGenerator (ported by @EmilianoT) to JavaScript, although not as a class. I am not entirely happy with this JS port, since the toString() and parseInt() calls could still be optimised (some of them are probably unnecessary), but it works for all cases.

The one thing I changed is the parameters. I replaced parse($min, $max, $MatchWholeWord = FALSE, $MatchWholeLine = FALSE, $MatchLeadingZero = FALSE) with parse(min, max, width = 0, prefix = '', suffix = ''), which offers more options (some may want to wrap the regex in slashes, others may want to match the whole line [prefix = '^'; suffix = '$'], etc.). I also wanted to be able to configure the width of the number (with width = 3: 000, 001, 052, 800, 1000, ...).
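
To make the effect of the three options concrete, here is a small standalone sketch. It does not call parse(). It uses a hand-written body for the range 50-59, and the helpers `wrap` and `pad` are hypothetical names that mirror how the code in this answer applies prefix/suffix and zero-padding.

```javascript
// Hand-written body for the range 50-59 (an assumption for illustration).
const body = '5[0-9]'
const digits = 2 // number of digits the body matches

// Mirrors parseIntoPattern: prefix and suffix are wrapped around the group.
const wrap = (inner, prefix = '', suffix = '') => prefix + '(' + inner + ')' + suffix

// Mirrors the width handling: shorter alternatives are left-padded with zeros.
const pad = (inner, width) => '0'.repeat(Math.max(0, width - digits)) + inner

const loose = new RegExp(wrap(body))
const wholeLine = new RegExp(wrap(body, '^', '$'))
const widthFour = new RegExp(wrap(pad(body, 4), '^', '$'))

console.log(loose.test('id 57 ok'))     // true: matches anywhere in the text
console.log(wholeLine.test('id 57 ok')) // false: the whole line must be the number
console.log(wholeLine.test('57'))       // true
console.log(widthFour.test('0057'))     // true: padded to four digits
console.log(widthFour.test('57'))       // false: the padding is now required
```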

I replaced my previous answer, as it did not work in every case. Anyone who wants to read it can find it in the answer history.

function parse(min, max, width = 0, prefix = '', suffix = '') {
  if (! Number.isInteger(min) || ! Number.isInteger(max) || min > max || min < 0 || max < 0) {
    return false
  }

  if (min == max) {
    return parseIntoPattern(min, prefix, suffix)
  }

  let x = parseStartRange(min, max)
  let s = []

  x.forEach(o => {
    s.push(parseEndRange(o[0], o[1]))
  })

  let n = reformatArray(s)
  let h = parseIntoRegex(n, width)

  return parseIntoPattern(h, prefix, suffix)
}

function parseIntoPattern(t, prefix = '', suffix = '') {
  let r = Array.isArray(t) ? t.join('|') : t
  return prefix + '(' + r + ')' + suffix
}

function parseIntoRegex(t, width = 0) {
  if (! Array.isArray(t)) {
    throw new Error('Argument needs to be an array!')
  }

  let r = []

  for (let i = 0; i < t.length; i++) {
    let e = t[i][0].split('')
    let n = t[i][1].split('')
    let s = ''
    let o = 0
    let h = ''

    for (let a = 0; a < e.length; a++) {
      if (e[a] === n[a]) {
        h += e[a]
      } else if (parseInt(e[a]) + 1 === parseInt(n[a])) {
        h += '[' + e[a] + n[a] + ']'
      } else {
        if (s === e[a] + n[a]) {
          o++
        }

        s = e[a] + n[a]

        if (a == e.length - 1) {
          h += o > 0 ? '{' + (o + 1) + '}' : '[' + e[a] + '-' + n[a] + ']'
        } else if (o === 0) {
          h += '[' + e[a] + '-' + n[a] + ']'
        }
      }
    }

    // Left-pad with zeros so the alternative matches exactly `width` digits.
    if (e.length < width) {
      h = '0'.repeat(width - e.length) + h
    }

    r.push(h)
  }

  return r
}

function reformatArray(t) {
  let arrReturn = []

  for (let i = 0; i < t.length; i++) {
    let page = t[i].length / 2

    for (let a = 0; a < page; a++) {
      arrReturn.push(t[i].slice(2 * a))
    }
  }

  return arrReturn
}

function parseStartRange(t, r) {
  t = t.toString()
  r = r.toString()

  if (t.length === r.length) {
    return [[t, r]]
  }

  let breakOut = 10 ** t.length - 1
  return [[t, breakOut.toString()]].concat(parseStartRange(breakOut + 1, r))
}

function parseEndRange(t, r) {
  if (t.length == 1) {
    return [t, r]
  }

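  // t has only zeros after its first digit ...
  if ('0'.repeat(t.length) === '0' + t.substr(1)) {
    // ... and r has only nines after its first digit: [t, r] is already one clean block.
    if ('9'.repeat(r.length) == '9' + r.substr(1)) {
      return [t, r]
    }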

    if (parseInt(t.toString().substr(0, 1)) < parseInt(r.toString().substr(0, 1))) {
      let e = parseInt(r.toString().substr(0, 1) + '0'.repeat(r.length - 1)) - 1
      return [t, strBreakPoint(e)].concat(parseEndRange(strBreakPoint(e + 1), r))
    }
  }

  if ('9'.repeat(r.length) === '9' + r.toString().substr(1) && parseInt(t.toString().substr(0, 1)) < parseInt(r.toString().substr(0, 1))) {
    let e = parseInt(parseInt(parseInt(t.toString().substr(0, 1)) + 1) + '0'.repeat(r.length - 1)) - 1
    return parseEndRange(t, strBreakPoint(e)).concat(strBreakPoint(e + 1), r)
  }

  if (parseInt(t.toString().substr(0, 1)) < parseInt(r.toString().substr(0, 1))) {
    let e = parseInt(parseInt(parseInt(t.toString().substr(0, 1)) + 1) + '0'.repeat(r.length - 1)) - 1
    return parseEndRange(t, strBreakPoint(e)).concat(parseEndRange(strBreakPoint(e + 1), r))
  }

  let a = parseInt(t.toString().substr(0, 1))
  let o = parseEndRange(t.toString().substr(1), r.toString().substr(1))
  let h = []

  for (let u = 0; u < o.length; u++) {
    h.push(a + o[u])
  }

  return h
}

function strBreakPoint(t) {
  return t.toString().padStart((parseInt(t) + 1).toString().length, '0')
}
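
The first step of the code above (parseStartRange) splits the requested range into sub-ranges whose bounds have the same number of digits, so that each one can then be compared digit by digit. A minimal iterative sketch of just that step follows. `splitByDigitCount` is a hypothetical name, and it works on numbers instead of strings for readability.

```javascript
// Split [min, max] into sub-ranges whose bounds share the same digit count.
// Assumes non-negative integers with min <= max.
function splitByDigitCount(min, max) {
  const parts = []
  let lo = min
  while (String(lo).length < String(max).length) {
    const hi = 10 ** String(lo).length - 1 // largest number with as many digits as lo
    parts.push([lo, hi])
    lo = hi + 1
  }
  parts.push([lo, max])
  return parts
}

console.log(splitByDigitCount(45, 1233))
// [ [ 45, 99 ], [ 100, 999 ], [ 1000, 1233 ] ]
```

For the 45-1233 example from the question this yields three blocks, which is why the final regex has at least one alternative per digit count.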