Generating regular expressions from a numeric range

Posted on 2024-09-30 12:52:35

I'd like to generate a (series of) regexp(s) from a numeric range.

Example:

1013 - 4044 => 

regexp                      matches
---------------------------------------
101[3-9]                    1013 - 1019
10[2-9][0-9]                1020 - 1099
1[1-9][0-9][0-9]            1100 - 1999
[23][0-9][0-9][0-9]         2000 - 3999
40[0-3][0-9]                4000 - 4039
404[0-4]                    4040 - 4044

What is the simplest algorithm?

What is the easiest way to reverse it (i.e. given the regexps, looking for the ranges)?

It would be nice to see solutions in Java, Clojure, Perl...

Thanks!

Comments (1)

ぺ禁宫浮华殁 2024-10-07 12:52:35

There is an online tool that generates a regex for a given numeric range and explains each step of the derivation. Its source code is available there as well. For example:

^(101[3-9]|10[2-9][0-9]|1[1-9][0-9]{2}|[23][0-9]{3}|40[0-3][0-9]|404[0-4])$
First, break into equal length ranges:
  1013 - 4044

Second, break into ranges that yield simple regexes:
  1013 - 1019
  1020 - 1099
  1100 - 1999
  2000 - 3999
  4000 - 4039
  4040 - 4044

Turn each range into a regex:
  101[3-9]
  10[2-9][0-9]
  1[1-9][0-9]{2}
  [23][0-9]{3}
  40[0-3][0-9]
  404[0-4]

Collapse adjacent powers of 10:
  101[3-9]
  10[2-9][0-9]
  1[1-9][0-9]{2}
  [23][0-9]{3}
  40[0-3][0-9]
  404[0-4]

Combining the regexes above yields:
  (101[3-9]|10[2-9][0-9]|1[1-9][0-9]{2}|[23][0-9]{3}|40[0-3][0-9]|404[0-4])

Next we'll try factoring out common prefixes using a tree:
Parse into tree based on regex prefixes:
  . 1 0 1 [3-9]
      + [2-9] [0-9]
    + [1-9] [0-9]{2}
  + [23] [0-9]{3}
  + 4 0 [0-3] [0-9]
      + 4 [0-4]

Turning the parse tree into a regex yields:
  (1(0(1[3-9]|[2-9][0-9])|[1-9][0-9]{2})|[23][0-9]{3}|40([0-3][0-9]|4[0-4]))

We choose the shorter one as our result.

^(101[3-9]|10[2-9][0-9]|1[1-9][0-9]{2}|[23][0-9]{3}|40[0-3][0-9]|404[0-4])$
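
The splitting and conversion steps in the explanation above can be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions, not the online tool's actual source: the function names (`split_range`, `block_to_pattern`, `range_to_regex`) are made up for illustration, the input is assumed to be non-negative integers written without leading zeros, and the prefix-factoring step is skipped.

```python
import re


def digit_class(d, e):
    """Regex fragment matching a single digit between d and e inclusive."""
    if d == e:
        return str(d)
    if e == d + 1:
        return f"[{d}{e}]"
    return f"[{d}-{e}]"


def split_range(lo, hi):
    """Split [lo, hi] into blocks (start, end, k).

    Within a block only the digit at position k (counted from the right)
    varies, and the k digits below it run over the full 0..9 range.
    """
    if not 0 <= lo <= hi:
        raise ValueError("expected 0 <= lo <= hi")
    blocks = []
    if lo == 0:
        # Zero is handled on its own so that no pattern has a leading zero.
        blocks.append((0, 0, 0))
        lo = 1
    a = lo
    while a <= hi:
        # Largest k such that a ends in k zeros and a full 10**k block fits.
        k = 0
        while a % 10 ** (k + 1) == 0 and a + 10 ** (k + 1) - 1 <= hi:
            k += 1
        unit = 10 ** k
        d = (a // unit) % 10                         # first value of the varying digit
        e = min(9, d + (hi - a + 1) // unit - 1)     # last value that still fits
        b = a + (e - d + 1) * unit - 1
        blocks.append((a, b, k))
        a = b + 1
    return blocks


def block_to_pattern(a, b, k):
    """Turn one block from split_range into a regex fragment."""
    unit = 10 ** k
    prefix = a // (unit * 10)
    head = str(prefix) if prefix else ""
    tail = "[0-9]" * k if k < 2 else f"[0-9]{{{k}}}"
    return head + digit_class((a // unit) % 10, (b // unit) % 10) + tail


def range_to_regex(lo, hi):
    """Anchored alternation matching exactly the integers lo..hi."""
    parts = [block_to_pattern(*block) for block in split_range(lo, hi)]
    return "^(" + "|".join(parts) + ")$"


if __name__ == "__main__":
    print(range_to_regex(1013, 4044))
```

For 1013 and 4044 this greedy split produces the same six alternatives as the tool output above. A brute-force check with `re` over a few thousand integers is a cheap way to gain confidence in a generated pattern before relying on it.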

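To reverse it, look at the character classes in each alternative: choosing the smallest digit of every class gives that alternative's lower bound, and choosing the largest digit gives its upper bound. The overall range then runs from the smallest lower bound to the largest upper bound (provided the alternatives are contiguous, as they are here).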

   ^(101[3-9]|10[2-9][0-9]|1[1-9][0-9]{2}|[23][0-9]{3}|40[0-3][0-9]|404[0-4])$
=>   1013     1020         1100           2000         4000         4040     lowers
     1019     1099         1999           3999         4039         4044     uppers

=> 1013 - 4044
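
The reversal can be sketched in Python along the same lines. This is a rough sketch with hypothetical helper names (`class_digits`, `alternative_bounds`, `regex_to_bounds`, `overall_range`), and it assumes the regex is a flat alternation built only from literal digits and digit classes such as `[3-9]` or `[23]`, optionally followed by `{n}`. It does not handle the nested, prefix-factored form, and the bounds describe a full range only when each alternative matches every number between its minimum and maximum.

```python
import re

# One token: a digit class with an optional {n} repeat, or a literal digit.
TOKEN = re.compile(r"\[([0-9-]+)\](?:\{(\d+)\})?|(\d)")


def class_digits(body):
    """Expand a class body such as '3-9' or '23' into a set of digits."""
    digits = set()
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            digits.update(range(int(body[i]), int(body[i + 2]) + 1))
            i += 3
        else:
            digits.add(int(body[i]))
            i += 1
    return digits


def alternative_bounds(alt):
    """Smallest and largest number matched by one alternative."""
    lo = hi = ""
    pos = 0
    for m in TOKEN.finditer(alt):
        if m.start() != pos:
            raise ValueError(f"unsupported syntax in {alt!r}")
        pos = m.end()
        if m.group(3) is not None:
            # Literal digit: identical in both bounds.
            lo += m.group(3)
            hi += m.group(3)
        else:
            digits = class_digits(m.group(1))
            repeat = int(m.group(2) or 1)
            lo += str(min(digits)) * repeat
            hi += str(max(digits)) * repeat
    if pos != len(alt) or not lo:
        raise ValueError(f"unsupported syntax in {alt!r}")
    return int(lo), int(hi)


def regex_to_bounds(pattern):
    """List of (lower, upper) pairs, one per alternative."""
    body = pattern.strip("^$")
    if body.startswith("(") and body.endswith(")"):
        body = body[1:-1]
    return [alternative_bounds(alt) for alt in body.split("|")]


def overall_range(bounds):
    """Smallest lower bound and largest upper bound."""
    return min(lo for lo, _ in bounds), max(hi for _, hi in bounds)


if __name__ == "__main__":
    rx = "^(101[3-9]|10[2-9][0-9]|1[1-9][0-9]{2}|[23][0-9]{3}|40[0-3][0-9]|404[0-4])$"
    print(regex_to_bounds(rx))
    print(overall_range(regex_to_bounds(rx)))
```

Sorting the pairs and checking that each lower bound equals the previous upper bound plus one would confirm that the alternatives really form one contiguous range.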