How do I match a numeric range in a regular expression?

Posted on 2024-09-12 18:23:02

Okay, this is quite an interesting challenge I have got myself into.

My RegEx takes as input lines like the following:

147.63.23.156/159
94.182.23.55/56
134.56.33.11/12

I need it to output a regular expression that matches the range represented. Let me explain.

For example, if the RegEx receives 147.63.23.156/159, then it needs to output a RegEx that matches the following:

147.63.23.156
147.63.23.157
147.63.23.158
147.63.23.159

How can I do this?

Currently I have:

(\d{1,3}\.\d{1,3}\.\d{1,3}\.)(\d{1,3})/(\d{1,3})

$1 contains the first xxx.xxx.xxx. part
$2 contains the lower range for the number
$3 contains the upper range for the number

久而酒知 2024-09-19 18:23:02

Regexes are really not a great way to validate IP addresses, I want to make that clear right up front. It is far, far easier to parse the addresses and do some simple arithmetic to compare them. A couple of less thans and greater thans and you're there.
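
A minimal sketch of that arithmetic approach, using Python's standard `ipaddress` module. The helper name `in_range` is made up for illustration and is not part of the generator shown later:

```python
import ipaddress

def in_range(addr, low, high):
    """Return True if addr lies between low and high, inclusive."""
    value = ipaddress.IPv4Address(addr)
    return ipaddress.IPv4Address(low) <= value <= ipaddress.IPv4Address(high)

print(in_range("147.63.23.157", "147.63.23.156", "147.63.23.159"))  # True
print(in_range("147.63.23.160", "147.63.23.156", "147.63.23.159"))  # False
```

`IPv4Address` objects compare numerically, so no digit-by-digit logic is needed.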

That said, it seemed like it would be a fun exercise to write a regex generator. I came up with a big mess of Python code to generate these regexes. Before I show the code, here's a sample of the regexes it produces for a couple of IP ranges:

1.2.3.4 to 1.2.3.4              1\.2\.3\.4
147.63.23.156 to 147.63.23.159  147\.63\.23\.15[6-9]
10.7.7.10 to 10.7.7.88          10\.7\.7\.([1-7]\d|8[0-8])
127.0.0.0 to 127.0.1.255        127\.0\.[0-1]\.(\d|[1-9]\d|1\d\d|2([0-4]\d|5[0-5]))

I'll show the code in two parts. First, the part that generates regexes for simple integer ranges. Second, the part that handles full IP addresses.

Matching number ranges

The first step is to figure out how to generate a regex that matches an arbitrary integer range, say 12-28 or 0-255. Here's an example of the regexes my implementation comes up with:

156 to 159   15[6-9]
1 to 100     [1-9]|[1-9]\d|100
0 to 255     \d|[1-9]\d|1\d\d|2([0-4]\d|5[0-5])
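
One way to sanity-check patterns like these is brute force: try every integer up to some limit and confirm the pattern matches exactly the numbers inside the range. The sketch below assumes the pattern is applied to the whole string (`re.fullmatch`); `matches_exactly` and its `limit` parameter are hypothetical names chosen for this example.

```python
import re

def matches_exactly(pattern, lower, upper, limit=1000):
    """Check every n in [0, limit): the pattern must fully match str(n)
    if and only if lower <= n <= upper."""
    compiled = re.compile(pattern)
    for n in range(limit):
        matched = compiled.fullmatch(str(n)) is not None
        if matched != (lower <= n <= upper):
            return False
    return True

print(matches_exactly(r"15[6-9]", 156, 159))
print(matches_exactly(r"[1-9]|[1-9]\d|100", 1, 100))
print(matches_exactly(r"\d|[1-9]\d|1\d\d|2([0-4]\d|5[0-5])", 0, 255))
```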

And now the code. There are numerous comments inline explaining the logic behind it. Overall it relies on a lot of recursion and special casing to try to keep the regexes lean and mean.

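import re

def range_regex(lower, upper):
    lower, upper = str(lower), str(upper)

    # Identical bounds, for instance 100-100: the number matches itself.
    # Without this case the code emits "10[0-0]" rather than the "100"
    # shown in the sample output.
    if lower == upper:
        return lower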

    # Different lengths, for instance 1-100. Combine regex(1-9) and
    # regex(10-100).
    if len(lower) != len(upper):
        return '%s|%s' % (
            range_regex(lower, '9' * len(lower)),
            range_regex(10 ** (len(lower)), upper)
        )

    ll, lr = lower[0], lower[1:]
    ul, ur = upper[0], upper[1:]

    # One digit numbers.
    if lr == '':
        if ll == '0' and ul == '9':
            return '\\d'
        else:
            return '[%s-%s]' % (ll, ul)

    # Same first digit, for instance 12-14. Concatenate "1" and regex(2-4).
    elif ll == ul:
        return ll + sub_range_regex(lr, ur)

    # All zeros to all nines, for instance 100-399. Concatenate regex(1-3)
    # and the appropriate number of \d's.
    elif lr == '0' * len(lr) and ur == '9' * len(ur):
        return range_regex(ll, ul) + '\\d' * len(lr)

    # All zeros on left, for instance 200-649. Combine regex(200-599) and
    # regex(600-649).
    elif lr == '0' * len(lr):
        return '%s|%s' % (
            range_regex(lower, str(int(ul[0]) - 1) + '9' * len(ur)),
            range_regex(ul + '0' * len(ur), upper)
        )

    # All nines on right, for instance 167-499. Combine regex(167-199) and
    # regex(200-499).
    elif ur == '9' * len(ur):
        return '%s|%s' % (
            range_regex(lower, ll + '9' * len(lr)),
            range_regex(str(int(ll[0]) + 1) + '0' * len(lr), upper)
        )

    # First digits are one apart, for instance 12-24. Combine regex(12-19)
    # and regex(20-24).
    elif ord(ul[0]) - ord(ll[0]) == 1:
        return '%s%s|%s%s' % (
            ll, sub_range_regex(lr, '9' * len(lr)),
            ul, sub_range_regex('0' * len(ur), ur)
        )

    # Far apart, uneven numbers, for instance 15-73. Combine regex(15-19),
    # regex(20-69), and regex(70-73).
    else:
        return '%s|%s|%s' % (
            range_regex(lower, ll + '9' * len(lr)),
            range_regex(str(int(ll[0]) + 1) + '0' * len(lr),
                        str(int(ul[0]) - 1) + '9' * len(ur)),
            range_regex(ul + '0' * len(ur), upper)
        )

# Helper function which adds parentheses when needed to sub-regexes.
# Sub-regexes need parentheses if they have pipes that aren't already
# contained within parentheses. For example, "6|8" needs parentheses
# but "1(6|8)" doesn't.
def sub_range_regex(lower, upper):
    orig_regex = range_regex(lower, upper)
    old_regex  = orig_regex

    while True:
        new_regex = re.sub(r'\([^()]*\)', '', old_regex)

        if new_regex == old_regex:
            break
        else:
            old_regex = new_regex
            continue

    if '|' in new_regex:
        return '(' + orig_regex + ')'
    else:
        return orig_regex
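
To see why the parentheses matter, here is a small illustration (not part of the generator) that uses the "6|8" example from the comment above. Alternation binds more loosely than concatenation, so an unwrapped sub-regex changes what the prefix applies to:

```python
import re

unwrapped = "1" + "6|8"    # "16|8": means "16" or "8"
wrapped = "1" + "(6|8)"    # "1(6|8)": means "16" or "18"

print(bool(re.fullmatch(unwrapped, "8")))   # True: wrongly accepts 8
print(bool(re.fullmatch(unwrapped, "18")))  # False: wrongly rejects 18
print(bool(re.fullmatch(wrapped, "18")))    # True
print(bool(re.fullmatch(wrapped, "8")))     # False
```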

Matching IP address ranges

With that capability in place, I then wrote a very similar-looking IP range function to work with full IP addresses. The code is very similar to the code above except that we're working in base 256 instead of base 10, and the code throws around lists instead of strings.

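import re, socket

# Wrap an already-built regex in parentheses if it has a pipe that is not
# already inside parentheses (same rule as sub_range_regex above).
def sub_regex(regex):
    stripped = regex
    while True:
        reduced = re.sub(r'\([^()]*\)', '', stripped)
        if reduced == stripped:
            break
        stripped = reduced

    return '(' + regex + ')' if '|' in stripped else regex

def ip_range_regex(lower, upper):
    # bytearray yields integer octets on both Python 2 and Python 3.
    lower = list(bytearray(socket.inet_aton(lower)))
    upper = list(bytearray(socket.inet_aton(upper)))

    return ip_array_regex(lower, upper)

def ip_array_regex(lower, upper):
    # One octet left.
    if len(lower) == 1:
        return range_regex(lower[0], upper[0])

    # Same first octet.
    if lower[0] == upper[0]:
        return r'%s\.%s' % (lower[0], sub_regex(ip_array_regex(lower[1:], upper[1:])))

    # Full subnet. The first-octet range may itself contain a pipe
    # (for instance 9-10), so it is parenthesized when needed.
    elif lower[1:] == [0] * len(lower[1:]) and upper[1:] == [255] * len(upper[1:]):
        return r'%s\.%s' % (
            sub_range_regex(lower[0], upper[0]),
            sub_regex(ip_array_regex(lower[1:], upper[1:]))
        )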

    # Partial lower subnet.
    elif lower[1:] == [0] * len(lower[1:]):
        return '%s|%s' % (
            ip_array_regex(lower, [upper[0] - 1] + [255] * len(upper[1:])),
            ip_array_regex([upper[0]] + [0] * len(upper[1:]), upper)
        )

    # Partial upper subnet.
    elif upper[1:] == [255] * len(upper[1:]):
        return '%s|%s' % (
            ip_array_regex(lower, [lower[0]] + [255] * len(lower[1:])),
            ip_array_regex([lower[0] + 1] + [0] * len(lower[1:]), upper)
        )

    # First octets just 1 apart.
    elif upper[0] - lower[0] == 1:
        return '%s|%s' % (
            ip_array_regex(lower, [lower[0]] + [255] * len(lower[1:])),
            ip_array_regex([upper[0]] + [0] * len(upper[1:]), upper)
        )

    # First octets more than 1 apart.
    else:
        return '%s|%s|%s' % (
            ip_array_regex(lower, [lower[0]] + [255] * len(lower[1:])),
            ip_array_regex([lower[0] + 1] + [0]   * len(lower[1:]),
                           [upper[0] - 1] + [255] * len(upper[1:])),
            ip_array_regex([upper[0]] + [0] * len(upper[1:]), upper)
        )
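
As a rough cross-check, a generated pattern can be compared against the arithmetic test on a block of nearby addresses. The sketch below hard-codes the sample pattern for 127.0.0.0 to 127.0.1.255 from the table earlier in this answer; the function name `mismatches` and the choice of which third octets to scan are assumptions made for this example.

```python
import ipaddress
import re

# Pattern copied from the sample output for 127.0.0.0 to 127.0.1.255.
PATTERN = re.compile(r"127\.0\.[0-1]\.(\d|[1-9]\d|1\d\d|2([0-4]\d|5[0-5]))")
LOW = ipaddress.IPv4Address("127.0.0.0")
HIGH = ipaddress.IPv4Address("127.0.1.255")

def mismatches(third_octets=range(4)):
    """Return addresses where the regex and the arithmetic test disagree."""
    bad = []
    for c in third_octets:
        for d in range(256):
            text = "127.0.%d.%d" % (c, d)
            by_regex = PATTERN.fullmatch(text) is not None
            by_math = LOW <= ipaddress.IPv4Address(text) <= HIGH
            if by_regex != by_math:
                bad.append(text)
    return bad

print(mismatches())                             # []
print(bool(PATTERN.search("127.0.1.2556")))     # True: search is unanchored
```

The last line is a reminder that the generated patterns carry no anchors, so they need `fullmatch` (or explicit `^` and `$`) to avoid matching a prefix of a longer string.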

梦中楼上月下 2024-09-19 18:23:02

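If you only need to build these one at a time, this website will do the trick.

If you need code, and don't mind Python, this code does it for any arbitrary numeric range.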

想你的星星会说话 2024-09-19 18:23:02

If it's for Apache... I haven't tried it, but it might work:

RewriteCond %{REMOTE_ADDR} !<147.63.23.156
RewriteCond %{REMOTE_ADDR} !>147.63.23.159

(Two consecutive RewriteConds are joined by a default logical AND)

Just be careful with ranges whose endpoints have different numbers of digits (e.g. 95-105 should be split into 95-99 and 100-105), since the comparison is lexicographic.
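
The lexicographic pitfall is easy to reproduce with plain string comparison. This Python snippet only illustrates the ordering issue; it does not model Apache itself, and `in_numeric_range` is a hypothetical helper:

```python
# Strings compare character by character, so "95" sorts after "100".
print("95" <= "100")   # False
print(95 <= 100)       # True

def in_numeric_range(value, low, high):
    """Compare as integers so that digit count does not matter."""
    return int(low) <= int(value) <= int(high)

print("95" <= "100" <= "105")               # False: string comparison
print(in_numeric_range("100", "95", "105")) # True: numeric comparison
```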

戏剧牡丹亭 2024-09-19 18:23:02

I absolutely agree with the commenters, a pure-regex solution would be the wrong tool for the job here. Just use the regular expression you already have to extract the prefix, minimum, and maximum values,

$prefix, $minimum, $maximum = match('(\d{1,3}\.\d{1,3}\.\d{1,3}\.)(\d{1,3})/(\d{1,3})', $line).groups()

then test your IP address against ${prefix}(\d+),

$lastgroup = match($prefix + '(\d+)', $addr).groups()[0]

and compare that last group to see if it falls within the proper range,

return int($minimum) <= int($lastgroup) <= int($maximum)

Code examples are pseudocode, of course - convert to your language of choice.
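
For reference, one possible Python rendering of the three pseudocode steps above. The function names `parse_range` and `address_in_range` are invented for this sketch, and it assumes each input line holds exactly one range in the `a.b.c.lo/hi` form from the question:

```python
import re

RANGE_RE = re.compile(r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.)(\d{1,3})/(\d{1,3})")

def parse_range(line):
    """Split 'a.b.c.lo/hi' into (prefix, lo, hi)."""
    m = RANGE_RE.fullmatch(line.strip())
    if m is None:
        raise ValueError("not a supported IP range: %r" % line)
    prefix, minimum, maximum = m.groups()
    return prefix, int(minimum), int(maximum)

def address_in_range(addr, line):
    """True if addr shares the prefix and its last octet is within lo..hi."""
    prefix, minimum, maximum = parse_range(line)
    # re.escape keeps the dots in the prefix literal.
    m = re.fullmatch(re.escape(prefix) + r"(\d+)", addr)
    if m is None:
        return False
    return minimum <= int(m.group(1)) <= maximum

print(address_in_range("147.63.23.157", "147.63.23.156/159"))  # True
print(address_in_range("147.63.23.160", "147.63.23.156/159"))  # False
```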

溺深海 2024-09-19 18:23:02

To my knowledge, this can't be done with a regex alone; it also needs some code behind it. For instance, in PHP you could use the following:

function make_range($ip){
    // $matches[1] = "xxx.xxx.xxx." prefix, [2] = lower bound, [3] = upper bound
    $regex = '#(\d{1,3}\.\d{1,3}\.\d{1,3}\.)(\d{1,3})/(\d{1,3})#';
    if ( preg_match($regex, $ip, $matches) ){
        // Print every address in the range, one per line.
        for ($n = (int)$matches[2]; $n <= (int)$matches[3]; $n++){
            print "{$matches[1]}{$n}\n";
        }
    } else {
        exit('not a supported IP range');
    }
}

For this to work with a RewriteCond, I think some black magic would be in order...

How is this going to be used with RewriteCond, anyways? Do you have several servers and want to just quickly make a .htaccess file easily? If so, then just add that function to a bigger script that takes some arguments and burps out a .htaccess file.
