Matching irreducible fractions with a regular expression

Posted 2024-11-01 09:30:03 · 652 words · 1 view · 0 comments

How can I match irreducible fractions with a regex?

For example, 23/25, 3/4, 5/2, 100/101, etc.

First of all, I have no idea how to implement the gcd algorithm in a regex.

Update for everyone answering "You are using the wrong tool":

Yes, I know what regexes are normally used for. That's fine. The fact that this question is weird is kind of its whole point.

Update 2: The idea is to find a regex that could be helpful in a situation like:

$> echo "1/2" | grep -P regex
1/2
$> echo "2/4" | grep -P regex

So the regex should be just a string, without using any scripts or variables. Only a regex.

Actually, I already know a regex that matches reducible fractions written in the unary number system:

$> echo "11/1111" | grep -P '^1/1+$|(11+)+\1+/\1+$'
11/1111

So the problem is to convert from decimal to unary inside the regex, and I don't know how.
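To make the unary idea concrete, here is a minimal sketch that does the decimal-to-unary conversion in the shell, outside the regex, and then applies a unary pattern with `grep -P`. It deliberately breaks the "regex only" rule and is meant only as an illustration. The helper names `to_unary` and `is_reducible` are made up for this example, the pattern is a simplified variant rather than the exact one quoted in the question, and a grep built with PCRE support is assumed.

```shell
#!/usr/bin/env bash
# to_unary N: print N copies of the digit 1 (hypothetical helper).
to_unary() {
    local n="$1" out=""
    while [ "$n" -gt 0 ]; do
        out="${out}1"
        n=$((n - 1))
    done
    printf '%s' "$out"
}

# is_reducible A/B: succeed (exit status 0) when A and B share a factor >= 2.
# (11+) captures a block of at least two 1s; the numerator and the
# denominator must both consist of whole copies of that block.
is_reducible() {
    local num="${1%%/*}" den="${1##*/}"
    printf '%s/%s\n' "$(to_unary "$num")" "$(to_unary "$den")" |
        grep -qP '^(11+)\1*/\1+$'
}

for f in 1/2 2/4 23/25 36/270; do
    if is_reducible "$f"; then
        echo "$f reducible"
    else
        echo "$f irreducible"
    fi
done
```

The backreference does the work of a common-divisor search: the engine backtracks over every possible block length until one divides both sides, or gives up.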


债姬 2024-11-08 09:30:03

UPDATE

Since the poster requested a single regex that matches against strings like "36/270", but says it doesn’t matter how legible it is, that regex is:

my $reducible_rx = qr{^(\d+)/(\d+)$(?(?{(1x$1."/".1x$2)=~m{^(?|1+/(1)|(11+)\1*/\1+)$}})|^)};

But, if like me, you believe that an illegible regex is absolutely unacceptable, you will write that more legibly as:

my $reducible_rx = qr{
  # first match a fraction:
    ^ ( \d+ ) / ( \d+ ) $
  # now for the hard part:
    (?(?{ ( 1 x $1 . "/" . 1 x $2 ) =~ m{
                ^
                (?|    1+      / (1)  # trivial case: GCD=1
                  |  (11+) \1* / \1+  # find the GCD
                )
                 $
            }x
        })
          # more portable version of (*PASS)
     | ^  # more portable version of (*FAIL)
     )
}x;

You can improve maintainability by splitting out the version that matches the unary version from the one that matches the decimal version like this:

# this one assumes unary notation
my $unary_rx = qr{
    ^ 
    (?|   1+       / (1)
      | (11+)  \1* / \1+ 
    ) 
    $
}x;

# this one assumes decimal notation and converts internally
my $decimal_rx = qr{
  # first match a fraction:
    ^ ( \d+ ) / ( \d+ ) $ 
  # now for the hard part:
    (?(?{( 1 x $1 . "/" . 1 x $2 ) =~ $unary_rx})
          # more portable version of (*PASS)
     | ^  # more portable version of (*FAIL) 
     )
}x;

Isn’t that much easier to read once it is separated into two named regexes? That makes $reducible_rx the same as $decimal_rx, with the unary version as its own thing. That’s how I would do it, but the original poster wanted a single regex, so for that you’d have to inline the nested one, as in the first version above.

Either way, you can plug into the test harness below using:

    if ($frac =~ $reducible_rx) {
        cmp_ok($frac, "ne", reduce($i, $j), "$i/$j is $test");
    } else {
        cmp_ok($frac, "eq", reduce($i, $j), "$i/$j is $test");
    }

And you will see that it is a correct regex that passes all tests, and does so moreover using a single regex, wherefore having now passed all requirements of the original question, I declare Qᴜᴏᴅ ᴇʀᴀᴛ ᴅᴇᴍᴏɴsᴛʀᴀɴᴅᴜᴍ: “Quit, enough done.”

And you’re welcome.

The answer is to match the regex ^(?|1+/(1)|(11+)\1*/\1+)$ against the fraction once it has been converted from decimal to unary notation, at which point the greatest common factor will be found in $1 on a match; otherwise the two numbers are coprime. If you are using Perl 5.14 or later, you can even do this in one step:

use 5.014;
my $reg  = qr{^(?|1+/(1)|(11+)\1*/\1+)$};
my $frac = "36/270";  # for example
if ($frac =~ s/(\d+)/1 x $1/reg =~ /$reg/) { 
    say "$frac can be reduced by ", length $1;
} else {
    say "$frac is irreducible";
}

Which will correctly report that:

36/270 can be reduced by 18

(And of course, reducing by 1 means there is no longer a denominator.)
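The same "the capture group holds the common factor" idea can be tried from the command line. The following is a rough sketch, not the answer's own code: `to_unary` and `common_factor` are hypothetical helper names, the pattern is a variant that uses a lookahead so that `grep -o` prints only the captured block, and a grep with PCRE support (`-P`) is assumed.

```shell
#!/usr/bin/env bash
# to_unary N: print N copies of the digit 1 (hypothetical helper).
to_unary() {
    local n="$1" out=""
    while [ "$n" -gt 0 ]; do
        out="${out}1"
        n=$((n - 1))
    done
    printf '%s' "$out"
}

# common_factor A/B: print the greatest common factor of A and B.
# (1+) is greedy, so the engine tries the longest block first and the
# first block length that divides both sides is the largest one.
# -o prints only the matched block; its length is the answer.
common_factor() {
    local num="${1%%/*}" den="${1##*/}" block
    block=$(printf '%s/%s\n' "$(to_unary "$num")" "$(to_unary "$den")" |
        grep -oP '^(1+)(?=\1*/\1+$)')
    printf '%s\n' "${#block}"
}

common_factor 36/270   # prints 18
common_factor 3/4      # prints 1
```

Unlike the answer's pattern, this variant also reports 1 for coprime inputs instead of failing to match, which is a design choice of the sketch.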

If you wanted to have a bit of punning fun with your readers, you could even do it this way:

use 5.014;
my $regex = qr{^(?|1+/(1)|(11+)\1*/\1+)$};
my $frac  = "36/270";  # for example
if ($frac =~ s/(\d+)/"1 x $1"/regex =~ /$regex/) {
    say "$frac can be reduced by ", length $1;
} else {
    say "$frac is irreducible";
}

Here is the code that demonstrates how to do this. Furthermore, it constructs a test suite that tests its algorithm using all (positive) numerators and denominators up to its argument, or 30 by default. To run it under a test harness, put it in a file named coprimes and do this:

$ perl -MTest::Harness -e 'runtests("coprimes")'
coprimes .. ok       
All tests successful.
Files=1, Tests=900,  1 wallclock secs ( 0.13 usr  0.02 sys +  0.33 cusr  0.02 csys =  0.50 CPU)
Result: PASS

Here is an example of its output when run without the test harness:

$ perl coprimes 10
1..100
ok 1 - 1/1 is 1
ok 2 - 1/2 is 1/2
ok 3 - 1/3 is 1/3
ok 4 - 1/4 is 1/4
ok 5 - 1/5 is 1/5
ok 6 - 1/6 is 1/6
ok 7 - 1/7 is 1/7
ok 8 - 1/8 is 1/8
ok 9 - 1/9 is 1/9
ok 10 - 1/10 is 1/10
ok 11 - 2/1 is 2
ok 12 - 2/2 is 1
ok 13 - 2/3 is 2/3
ok 14 - 2/4 is 1/2
ok 15 - 2/5 is 2/5
ok 16 - 2/6 is 1/3
ok 17 - 2/7 is 2/7
ok 18 - 2/8 is 1/4
ok 19 - 2/9 is 2/9
ok 20 - 2/10 is 1/5
ok 21 - 3/1 is 3
ok 22 - 3/2 is 3/2
ok 23 - 3/3 is 1
ok 24 - 3/4 is 3/4
ok 25 - 3/5 is 3/5
ok 26 - 3/6 is 1/2
ok 27 - 3/7 is 3/7
ok 28 - 3/8 is 3/8
ok 29 - 3/9 is 1/3
ok 30 - 3/10 is 3/10
ok 31 - 4/1 is 4
ok 32 - 4/2 is 2
ok 33 - 4/3 is 4/3
ok 34 - 4/4 is 1
ok 35 - 4/5 is 4/5
ok 36 - 4/6 is 2/3
ok 37 - 4/7 is 4/7
ok 38 - 4/8 is 1/2
ok 39 - 4/9 is 4/9
ok 40 - 4/10 is 2/5
ok 41 - 5/1 is 5
ok 42 - 5/2 is 5/2
ok 43 - 5/3 is 5/3
ok 44 - 5/4 is 5/4
ok 45 - 5/5 is 1
ok 46 - 5/6 is 5/6
ok 47 - 5/7 is 5/7
ok 48 - 5/8 is 5/8
ok 49 - 5/9 is 5/9
ok 50 - 5/10 is 1/2
ok 51 - 6/1 is 6
ok 52 - 6/2 is 3
ok 53 - 6/3 is 2
ok 54 - 6/4 is 3/2
ok 55 - 6/5 is 6/5
ok 56 - 6/6 is 1
ok 57 - 6/7 is 6/7
ok 58 - 6/8 is 3/4
ok 59 - 6/9 is 2/3
ok 60 - 6/10 is 3/5
ok 61 - 7/1 is 7
ok 62 - 7/2 is 7/2
ok 63 - 7/3 is 7/3
ok 64 - 7/4 is 7/4
ok 65 - 7/5 is 7/5
ok 66 - 7/6 is 7/6
ok 67 - 7/7 is 1
ok 68 - 7/8 is 7/8
ok 69 - 7/9 is 7/9
ok 70 - 7/10 is 7/10
ok 71 - 8/1 is 8
ok 72 - 8/2 is 4
ok 73 - 8/3 is 8/3
ok 74 - 8/4 is 2
ok 75 - 8/5 is 8/5
ok 76 - 8/6 is 4/3
ok 77 - 8/7 is 8/7
ok 78 - 8/8 is 1
ok 79 - 8/9 is 8/9
ok 80 - 8/10 is 4/5
ok 81 - 9/1 is 9
ok 82 - 9/2 is 9/2
ok 83 - 9/3 is 3
ok 84 - 9/4 is 9/4
ok 85 - 9/5 is 9/5
ok 86 - 9/6 is 3/2
ok 87 - 9/7 is 9/7
ok 88 - 9/8 is 9/8
ok 89 - 9/9 is 1
ok 90 - 9/10 is 9/10
ok 91 - 10/1 is 10
ok 92 - 10/2 is 5
ok 93 - 10/3 is 10/3
ok 94 - 10/4 is 5/2
ok 95 - 10/5 is 2
ok 96 - 10/6 is 5/3
ok 97 - 10/7 is 10/7
ok 98 - 10/8 is 5/4
ok 99 - 10/9 is 10/9
ok 100 - 10/10 is 1

And here is the program:

#!/usr/bin/env perl
#
# coprimes - test suite to use unary coprimality algorithm
# 
# Tom Christiansen <[email protected]>
# Sun Apr 17 12:18:19 MDT 2011

use strict;
use warnings;

my $DEFAULT = 2*3*5;
my $max = @ARGV ? shift : $DEFAULT;

use Test::More;
plan tests => $max ** 2;

my $rx = qr{
    ^
    (?|   1+       / (1)
      | (11+)  \1* / \1+
    )
    $
}x;

for my $i ( 1 .. $max ) {
    for my $j ( 1 .. $max ) {
        my $test;
        if (((1 x $i) . "/" . (1 x $j)) =~ /$rx/) {
            my $cf = length($1);
            $test = $i / $cf;
            $test .= "/" . $j/$cf unless $j/$cf == 1;
        } else {
            $test = "$i/$j";
        }
        cmp_ok($test, "eq", reduce($i, $j), "$i/$j is $test");
    }
}

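sub reduce {
    my ($num, $den) = @_;   # not $a/$b: those are sort's special variables
    use Math::BigRat;
    my $f = Math::BigRat->new("$num/$den");
    return "$f";            # stringifies in lowest terms, e.g. "1/2" or "3"
}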


恍梦境° 2024-11-08 09:30:03

Nope, it cannot be done. Like a good computer scientist, I will ignore the specifics of the regex tool and assume you are asking whether a regular expression in the formal-language sense exists. I do not know enough about the features of regex engines to be sure they are restricted to regular languages. That caveat aside, on with the show.

Rewording the question, we get:

Let L be the language {"a/b" | a and b are natural numbers encoded in radix r, and a and b are coprime}. Is L regular?

Assume such a language is regular. Then there exists a DFA that decides membership in L. Let N be the number of states of such a DFA. There are infinitely many primes, so there are arbitrarily many primes greater than the largest number encodable in N digits in radix r. (Note: for r ≥ 2 that largest number is r^N − 1, and in unary it is N. I am using this odd wording so that unary is covered too.) Select N+1 primes that are greater than this number. All of them are encoded using at least N+1 digits in radix r. Enumerate these primes p₀ to p_N. Let sᵢ be the state the DFA is in immediately after reading pᵢ followed by the /. By the pigeonhole principle, there are N states and N+1 values sᵢ, so there exists at least one pair of distinct indexes (j,k) such that sⱼ = sₖ. So, starting from the initial state of the DFA, the inputs pₖ/ and pⱼ/ lead to the same state sⱼ (or sₖ), while pⱼ and pₖ are distinct primes.

L must contain every fraction p/q of distinct primes, as they are coprime, and must reject every prime divided by itself, p/p, as p is not coprime to p. Now L contains pⱼ/pₖ, so there is a sequence of states leading from sⱼ, on reading the string pₖ, to an accepting state; call this sequence β. Let α be the sequence of states visited while reading pₖ/ from the initial state, which ends in sₖ. Because sₖ = sⱼ, the sequence of states for the string pₖ/pₖ must be α followed by β: it starts in the initial state, goes to sₖ by reading pₖ/, and then reaches an accepting state by reading pₖ. So the DFA accepts pₖ/pₖ, which would put pₖ/pₖ in L. But pₖ is not coprime to pₖ, and therefore pₖ/pₖ is not in L. Contradiction. Therefore the language L is not regular, and no regular expression for it exists.
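The pigeonhole step can also be written compactly. This is only a restatement of the argument above in standard DFA notation; the symbols Q, q_0, F and \hat\delta (the extended transition function) are notation introduced here for illustration and do not come from the original answer.

```latex
\begin{align*}
&\text{Let } M = (Q, \Sigma, \delta, q_0, F) \text{ be a DFA for } L \text{ with } |Q| = N,
  \text{ and let } p_0, \dots, p_N \text{ be distinct primes.} \\
&s_i := \hat\delta(q_0,\; p_i\,/) \in Q
  \quad\Longrightarrow\quad \exists\, j \neq k :\ s_j = s_k
  \quad\text{(pigeonhole: } N+1 \text{ values, } N \text{ states).} \\
&\hat\delta(q_0,\; p_j/p_k) = \hat\delta(s_j,\; p_k)
  = \hat\delta(s_k,\; p_k) = \hat\delta(q_0,\; p_k/p_k). \\
&\gcd(p_j, p_k) = 1 \;\Rightarrow\; \hat\delta(q_0,\; p_j/p_k) \in F,
  \qquad
  \gcd(p_k, p_k) = p_k > 1 \;\Rightarrow\; \hat\delta(q_0,\; p_k/p_k) \notin F.
\end{align*}
```

The third line says the two states are equal, while the fourth says one is accepting and the other is not, which is the contradiction.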


掩饰不了的爱 2024-11-08 09:30:03

If you write the numbers in unary, and use ":" as the division sign, I think this matches reducible fractions:

/^1+:1$|^(11+):\1$|^(11+?)\2+:\2\2+$/

You can then use !~ to find strings that don't match.

Based on this: http://montreal.pm.org/tech/neil_kandalgaonkar.shtml
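A quick way to see how this pattern behaves is to run it over a few unary inputs with `grep -P`. The sketch below is an illustration only: `matches` is a made-up helper, `simple_rx` is a simplified pattern introduced here for comparison rather than part of the answer, and a grep with PCRE support is assumed. As far as I can tell, the answer's pattern misses fractions where one side equals the common factor itself, such as 2/4 (`11:1111`), because its third branch requires at least two copies of the block on each side.

```shell
#!/usr/bin/env bash
# Pattern quoted from the answer above.
answer_rx='^1+:1$|^(11+):\1$|^(11+?)\2+:\2\2+$'
# Simplified comparison pattern (an assumption of this sketch).
simple_rx='^(11+)\1*:\1+$'

# matches RX STRING: succeed when STRING matches RX under grep -P.
matches() {
    printf '%s\n' "$2" | grep -qP -e "$1"
}

# 2:3 (coprime), 4:6 (common factor 2), 2:4 (common factor 2)
for s in 11:111 1111:111111 11:1111; do
    matches "$answer_rx" "$s" && a="match" || a="no match"
    matches "$simple_rx" "$s" && b="match" || b="no match"
    echo "$s  answer: $a, simplified: $b"
done
```

Note that the two patterns also differ on purpose for n/1: the answer's first branch counts it as reducible, while the simplified pattern only looks for a shared factor of at least 2.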


溺孤伤于心 2024-11-08 09:30:03

You can tell that a number ending in 0 or 5 is divisible by 5, and that a number ending in 0, 2, 4, 6 or 8 is divisible by 2.

For 3, 4, 6, 7, 8 and 9 as divisors I wouldn't expect such a simple last-digit test, and not for arbitrary divisors either.

I guess you know the method for deciding divisibility by 3: build the recursive digit sum, which has to be divisible by 3 for the number to be divisible. So you can eliminate every 3, 6 and 9 from the number, as well as every 0. For an arbitrary number, you would proceed this way:

delete every 0, 3, 6 and 9
change every 4 and 7 to 1 (because 4%3 and 7%3 are both 1)
change every 5 and 8 to 2 (same reasoning: 5%3 and 8%3 are both 2)
change every 2 to 11
delete every group of 111

If the result is empty, the number was divisible by 3:

echo ${RANDOM}${RANDOM}${RANDOM} | sed 's/[0369]//g;s/[47]/1/g;s/[58]/2/g;s/2/11/g;s/1\{3\}//g'

A similar approach could work for 9, where you have a similar rule. But a general approach for arbitrary divisors?
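The sed pipeline above can be wrapped in a small function so it is easy to try on fixed inputs instead of random ones. This is a minimal sketch; the function name `divisible_by_3` is made up here, and the sed script is the one from the answer, unchanged.

```shell
#!/usr/bin/env bash
# divisible_by_3 N: succeed when the decimal string N is a multiple of 3.
# Every digit is reduced to its remainder mod 3 written in unary, and
# groups of three 1s are dropped; nothing is left for a multiple of 3.
divisible_by_3() {
    local rest
    rest=$(printf '%s\n' "$1" |
        sed 's/[0369]//g;s/[47]/1/g;s/[58]/2/g;s/2/11/g;s/1\{3\}//g')
    [ -z "$rest" ]
}

for n in 123 124 27 58; do
    if divisible_by_3 "$n"; then
        echo "$n is divisible by 3"
    else
        echo "$n is not divisible by 3"
    fi
done
```

For example, 124 becomes 121, then 1111, and finally a single leftover 1, which matches 124 mod 3 = 1.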

About the author

爱殇璃
