正则表达式谜题

发布于 2024-09-18 17:30:14 字数 252 浏览 4 评论 0原文

这不是作业,而是一道老考试题。我很好奇看到答案。

给定一个字母表 S={0,1,2,3,4,5,6,7,8,9,+}。将语言 L 定义为该字母表中的字符串 w 的集合,使得 w 在 L 中,如果:

a) w 是一个数字,例如 42 w 是数字的(有限)和,例如 34 + 16 或 34 + 2 + 10

b) w 表示的数字可被 3 整除。

为 L 编写正则表达式(和 DFA)。

This is not homework, but an old exam question. I am curious to see the answer.

We are given an alphabet S={0,1,2,3,4,5,6,7,8,9,+}. Define the language L as the set of strings w from this alphabet such that w is in L if:

a) w is a number such as 42 or w is the (finite) sum of numbers such as 34 + 16 or 34 + 2 + 10

and

b) The number represented by w is divisible by 3.

Write a regular expression (and a DFA) for L.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

唯憾梦倾城 2024-09-25 17:30:14

这应该可行:

^(?:0|(?:(?:[369]|[147](?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]0*(?:\+?(?:0\
+)*[369]0*)*\+?(?:0\+)*[258])*(?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]|0*(?:
\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147])|[
258](?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0
\+)*[147])*(?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]|0*(?:\+?(?:0\+)*[369]0*)
*\+?(?:0\+)*[258]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]))0*)+)(?:\+(?:0|(?:(?
:[369]|[147](?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]0*(?:\+?(?:0\+)*[369]0*)
*\+?(?:0\+)*[258])*(?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]|0*(?:\+?(?:0\+)*
[369]0*)*\+?(?:0\+)*[147]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147])|[258](?:0*(?
:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147])*
(?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]|0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)
*[258]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]))0*)+))*$

它的工作原理是通过三个状态来表示迄今为止模 3 的数字之和。它不允许数字上有前导零,不允许在字符串的开头和结尾处使用加号,以及两个连续的加号。

正则表达式的生成和测试床:

a = r'0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*'
b = r'a[147]'
c = r'a[258]'

r1 = '[369]|[147](?:bc)*(?:c|bb)|[258](?:cb)*(?:b|cc)'
r2 = '(?:0|(?:(?:' + r1 + ')0*)+)'
r3 = '^' + r2 + r'(?:\+' + r2 + ')*

r = r3.replace('b', b).replace('c', c).replace('a', a)

print r

# Test on 10000 examples.

import random, re
random.seed(1)
r = re.compile(r)
for _ in range(10000):
    x = ''.join(random.choice('0123456789+') for j in range(random.randint(1,50)))
    if re.search(r'(?:\+|^)(?:\+|0[0-9])|\+
, x):
        valid = False
    else:
        valid = eval(x) % 3 == 0
    result = re.match(r, x) is not None
    if result != valid:
        print 'Failed for ' + x

This should work:

^(?:0|(?:(?:[369]|[147](?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]0*(?:\+?(?:0\
+)*[369]0*)*\+?(?:0\+)*[258])*(?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]|0*(?:
\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147])|[
258](?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0
\+)*[147])*(?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]|0*(?:\+?(?:0\+)*[369]0*)
*\+?(?:0\+)*[258]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]))0*)+)(?:\+(?:0|(?:(?
:[369]|[147](?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]0*(?:\+?(?:0\+)*[369]0*)
*\+?(?:0\+)*[258])*(?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]|0*(?:\+?(?:0\+)*
[369]0*)*\+?(?:0\+)*[147]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147])|[258](?:0*(?
:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147])*
(?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]|0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)
*[258]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]))0*)+))*$

It works by having three states representing the sum of the digits so far modulo 3. It disallows leading zeros on numbers, and plus signs at the start and end of the string, as well as two consecutive plus signs.

Generation of regular expression and test bed:

a = r'0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*'
b = r'a[147]'
c = r'a[258]'

r1 = '[369]|[147](?:bc)*(?:c|bb)|[258](?:cb)*(?:b|cc)'
r2 = '(?:0|(?:(?:' + r1 + ')0*)+)'
r3 = '^' + r2 + r'(?:\+' + r2 + ')*

r = r3.replace('b', b).replace('c', c).replace('a', a)

print r

# Test on 10000 examples.

import random, re
random.seed(1)
r = re.compile(r)
for _ in range(10000):
    x = ''.join(random.choice('0123456789+') for j in range(random.randint(1,50)))
    if re.search(r'(?:\+|^)(?:\+|0[0-9])|\+
, x):
        valid = False
    else:
        valid = eval(x) % 3 == 0
    result = re.match(r, x) is not None
    if result != valid:
        print 'Failed for ' + x
无言温柔 2024-09-25 17:30:14

请注意,我对 DFA 语法的记忆已经过时了,所以我的答案无疑有点不完整。希望这能给您一个总体概念。我选择完全忽略 + 。正如 AmirW 所说,abc+def 和 abcdef 在整除性方面是相同的。

接受状态是 C。

A=1,4,7,BB,AC,CA  
B=2,5,8,AA,BC,CB  
C=0,3,6,9,AB,BA,CC

请注意,上述语言使用了所有 9 种可能的 ABC 配对。它总是以 A、B 或 C 结尾,并且每个变量使用都是成对的这一事实意味着每次处理迭代都会缩短变量字符串。

例子:

1490 = AACC = BCC = BC = B (Fail)  
1491 = AACA = BCA = BA = C  (Success)

Note that my memory of DFA syntax is woefully out of date, so my answer is undoubtedly a little broken. Hopefully this gives you a general idea. I've chosen to ignore + completely. As AmirW states, abc+def and abcdef are the same for divisibility purposes.

Accept state is C.

A=1,4,7,BB,AC,CA  
B=2,5,8,AA,BC,CB  
C=0,3,6,9,AB,BA,CC

Notice that the above language uses all 9 possible ABC pairings. It will always end at either A,B,or C, and the fact that every variable use is paired means that each iteration of processing will shorten the string of variables.

Example:

1490 = AACC = BCC = BC = B (Fail)  
1491 = AACA = BCA = BA = C  (Success)
如歌彻婉言 2024-09-25 17:30:14

不是一个完整的解决方案,只是一个想法:

(B)单独:“加号”在这里并不重要。 abc + defabcdef 相同,都是为了能被 3 整除。对于后一种情况,这里有一个正则表达式:http ://blog.vkistudios.com/index.cfm/2008/12/30/Regular-Expression-to-define-if-a-base-10-number-is-divisible-by-3

进行组合有了要求(A),我们可以采用(B)的解决方案并修改它:

  • 第一个读取的字符必须在0..9(不是加号)

  • 输入不能以加号结尾,因此:复制每个状态(将使用 S 作为原始状态,使用 S' 作为后续状态重复以区分它们)。如果我们处于状态 S 并且读取了一个加号,我们将移至 S'

  • 当读取数字时,我们将进入新状态,就像处于 S 中一样。 S' 状态不能接受(另一个)加号。

  • 此外,即使 S 是“接受状态”,S' 也不是“接受状态”。 (因为输入不能以加号结尾)。

Not a full solution, just an idea:

(B) alone: The "plus" signs don't matter here. abc + def is the same as abcdef for the sake of divisibility by 3. For the latter case, there is a regexp here: http://blog.vkistudios.com/index.cfm/2008/12/30/Regular-Expression-to-determine-if-a-base-10-number-is-divisible-by-3

to combine this with requirement (A), we can take the solution of (B) and modify it:

  • First read character must be in 0..9 (not a plus)

  • Input must not end with a plus, so: Duplicate each state (will use S for the original state and S' for the duplicate to distinguish between them). If we're in state S and we read a plus we'll move to S'.

  • When reading a number we'll go to the new state as if we were in S. S' states cannot accept (another) plus.

  • Also, S' is not "accept state" even if S is. (because input must not end with a plus).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文