当前位置：文江博客话题详情

正则表达式谜题

发布于 2024-09-18 17:30:14 字数 252 浏览 4 评论 0原文

这不是作业，而是一道老考试题。我很好奇看到答案。

给定一个字母表 S={0,1,2,3,4,5,6,7,8,9,+}。将语言 L 定义为该字母表中的字符串 w 的集合，使得 w 在 L 中，如果：

a) w 是一个数字，例如 42 或 w 是数字的（有限）和，例如 34 + 16 或 34 + 2 + 10

且

b) w 表示的数字可被 3 整除。

为 L 编写正则表达式（和 DFA）。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

唯憾梦倾城 2024-09-25 17:30:14

这应该可行：

^(?:0|(?:(?:[369]|[147](?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]0*(?:\+?(?:0\
+)*[369]0*)*\+?(?:0\+)*[258])*(?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]|0*(?:
\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147])|[
258](?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0
\+)*[147])*(?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]|0*(?:\+?(?:0\+)*[369]0*)
*\+?(?:0\+)*[258]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]))0*)+)(?:\+(?:0|(?:(?
:[369]|[147](?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]0*(?:\+?(?:0\+)*[369]0*)
*\+?(?:0\+)*[258])*(?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]|0*(?:\+?(?:0\+)*
[369]0*)*\+?(?:0\+)*[147]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147])|[258](?:0*(?
:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147])*
(?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]|0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)
*[258]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]))0*)+))*$

它的工作原理是通过三个状态来表示迄今为止模 3 的数字之和。它不允许数字上有前导零，不允许在字符串的开头和结尾处使用加号，以及两个连续的加号。

正则表达式的生成和测试床：

a = r'0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*'
b = r'a[147]'
c = r'a[258]'

r1 = '[369]|[147](?:bc)*(?:c|bb)|[258](?:cb)*(?:b|cc)'
r2 = '(?:0|(?:(?:' + r1 + ')0*)+)'
r3 = '^' + r2 + r'(?:\+' + r2 + ')*

r = r3.replace('b', b).replace('c', c).replace('a', a)

print r

# Test on 10000 examples.

import random, re
random.seed(1)
r = re.compile(r)
for _ in range(10000):
    x = ''.join(random.choice('0123456789+') for j in range(random.randint(1,50)))
    if re.search(r'(?:\+|^)(?:\+|0[0-9])|\+
, x):
        valid = False
    else:
        valid = eval(x) % 3 == 0
    result = re.match(r, x) is not None
    if result != valid:
        print 'Failed for ' + x

This should work:

^(?:0|(?:(?:[369]|[147](?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]0*(?:\+?(?:0\
+)*[369]0*)*\+?(?:0\+)*[258])*(?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]|0*(?:
\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147])|[
258](?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0
\+)*[147])*(?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]|0*(?:\+?(?:0\+)*[369]0*)
*\+?(?:0\+)*[258]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]))0*)+)(?:\+(?:0|(?:(?
:[369]|[147](?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]0*(?:\+?(?:0\+)*[369]0*)
*\+?(?:0\+)*[258])*(?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]|0*(?:\+?(?:0\+)*
[369]0*)*\+?(?:0\+)*[147]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147])|[258](?:0*(?
:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147])*
(?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]|0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)
*[258]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]))0*)+))*$

It works by having three states representing the sum of the digits so far modulo 3. It disallows leading zeros on numbers, and plus signs at the start and end of the string, as well as two consecutive plus signs.

Generation of regular expression and test bed:

a = r'0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*'
b = r'a[147]'
c = r'a[258]'

r1 = '[369]|[147](?:bc)*(?:c|bb)|[258](?:cb)*(?:b|cc)'
r2 = '(?:0|(?:(?:' + r1 + ')0*)+)'
r3 = '^' + r2 + r'(?:\+' + r2 + ')*

r = r3.replace('b', b).replace('c', c).replace('a', a)

print r

# Test on 10000 examples.

import random, re
random.seed(1)
r = re.compile(r)
for _ in range(10000):
    x = ''.join(random.choice('0123456789+') for j in range(random.randint(1,50)))
    if re.search(r'(?:\+|^)(?:\+|0[0-9])|\+
, x):
        valid = False
    else:
        valid = eval(x) % 3 == 0
    result = re.match(r, x) is not None
    if result != valid:
        print 'Failed for ' + x

回复收藏 0 原文

无言温柔 2024-09-25 17:30:14

请注意，我对 DFA 语法的记忆已经过时了，所以我的答案无疑有点不完整。希望这能给您一个总体概念。我选择完全忽略 + 。正如 AmirW 所说，abc+def 和 abcdef 在整除性方面是相同的。

接受状态是 C。

A=1,4,7,BB,AC,CA  
B=2,5,8,AA,BC,CB  
C=0,3,6,9,AB,BA,CC

请注意，上述语言使用了所有 9 种可能的 ABC 配对。它总是以 A、B 或 C 结尾，并且每个变量使用都是成对的这一事实意味着每次处理迭代都会缩短变量字符串。

例子：

1490 = AACC = BCC = BC = B (Fail)  
1491 = AACA = BCA = BA = C  (Success)

Note that my memory of DFA syntax is woefully out of date, so my answer is undoubtedly a little broken. Hopefully this gives you a general idea. I've chosen to ignore + completely. As AmirW states, abc+def and abcdef are the same for divisibility purposes.

Accept state is C.

A=1,4,7,BB,AC,CA  
B=2,5,8,AA,BC,CB  
C=0,3,6,9,AB,BA,CC

Notice that the above language uses all 9 possible ABC pairings. It will always end at either A,B,or C, and the fact that every variable use is paired means that each iteration of processing will shorten the string of variables.

Example:

1490 = AACC = BCC = BC = B (Fail)  
1491 = AACA = BCA = BA = C  (Success)

回复收藏 0 原文

如歌彻婉言 2024-09-25 17:30:14

不是一个完整的解决方案，只是一个想法：

（B）单独：“加号”在这里并不重要。 abc + def 与 abcdef 相同，都是为了能被 3 整除。对于后一种情况，这里有一个正则表达式：http ://blog.vkistudios.com/index.cfm/2008/12/30/Regular-Expression-to-define-if-a-base-10-number-is-divisible-by-3

进行组合有了要求（A），我们可以采用（B）的解决方案并修改它：

第一个读取的字符必须在0..9（不是加号）
输入不能以加号结尾，因此：复制每个状态（将使用 S 作为原始状态，使用 S' 作为后续状态重复以区分它们）。如果我们处于状态 S 并且读取了一个加号，我们将移至 S'。
当读取数字时，我们将进入新状态，就像处于 S 中一样。 S' 状态不能接受（另一个）加号。
此外，即使 S 是“接受状态”，S' 也不是“接受状态”。（因为输入不能以加号结尾）。