代码高尔夫:有限状态机!
有限状态机
确定性有限状态机是一种简单的计算模型,在计算机科学基础课程中广泛用作自动机理论的介绍。它是一个简单的模型,相当于正则表达式,它确定某个输入字符串是接受还是拒绝。 抛开一些形式,有限状态机的运行由以下部分组成:
- 字母表,一组字符。
- 状态,通常显示为圆圈。其中一个状态必须是开始状态。有些状态可能接受,通常显示为双圆圈。
- 转换通常被可视化为状态之间的有向拱门,是与字母相关的状态之间的有向链接。
- 输入字符串,字母字符列表。
机器上的运行从启动状态开始。读取输入字符串的每个字母;如果当前状态和与该字母对应的另一个状态之间存在转换,则当前状态更改为新状态。读完最后一个字母后,如果当前状态为接受状态,则接受输入字符串。如果最后一个状态不是接受状态,或者字母在运行期间没有对应的状态,则输入字符串将被拒绝。
注意:这个简短的解释远不是 FSM 的完整、正式的定义; 维基百科的优秀文章对该主题进行了精彩的介绍。
示例
例如,以下机器判断从左到右读取的二进制数是否有偶数个 0
:
- 字母表是集合
{0,1}
。 - 状态是 S1 和 S2。
- 转换为
(S1, 0) -> S2,
(S1,1)-> S1,
(S2,0)-> S1
和(S2, 1) -> S2。
- 输入字符串是任何二进制数,包括空字符串。
规则:
用您选择的语言实现 FSM。
输入
FSM 应接受以下输入:
<States> List of state, separated by space mark.
The first state in the list is the start state.
Accepting states begin with a capital letter.
<transitions> One or more lines.
Each line is a three-tuple:
origin state, letter, destination state)
<input word> Zero or more characters, followed by a newline.
例如,前面提到的以 1001010
作为输入字符串的机器将写为:
S1 s2
S1 0 s2
S1 1 S1
s2 0 S1
s2 1 s2
1001010
输出
FSM 的运行,写为
,后跟最终状态。示例输入的输出为:
S1 1 -> S1
S1 0 -> s2
s2 0 -> S1
S1 1 -> S1
S1 0 -> s2
s2 1 -> s2
s2 0 -> S1
ACCEPT
对于空输入 ''
:
S1
ACCEPT
注意: 在您的注释之后,S1
行(显示第一个state) 可能会被省略,并且以下输出也是可以接受的:
ACCEPT
For 101
:
S1 1 -> S1
S1 0 -> s2
s2 1 -> s2
REJECT
For '10X'
:
S1 1 -> S1
S1 0 -> s2
s2 X
REJECT
奖品
最短的解决方案将获得 250 代表奖金。
参考实现
参考 Python 实现可在此处获取。请注意,对于空字符串输入,输出要求已放宽。
附录
输出格式
根据流行的需求,对于空输入字符串,以下输出也是可以接受的:
ACCEPT
或
REJECT
没有在前一行中写入第一个状态。
状态名称
有效的状态名称是一个英文字母,后跟任意数量的字母、_
和数字,非常类似于变量名称,例如 State1
、state0
,STATE_0
。
输入格式
与大多数代码高尔夫一样,您可以假设您的输入是正确的。
答案摘要:
- Cobol - 4078 个字符
- Python - 171 个字符,568 个字符,203 个字符,218 个字符, 269 个字符
- sed - 137 个字符
- ruby - 145 个字符,183 个字符
- Haskell - 192 个字符, 189 个字符
- LISP - 725 个字符
- Perl - 184 个字符
- Bash - 184 个字符
- Rexx - 205 个字符
- Lua - 356 个字符
- F# - 420 个字符
- C# - 356 个字符
- Mixal - 898 个字符
sed
137 解决方案是最短的,ruby 145 是#2。目前,我无法让 sed 解决方案工作:
cat test.fsm | sed -r solution.sed
sed -r solution.sed test.fsm
两者都给了我:
sed: -e expression #1, char 12: unterminated `s' command
所以除非有澄清,否则赏金将归于 ruby 解决方案。
Finite state machine
A deterministic finite state machine is a simple computation model, widely used as an introduction to automata theory in basic CS courses. It is a simple model, equivalent to regular expression, which determines of a certain input string is Accepted or Rejected. Leaving some formalities aside, A run of a finite state machine is composed of:
- alphabet, a set of characters.
- states, usually visualized as circles. One of the states must be the start state. Some of the states might be accepting, usually visualized as double circles.
- transitions, usually visualized as directed arches between states, are directed links between states associated with an alphabet letter.
- input string, a list of alphabet characters.
A run on the machine begins at the starting state. Each letter of the input string is read; If there is a transition between the current state and another state which corresponds to the letter, the current state is changed to the new state. After the last letter was read, if the current state is an accepting state, the input string is accepted. If the last state was not an accepting state, or a letter had no corresponding arch from a state during the run, the input string is rejected.
Note: This short descruption is far from being a full, formal definition of a FSM; Wikipedia's fine article is a great introduction to the subject.
Example
For example, the following machine tells if a binary number, read from left to right, has an even number of 0
s:
- The alphabet is the set
{0,1}
. - The states are S1 and S2.
- The transitions are
(S1, 0) -> S2
,(S1, 1) -> S1
,(S2, 0) -> S1
and(S2, 1) -> S2
. - The input string is any binary number, including an empty string.
The rules:
Implement a FSM in a language of your choice.
Input
The FSM should accept the following input:
<States> List of state, separated by space mark.
The first state in the list is the start state.
Accepting states begin with a capital letter.
<transitions> One or more lines.
Each line is a three-tuple:
origin state, letter, destination state)
<input word> Zero or more characters, followed by a newline.
For example, the aforementioned machine with 1001010
as an input string, would be written as:
S1 s2
S1 0 s2
S1 1 S1
s2 0 S1
s2 1 s2
1001010
Output
The FSM's run, written as <State> <letter> -> <state>
, followed by the final state. The output for the example input would be:
S1 1 -> S1
S1 0 -> s2
s2 0 -> S1
S1 1 -> S1
S1 0 -> s2
s2 1 -> s2
s2 0 -> S1
ACCEPT
For the empty input ''
:
S1
ACCEPT
Note: Following your comments, the S1
line (showing the first state) might be omitted, and the following output is also acceptable:
ACCEPT
For 101
:
S1 1 -> S1
S1 0 -> s2
s2 1 -> s2
REJECT
For '10X'
:
S1 1 -> S1
S1 0 -> s2
s2 X
REJECT
Prize
A 250 rep bounty will be given to the shortest solution.
Reference implementation
A reference Python implementation is available here. Note that output requirements have been relaxed for empty-string input.
Addendum
Output format
Following popular demand, the following output is also acceptable for empty input string:
ACCEPT
or
REJECT
Without the first state written in the previous line.
State names
Valid state names are an English letter followed by any number of letters, _
and digits, much like variable names, e.g. State1
, state0
, STATE_0
.
Input format
Like most code golfs, you can assume your input is correct.
Summary of answers:
- Cobol - 4078 characters
- Python - 171 characters, 568 characters, 203 characters, 218 characters, 269 characters
- sed - 137 characters
- ruby - 145 characters, 183 characters
- Haskell - 192 characters, 189 characters
- LISP - 725 characters
- Perl - 184 characters
- Bash - 184 characters
- Rexx - 205 characters
- Lua - 356 characters
- F# - 420 characters
- C# - 356 characters
- Mixal - 898 characters
The sed
137 solution is the shortest, ruby 145 is #2. Currently, I can't get the sed solution to work:
cat test.fsm | sed -r solution.sed
sed -r solution.sed test.fsm
both gave me:
sed: -e expression #1, char 12: unterminated `s' command
so unless It there are clarifications the bounty goes to the ruby solution.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(20)
Python 2.7+,
201 192 187 181 179 175171 个字符PS。问题得到缓解后(不需要在空输入上输出状态行),这里是明显更短的新代码。如果您使用的是 <2.7 版本,则没有字典理解,因此应使用
{c+o:s for o,c,s in i[1:-1]}< /code> 尝试
dict((c+o,s)for o,c,s in i[1:-1])
以获得 +5 的价格。及其测试输出:
上一个条目(len 201):
在有人因此打我之前,我想道歉:代码行为与原始规范略有不同 - 每个问题评论讨论。这是我为讨论而做的例证。
附言。虽然我喜欢决议接受/拒绝与最终状态在同一行,但它可以通过添加
让我转移到孤独(例如想象结果将由只关心最后一行被接受或拒绝的愚蠢脚本解析) '\n'+
(5 个字符)到最后一个打印
,价格为 +5 个字符。输出示例:
Python 2.7+,
201 192 187 181 179 175171 charsPS. After the problem was relaxed (no need to output state line on empty input), here is new code that's notably shorter. If you are on version <2.7, there is no dict comprehension, so instead of
{c+o:s for o,c,s in i[1:-1]}
trydict((c+o,s)for o,c,s in i[1:-1])
for the price of +5.And its test output:
Previous entry (len 201):
I want to apologize before someone slaps me for it: the code behavior is slightly different from the original spec - per question-comments discussion. This is my illustration for the discussion.
PS. while i like the resolution ACCEPT/REJECT on the same line with the final state, it can me moved to solitude (e.g. imagine results are to be parsed by stupid script that only cares about last line being accept or reject) by adding
'\n'+
(5 chars) to the lastprint
for the price of +5 chars.Example output:
今天我感觉很复古,我为这项任务选择的语言是 IBM Enterprise Cobol - 字符计数
24624078 (抱歉,从面向屏幕的设备粘贴,尾随空格是一个悲惨的副作用):I'm feeling retro today, my language of choice for this task is IBM Enterprise Cobol - char count
24624078 (Sorry, pasted from a screen oriented device, trailing spaces are a tragic side effect):sed --
118137 个字符这是使用 -r 标志 (+3),总共 134+3=137 个字符。
这应该可以正确处理没有转换的输入...希望它现在完全符合规范...
sed --
118137 charactersThis is using the -r flag (+3), for a total of 134+3=137 characters.
This should handle inputs without transitions correctly... hopefully it fully complies with the spec now...
Ruby 1.9.2 -
178 190 182 177 153 161 158 154145 个字符测试脚本
输出
所有输入均以以下开头:
Ruby 1.9.2 -
178 190 182 177 153 161 158 154145 charactersTesting Script
Outputs
All input starts with:
Adam 提供了参考实现。我在制作之前没有看到它,但逻辑是相似的:
编辑:这是Python 2.6代码。我并没有试图减少长度;我只是试图让它在概念上变得简单。
Adam provided a reference implementation. I didn't see it before I made mine, but the logic is similar:
Edit: This is Python 2.6 code. I did not try to minimize length; I just tried to make it conceptually simple.
Python,218 个字符
Python, 218 characters
Haskell -
252216204197192 个字符符合输出规范。
Ungolf'd 版本:
一个很好的谜题是为什么在 Golf'd 版本中不需要
init
!除此之外,休息都是标准的哈斯克尔高尔夫技巧。Haskell -
252216204197192 charactersConforms to output specification.
Ungolf'd version:
A good puzzle is why
init
isn't needed in the golf'd version! Other than that, rest are all standard Haskell golf techniques.Perl — 184 个字符
(计数不包括所有换行符,这是可选的。)
此外,这个 155 个字符的程序不实现中间输出,而是将机器完全作为整个 FSM 定义的重复替换来执行(更改启动状态和输入)细绳)。它受到
sed
解决方案的启发,但并非源自该解决方案。通过将(?:...)
转换为(...)
并根据需要重新编号,可以将其缩短 2 个字符。Perl — 184 characters
(Count excluding all newlines, which are optional.)
Also, this 155-character program does not implement the intermediate outputs, but executes the machine entirely as a repeated substitution on the whole FSM definition (changing the start state and input string). It was inspired by, but not derived from, the
sed
solution. It could be shortened by 2 characters by converting the(?:...)
into a(...)
and renumbering as needed.Python 3,Chars:203
输出格式似乎有点难以适应。
Python 3, Chars: 203
The output format seems a bit hard to fit.
MIXAL 898 个字符
神化 Knuth!
MIXAL 898 characters
Deify Knuth!
Haskell - 189 个字符
编辑:未正确实现无转换拒绝的输出。
断行版本和变量指南:
我从MtnViewMark的解决方案中得到了
s<"["
技术;其余的是我自己的设计。显着特征:Haskell - 189 characters
EDIT: Does not correctly implement the output for no-transition rejection.
Line-broken version and variable guide:
I got the
s<"["
technique from MtnViewMark's solution; the rest is my own design. Notable characteristics:Common Lisp - 725
没有真正尝试最小化代码——Common Lisp 在所需的输入处理中付出了沉重的代价,所以我认为这个解决方案获胜的机会不大:-)
Common Lisp - 725
No real attempt to minimize the code -- Common Lisp pays a heavy penalty in the required input processing, so I don't think there's much chance of this solution winning :-)
Ruby — 183
真的,很奇怪的输出规范。这是我的工作原理: http://ideone.com/cxweL
Ruby — 183
Really, strange output specification. Here how my works: http://ideone.com/cxweL
Rexx 205 个字符
(这个答案经过了几次编辑,因为我最初只是出于普遍兴趣发布了一些代码,然后决定实际发布一个真正的解决方案)
这是一个 Rexx 版本,让人们体验一下这种鲜为人知的语言。 Rexx http://en.wikipedia.org/wiki/REXX 是 IBM 中使用的一种解释语言VM/CMS 大型机操作系统以及后来的 IBM OS/2(我相信有一个 Amiga 变体)。它是一种非常具有表现力的语言,也是一种令人惊叹的通用/“脚本”语言。
这可以使用 Regina Rexx 解释器运行。
使用其独特的输出来处理不正确的转换场景以及对大写字母的测试有点昂贵。
对于对 Rexx 语法感兴趣的人来说,下面是一些旧编辑的代码,这些代码并不 100% 符合输出要求,但功能齐全(此答案中的所有代码都适用于我在下面粘贴的示例,但上面的代码处理其他所需的角落):
旧的短版本:
长版本:
样本输入/输出:
Rexx 205 characters
(This answer went through few edits as I initially just posted some code for general interest and then decided to actually post a real solution)
Here's a Rexx version to give people a taste for that less known lanugage. Rexx http://en.wikipedia.org/wiki/REXX is an interpreted language used in IBM's VM/CMS mainframe operating system and later in IBM OS/2 (and I believe there was an Amiga variant). It's a very expressive language and an amazing general purpose/"scripting" language.
This can be run with the Regina Rexx interpreter.
Handling the incorrect transition scenario with its unique output and also testing for uppercase is a bit expensive.
Code from some older edits below for people interested in the Rexx syntax, those aren't 100% compliant with the output requirements but are functional (all code in this answer works with the samples I pasted below but the code above handles the other required corners):
Older short version:
Longer version:
Sample in/out:
Lua, 356
将任何非空格字符作为状态,将任何非空格字符作为过渡字母。虽然看起来不是最短,但我还是会以任何方式发布。
可以节省 25 个字符的打印制表符而不是空格。
可读版本:
高尔夫版本+输入输出。
Lua, 356
Takes any nonspace characters for states, and any non-space one characters for transition letters. Though it seems not shortest, I'll post it any way.
Could save 25 chars printing tabs instead of spaces.
Readable version:
Golfed version + in- an output.
bash -
186 185184 chars请注意,这实际上需要 bash - POSIX sh 没有关联数组或 C 风格的语法(并且可能也没有使用所有参数扩展,虽然我还没有检查过)。
编辑:或者,对于完全相同的长度,
bash -
186 185184 charsNote that this does actually require bash - POSIX sh doesn't have associative arrays or the C-style for syntax (and probably doesn't have all the parameter expansions used either, although I haven't checked).
Edit: alternatively, for the exact same length,
Python (2.6) ~ 269 个字符。
可能还有改进的空间,欢迎提示。
我认为处理规格。
Python (2.6) ~ 269 characters.
Probably still room for improvement, hints welcome.
Handles specifications I think.
Lua -
248227检查 键盘 上的运行版本
旧版本Lua -
248227check running version on codepad
old versionF# 420
我认为对于不可变的高尔夫来说还不错。不过我今天的课程成绩不太好。
未打高尔夫球的 F# 的 33 条线。打完高尔夫球后我会再次更新。
F# 420
Not bad for immutable golf I think. I didn't do very good on the course today though.
33 lines for un-golfed F#. I'll update again in a bit after I've golfed.
C# -
453375353345 个字符这不会获胜(并不是任何人都应该期望它获胜) ,但无论如何,写起来很有趣。为了便于阅读,我保留了前导空格和换行符:
在上次更新中,我通过假设输入行数的实际限制(即 99,999)能够节省 22 个字符。在最坏的情况下,您需要将其增加到 Int32 最大值 2,147,483,647,这将添加 5 个字符。不过,我的机器不喜欢这么长的数组的想法......
执行示例:
C# -
453375353345 charactersThis doesn't win (not that anyone should have expected it to), but it was fun to write anyway. I kept the leading spaces and newlines for legibility:
In my last update I was able to save 22 characters by assuming a practical limit to the number of input rows (namely 99,999). In the worst case, you'd need to up that to the Int32 max of 2,147,483,647 which would add 5 chars. My machine doesn't like the idea of an array that long though...
An example of the execution: