使用有限状态机对于一般文本解析来说是一个好的设计吗?
我正在读取一个已满的文件 与十六进制数字。我必须确定一个 特定模式,说“aaad”(不带引号)来自 它。每次看到这个图案我都会 生成一些数据到其他文件。
这在程序设计中是非常常见的情况 - 解析并寻找特定模式。
我将其设计为有限状态机,并使用在C中对其进行结构化>switch-case
来改变状态。这是我想到的第一个实现。
- 设计:是否有更好的设计可能?
- 实施:正如我提到的,您是否发现使用开关盒存在一些问题?
I am reading a file that is filled
with hex numbers. I have to identify a
particular pattern, say "aaad" (without quotes) from
it. Every time I see the pattern, I
generate some data to some other file.
This would be a very common case in designing programs - parsing and looking for a particular pattern.
I have designed it as a Finite State Machine and structured structured it in C using switch-case
to change states. This was the first implementation that occured to me.
- DESIGN: Are there some better designs possible?
- IMPLEMENTATION: Do you see some problems with using a switch case as I mentioned?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
手卷 FSM 可以很好地适用于简单的情况,但随着状态和输入数量的增加,它们往往会变得笨拙。
可能没有理由改变您已经设计/实现的内容,但如果您对通用文本解析技术感兴趣,您可能应该看看正则表达式、Flex、Bison 和 ANTLR 之类的东西。
A hand-rolled FSM can work well for simple situations, but they tend to get unwieldy as the number of states and inputs grows.
There is probably no reason to change what you have already designed/implemented, but if you are interested in general-purpose text parsing techniques, you should probably look at things like regular expressions, Flex, Bison, and ANTLR.
对于极其简单的情况,几个
if
或switch
就足够了。要在 POSIX 系统上解析字符串,man regex (3)。要对整个文件(例如复杂的配置)进行全功能解析,请使用 Lex/Flex 和 Yacc/Bison。
使用 C++ 编写时,请查看 Boost Regex 了解更简单的情况,Boost Spirit 对于更复杂的一个。柔性和Bison 也可以使用 C++。
For embarrassingly simple cases couple of
if
's orswitch
'es are sufficient.For parsing a string on POSIX systems, man regex (3). For fully featured parsing of whole files (e.g. complex configs) use Lex/Flex and Yacc/Bison.
When writing in C++, look at Boost Regex for the simpler case and Boost Spirit for more complex one. Flex & Bison work with C++ too.