Parsing: library functions, FSM, explode() or lex/yacc?
When I have to parse text (e.g. config files or other rather simple/descriptive languages), there are several solutions that come to my mind:
- using library functions, e.g. strtok(), sscanf()
- a finite state machine which processes one char at a time, tokenizing and parsing
- using the explode() function I once wrote out of pure boredom
- using lex/yacc (read: flex/bison) to generate an appropriate parser
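As a rough illustration of the first option, a sscanf()-based line parser might look like the sketch below. The helper name parse_pair, the 64-byte buffers and the "key = value" line format are assumptions made for the example; none of them come from the original post.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split one "key = value" line into two 64-byte buffers.
   Returns 1 on success, 0 if the line does not have that shape. */
int parse_pair(const char *line, char key[64], char value[64])
{
    assert(line != NULL && key != NULL && value != NULL);

    /* " %63[^= \t]" skips leading whitespace, then reads the key up to '=' or a blank.
       " = "         allows optional whitespace around the '='.
       "%63[^\n]"    reads the rest of the line as the value. */
    if (sscanf(line, " %63[^= \t] = %63[^\n]", key, value) != 2)
        return 0;

    /* %[^\n] keeps trailing blanks (and a '\r' from CRLF files), so trim them. */
    size_t n = strlen(value);
    while (n > 0 && isspace((unsigned char)value[n - 1]))
        value[--n] = '\0';
    return 1;
}
```

Calling parse_pair("port = 8080", key, value) fills key with "port" and value with "8080", while a comment line such as "# note" is rejected. Anything beyond this shape (quoting, sections, continuation lines) tends to need more and more format-string tricks, which is probably where the clumsy feeling comes from.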
I don't like the "library functions" approach. It feels clumsy and awkward. explode(), while it doesn't take much new code, feels even more blown up. And flex/bison often seems like sheer overkill.
I usually implement an FSM, but at the same time I already feel sorry for the poor guy who may have to maintain my code at a later point.
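For readers who have not seen the FSM approach, here is a minimal sketch of what a character-at-a-time state machine for "key = value" lines with "#" comments could look like. The format, the state names, struct pair and parse_config are all assumptions for illustration, not the code the question refers to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { FIELD_MAX = 64 };

struct pair { char key[FIELD_MAX]; char value[FIELD_MAX]; };

enum state { S_LINE_START, S_KEY, S_AFTER_KEY, S_BEFORE_VALUE, S_VALUE, S_COMMENT };

static int is_blank(char c) { return c == ' ' || c == '\t'; }

/* Append c to a NUL-terminated field, silently truncating at FIELD_MAX - 1. */
static void append(char *field, char c)
{
    size_t n = strlen(field);
    if (n + 1 < FIELD_MAX) { field[n] = c; field[n + 1] = '\0'; }
}

/* Parses text one character at a time. Stores up to max pairs in out.
   Returns the number of pairs, or -1 on a syntax error or overflow.
   Trailing blanks in a value are kept as they are. */
int parse_config(const char *text, struct pair *out, size_t max)
{
    assert(text != NULL && out != NULL);
    enum state st = S_LINE_START;
    size_t count = 0;
    struct pair cur = { "", "" };

    for (const char *p = text; ; ++p) {
        char c = *p ? *p : '\n';            /* treat end of input as a final newline */
        switch (st) {
        case S_LINE_START:
            if (c == '#') st = S_COMMENT;
            else if (c == '=') return -1;   /* '=' with no key in front of it */
            else if (c != '\n' && !is_blank(c)) {
                cur.key[0] = cur.value[0] = '\0';
                append(cur.key, c);
                st = S_KEY;
            }
            break;
        case S_KEY:
            if (c == '=') st = S_BEFORE_VALUE;
            else if (is_blank(c)) st = S_AFTER_KEY;
            else if (c == '\n') return -1;  /* key without '=' */
            else append(cur.key, c);
            break;
        case S_AFTER_KEY:
            if (c == '=') st = S_BEFORE_VALUE;
            else if (!is_blank(c)) return -1;
            break;
        case S_BEFORE_VALUE:
            if (is_blank(c)) break;
            st = S_VALUE;
            /* fall through: c is the first value character or the newline */
        case S_VALUE:
            if (c == '\n') {
                if (count == max) return -1;
                out[count++] = cur;
                st = S_LINE_START;
            } else {
                append(cur.value, c);
            }
            break;
        case S_COMMENT:
            if (c == '\n') st = S_LINE_START;
            break;
        }
        if (*p == '\0') break;
    }
    return (int)count;
}
```

Even for a format this small there are six states and a dozen transitions, which gives some idea of the maintenance worry: every new feature (quoted values, sections, escapes) adds states that interact with the existing ones.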
Hence my question:
What is the best way to parse relatively simple text files?
Does it matter at all?
Is there a commonly agreed-upon approach?
Comments (3)
I'm going to break the rules a bit and answer your questions out of order.
"Is there a commonly agreed-upon approach?"
Absolutely not. IMHO the solution you choose should depend on (to name a few) your text, your timeframe, your experience, even your personality. If the text is simple enough to make flex and bison overkill, maybe C is itself overkill. Is it more important to be fast, or robust? Does it need to be maintained, or can it start quick and dirty? Are you a passionate C user, or can you be enticed away with the right language features? &c., &c.
"Does it matter at all?"
Again, this is something only you can answer. If you're working closely with a team of people, with particular skills and abilities, and the parser is important and needs to be maintained, it sure does matter! If you're writing something "out of pure boredom," I would suggest that it doesn't matter at all, no. :-)
"What is the best way to parse relatively simple text files?"
Well, I don't know that you're going to like my answer. Maybe first read some of the other fine answers here.
No, really, go ahead. I'll wait.
Ah, you're back and relaxed. Let's ease into things, shall we?
If you're writing it in C, but C feels like the wrong tool...it really might be the wrong tool. awk or perl will likely do what you're trying to do without all the aggravation. You may even be able to do it with cut or something similar.
On the other hand, if you're writing it in C, you probably have a good reason to write it in C. Maybe your parser is a tiny part of a much larger system, which, for the sake of argument, is embedded, in a refrigerator, on the moon. Or maybe you loooove C. You may even hate awk and perl, heaven forfend.
If you don't hate awk and perl, you may want to embed them into your C program. This is doable, in principle--I've never done it myself. For awk, try libmawk. For perl, there are probably a few ways (TMTOWTDI). You can run perl separately using popen to start it, or you can actually embed a Perl interpreter into your C program--see man perlembed.
Anyhow, as I've said, "the best way to parse" entirely depends on you and your team, the problem space, and your approach to the issue. What I can offer is my opinion.
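To make the popen route mentioned above a little more concrete, here is a minimal sketch, assuming a POSIX system. The function name capture and its buffer handling are invented for the example; the idea is simply to let an external tool such as awk or perl do the parsing and read its output back into the C program.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* makes popen/pclose visible in strict modes */
#endif
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: run cmd through the shell and copy its standard
   output into buf (NUL-terminated, truncated to size - 1 bytes).
   Returns 0 if the command ran and exited with status 0, -1 otherwise. */
int capture(const char *cmd, char *buf, size_t size)
{
    assert(cmd != NULL && buf != NULL && size > 0);

    FILE *pipe = popen(cmd, "r");
    if (pipe == NULL)
        return -1;

    size_t used = fread(buf, 1, size - 1, pipe);
    buf[used] = '\0';

    /* Drain whatever did not fit, so the child is not killed by SIGPIPE
       and its exit status stays meaningful. */
    char sink[256];
    while (fread(sink, 1, sizeof sink, pipe) > 0)
        ;

    return pclose(pipe) == 0 ? 0 : -1;
}
```

A call such as capture("awk -F= '{ print $2 }' app.conf", buf, sizeof buf) would then leave the second field of every line in buf (app.conf is a made-up file name). The trade-off is a dependency on the external tool and on the shell, plus the usual care needed when any part of the command string comes from user input.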
I'm going to assume that in your C-only solutions (library functions and FSM (considering your explode to essentially be a library function)) you've already done your best at isolating the relevant code, designing the code and files well, and so forth.
Even so, I'm going to recommend lex and yacc.
Library functions feel "clumsy and awkward." A state machine seems unmaintainable. But you say that lex and yacc feel like overkill.
I think you should approach your complaints differently. With lex and yacc, what you're really doing is specifying an FSM. However, you're also hiring someone to write and maintain it for you, thereby solving most of the maintainability problem. Overkill? Did I mention they'll work for free?
I suspect, but do not know, that the reason lex and yacc originally felt like overkill was that your config / simple files just felt too, well, simple. If I'm right (a big if), you may be able to do most of your work in the lexer. (It's even conceivable that you can do all of your work in the lexer, but I know nothing about your input.) If your input is not only simple but widespread, you may be able to find a lexer/parser combination freely available for what you need.
In short: if you can do this in something other than C, try something else. If you want C, use lex and yacc--they have a little overhead, but they're a very good solution.
If you can get it to work, I'd go with an FSM, but with a huge assist from Perl-compatible regular expressions (PCRE). The library is easy to understand, and you ought to be able to trim back sufficient extraneous spaghetti to give your monster that aerodynamic flair to which all flying monsters aspire. That, and plenty of comments in well-structured spaghetti, ought to make your code-maintaining successor comfortable. (And, as I'm sure you know, that code-maintaining successor is you after six months, when you've moved on to something else and the details of this code have slipped your mind.)
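A small sketch of the "FSM plus regular expressions" idea follows. To stay dependency-free it uses POSIX <regex.h> instead of PCRE, so it is a stand-in for the suggestion above rather than PCRE code (PCRE has its own API). The INI-like input format, the state names and flatten_ini are assumptions for the example. The regexes classify each line; the state machine only has to track whether a section has been opened.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

enum state { S_TOP, S_IN_SECTION, S_ERROR };

#define SKIP_RE    "^[[:space:]]*([#;].*)?$"
#define SECTION_RE "^[[:space:]]*\\[([^]]+)\\][[:space:]]*$"
#define PAIR_RE    "^[[:space:]]*([^=[:space:]]+)[[:space:]]*=[[:space:]]*" \
                   "([^[:space:]](.*[^[:space:]])?)[[:space:]]*$"

/* Copy one capture group out of line into dst, truncating to size - 1 bytes. */
static void copy_group(char *dst, size_t size, const char *line, regmatch_t m)
{
    size_t len = (size_t)(m.rm_eo - m.rm_so);
    if (len >= size) len = size - 1;
    memcpy(dst, line + m.rm_so, len);
    dst[len] = '\0';
}

/* Hypothetical helper: writes one "section.key=value" record per line to out.
   Returns 0 on success, -1 on an unrecognised line, a key outside any
   section, an over-long line, or when out is too small. */
int flatten_ini(const char *text, char *out, size_t out_size)
{
    assert(text != NULL && out != NULL && out_size > 0);
    regex_t skip, section, pair;
    int bad = regcomp(&skip, SKIP_RE, REG_EXTENDED | REG_NOSUB)
            | regcomp(&section, SECTION_RE, REG_EXTENDED)
            | regcomp(&pair, PAIR_RE, REG_EXTENDED);
    if (bad) return -1;  /* sketch only: constant patterns; real code would regfree the rest */

    enum state st = S_TOP;
    char name[64] = "", key[64], value[128], line[256];
    size_t used = 0;
    out[0] = '\0';

    while (*text && st != S_ERROR) {
        size_t len = strcspn(text, "\n");
        if (len >= sizeof line) { st = S_ERROR; break; }
        memcpy(line, text, len);
        line[len] = '\0';
        text += len + (text[len] == '\n');

        regmatch_t m[3];
        if (regexec(&skip, line, 0, NULL, 0) == 0) {
            continue;                      /* blank line or comment: no transition */
        } else if (regexec(&section, line, 2, m, 0) == 0) {
            copy_group(name, sizeof name, line, m[1]);
            st = S_IN_SECTION;
        } else if (st == S_IN_SECTION && regexec(&pair, line, 3, m, 0) == 0) {
            copy_group(key, sizeof key, line, m[1]);
            copy_group(value, sizeof value, line, m[2]);
            int n = snprintf(out + used, out_size - used, "%s.%s=%s\n", name, key, value);
            if (n < 0 || (size_t)n >= out_size - used) { out[used] = '\0'; st = S_ERROR; }
            else used += (size_t)n;
        } else {
            st = S_ERROR;
        }
    }
    regfree(&skip);
    regfree(&section);
    regfree(&pair);
    return st == S_ERROR ? -1 : 0;
}
```

Compared with a character-level machine, the per-character states collapse into three patterns, and the remaining states read almost like the grammar of the file. Whether that is easier to maintain depends on how comfortable the successor is with regular expressions.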
My short answer is to use the right tool for the problem. If you have configuration files, use existing standards and formats, e.g. INI files, and parse them using Boost program_options.
If you enter the world of "own" languages, use lex/yacc, since they provide you with the required features, but you have to consider the cost of maintaining the grammar and language implementation.
As a result, I would recommend further narrowing your problem scope to find the right tool.