Alternatives to stateful parsing
I don't like stateful parsing. It seems to me there should be a better approach. Is there?
Let me illustrate by example. Let's say I'm parsing a text file (YAML in this case, but it could be plain text or XML). I'm making a simple trivia game; my game will contain sets of questions, each with two or more answers. In YAML, I might structure my file like:
set:
  name: math questions
  question:
    text: 1 + 1 = ?
    answer: 3
    answer: 4
    best-answer: 2
  question:
    text: 2 * 3 = ?
    answer: 5
    best-answer: 6
set:
  name: chemistry questions
  question:
    text: the valence of a Chlorine free radical is?
    answer: 1
    answer: 0
    best-answer: -1
  question:
    text: Xeon is a noble gas
    best-answer: true
    answer: false
(I haven't used YAML in a while, apologies if it's pseudo-YAML.) When I'm parsing, if I read the current line and see "answer: ...", I have to know that I'm inside an answer of a question of a set.
This tends to be very stateful code, like:
if (currentLine starts with "answer")
    currentQuestion.addAnswer(...)
else if (currentLine starts with "question")
    currentQuestion = new question
...
At any point in the parsing, we need a reference to the current object, which may be nested within several other objects.
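For concreteness, here is a minimal runnable sketch in Python of the stateful loop described above. The function name `parse_stateful`, the dictionary layout, and the choice to ignore indentation and split each line at the first colon are all assumptions made for illustration; they are not part of the original question.

```python
def parse_stateful(text):
    """Line-by-line parse that tracks the current set and question explicitly."""
    sets = []
    current_set = None        # explicit state: the set being filled in
    current_question = None   # explicit state: the question being filled in
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split at the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "set":
            current_set = {"name": None, "questions": []}
            sets.append(current_set)
            current_question = None
        elif key == "name":
            current_set["name"] = value
        elif key == "question":
            current_question = {"text": None, "answers": [], "best": None}
            current_set["questions"].append(current_question)
        elif key == "text":
            current_question["text"] = value
        elif key == "answer":
            current_question["answers"].append(value)
        elif key == "best-answer":
            current_question["best"] = value
        else:
            raise ValueError(f"unexpected line: {raw!r}")
    return sets
```

Every branch reads or writes `current_set` or `current_question`, which is exactly the bookkeeping the question complains about.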
Part of the problem might be that my main loop iterates over each line, line by line. An alternative approach might be to just read the line, and depending on what it is, read several more lines as necessary.
So again, my question: is there a stateless way to parse data? I have a feeling that an approach might exist that would be clearer and easier to read, understand, and code than my usual stateful for-loops over all lines of text.
Comments (4)
You're apparently looking not for "stateless" parsing, but for non-imperative, purely functional parsing. Of course there is always state of a sort, but with a functional approach your state is entirely implicit.
Take a look at the article "Functional Pearls: Monadic Parsing in Haskell", and check out the various Parsec-like parser combinator libraries, which exist even for very imperative languages like Java and C++.
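To give a feel for the combinator style without committing to any particular library, here is a hand-rolled sketch in Python. It is not Parsec's API: the helpers `line`, `seq`, `many`, `either`, `fmap`, and `tokenize` are hypothetical names invented for this example, and a "parser" is assumed to be a plain function from `(items, pos)` to `(value, new_pos)`, or `None` on failure.

```python
def tokenize(text):
    """Turn non-blank lines into (key, value) pairs, ignoring indentation."""
    items = []
    for raw in text.splitlines():
        if raw.strip():
            key, _, value = raw.strip().partition(":")
            items.append((key, value.strip()))
    return items

def line(key):
    """Parser matching one line with the given key; yields its value."""
    def parse(items, pos):
        if pos < len(items) and items[pos][0] == key:
            return items[pos][1], pos + 1
        return None
    return parse

def seq(*parsers):
    """Run parsers one after another; fail if any of them fails."""
    def parse(items, pos):
        values = []
        for p in parsers:
            result = p(items, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def many(parser):
    """Apply a parser zero or more times, collecting the results."""
    def parse(items, pos):
        values = []
        while True:
            result = parser(items, pos)
            # Stop on failure, or if nothing was consumed (avoids looping forever).
            if result is None or result[1] == pos:
                return values, pos
            value, pos = result
            values.append(value)
    return parse

def either(*parsers):
    """Try each alternative in order; return the first success."""
    def parse(items, pos):
        for p in parsers:
            result = p(items, pos)
            if result is not None:
                return result
        return None
    return parse

def fmap(func, parser):
    """Transform the value produced by a parser."""
    def parse(items, pos):
        result = parser(items, pos)
        if result is None:
            return None
        value, pos = result
        return func(value), pos
    return parse

# The grammar reads top-down and holds no mutable "current object".
answer = either(
    fmap(lambda v: ("answer", v), line("answer")),
    fmap(lambda v: ("best", v), line("best-answer")),
)
question = fmap(
    lambda parts: {"text": parts[1], "answers": parts[2]},
    seq(line("question"), line("text"), many(answer)),
)
question_set = fmap(
    lambda parts: {"name": parts[1], "questions": parts[2]},
    seq(line("set"), line("name"), many(question)),
)
document = many(question_set)
```

Calling `document(tokenize(text), 0)` returns the parsed sets together with the position where parsing stopped. The position is still state, but it is threaded through return values rather than stored in variables that the grammar has to manage.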
What you describe is a more or less state machine driven parsing approach: you iterate over lines of the file, and a state variable keeps track of where in the parse tree you are. You might find it easier and cleaner to use recursive descent parsing, in which much of the state is implicit, in the form of the program stack. As others point out, parsing is inherently stateful, but recursive descent lets you keep less state explicitly.
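A recursive descent version of the trivia parser might look like the sketch below, written in Python. The `Lines` cursor class and the function names are assumptions made for illustration, and indentation is ignored in favour of key order. The only explicit state is the cursor position; which set or question is "current" lives in the local variables of whichever function is running.

```python
class Lines:
    """Cursor over (key, value) pairs with one item of lookahead."""

    def __init__(self, text):
        self.items = []
        for raw in text.splitlines():
            if raw.strip():
                key, _, value = raw.strip().partition(":")
                self.items.append((key, value.strip()))
        self.pos = 0

    def peek_key(self):
        """Key of the next line, or None at end of input."""
        return self.items[self.pos][0] if self.pos < len(self.items) else None

    def expect(self, key):
        """Consume the next line, which must have the given key."""
        if self.peek_key() != key:
            raise ValueError(f"expected {key!r}, got {self.peek_key()!r}")
        value = self.items[self.pos][1]
        self.pos += 1
        return value


def parse_question(lines):
    lines.expect("question")
    question = {"text": lines.expect("text"), "answers": [], "best": None}
    while lines.peek_key() in ("answer", "best-answer"):
        if lines.peek_key() == "answer":
            question["answers"].append(lines.expect("answer"))
        else:
            question["best"] = lines.expect("best-answer")
    return question


def parse_set(lines):
    lines.expect("set")
    result = {"name": lines.expect("name"), "questions": []}
    while lines.peek_key() == "question":
        result["questions"].append(parse_question(lines))
    return result


def parse_file(text):
    lines = Lines(text)
    sets = []
    while lines.peek_key() == "set":
        sets.append(parse_set(lines))
    if lines.peek_key() is not None:
        raise ValueError(f"unexpected {lines.peek_key()!r}")
    return sets
```

Each function mirrors one level of the file's structure, so an `answer` line is only ever seen by `parse_question`, and no branch has to ask which question it belongs to.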
You just described "given a certain state, do something." That is, a stateful approach.
Parsing is inherently stateful. The meaning of the data depends on the context. The context is the state.
There's a reason that an introductory course in compilers starts with finite-state machines.
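To make the "context is the state" point concrete, here is a small Python sketch of an explicit finite-state machine that checks whether a sequence of line keys from the trivia file is in a legal order. The state names and the transition table are assumptions chosen for this example, not something given in the thread.

```python
# (current state, key of next line) -> next state
TRANSITIONS = {
    ("start", "set"): "set",
    ("set", "name"): "named_set",
    ("named_set", "question"): "question",
    ("question", "text"): "answers",
    ("answers", "answer"): "answers",
    ("answers", "best-answer"): "answers",
    ("answers", "question"): "question",
    ("answers", "set"): "set",
}


def validate(keys):
    """Walk the keys through the table; raise on an illegal transition."""
    state = "start"
    for key in keys:
        try:
            state = TRANSITIONS[(state, key)]
        except KeyError:
            raise ValueError(f"{key!r} is not allowed in state {state!r}")
    return state
```

The same key means different things, or is rejected outright, depending on the state in which it arrives: `answer` is accepted after a question's `text` but is an error at the start of the file.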
The very concept of parsing implies that some pieces are one type of token, others are another, and others aren't valid at all. How are you going to know that without maintaining some kind of state that says "OK, I'm parsing a foo right now... this is what I should have here"?