共享迭代器
我正在 Scala 中编写一个(简单的)编译器,并使标记器可迭代,现在需要编写解析器。该计划是使用递归下降策略,因此解析器将分为多个方法,每个方法都调用(其中一些)其他方法。
我认为维护分词器迭代器的状态并在各种方法之间共享它是必要/最好的。是这样吗?我该怎么办?如果不是这种情况,还有哪些替代方案?
I'm writing a (simple) compiler in Scala and have made the tokenizer iterable and now need to write the parser. The plan is to use a recursive-descent strategy and so the parser is going to be split up into a number of methods, each of which calls (some of) the others.
I assume it's going to be necessary/preferable to maintain the state of the tokenizer iterator and share it among the various methods. Is this the case? How should I go about it? If it's not the case, what are the alternatives?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
如果必须维护迭代器的状态,请不要使用迭代器!迭代器适用于您可以随时销毁状态的情况。
您可能可以不用使用流。流有一个习惯,就是在应该放弃内存的时候不放弃内存,因为引用会保留在您不想要的地方(但如果您仔细想想,您可以知道它们存在)。因此,如果您从迭代器开始,您可以 .toStream 它并传入子流,然后传递流以进行进一步处理。但是,如果您想避免将所有内容都保留在内存中,则必须非常小心,不要保留对流头部的引用。
另一种方法是将所有内容转储到向量或数组中,并将整个问题保留在内存中;然后,您可以在继续时删除不相关的部分(或推进索引)。
最后,如果您绝对肯定不需要任何回溯,那么您可以直接使用迭代器,而不必担心“维护状态”。也就是说,当您从子方法返回时,您将已经消耗了正确的令牌,并且不再消耗更多令牌,并且可以自由地继续解析。为了在返回值上至少没有一个元素“我没有使用的下一个令牌”的情况下,您需要能够预测最后一个令牌的位置(例如,无限长度的列表必须以属于列表一部分的标记,因此
{1,2,3}
可能是一个列表(如果您在看到{
时进入列表处理,并在看到{
时退出)你点击了}
),但没有1,2,3 + 7
(因为在您意识到列表已结束之前您会消耗+
))。If you have to maintain the state of the iterator, don't use an iterator! Iterators are for when you can destroy your state as you go.
You might be able to get away with using a stream. Streams have a habit of not giving up their memory when they ought to because of references persisting where you don't want them (but where you can tell they exist if you think about it). So if you started with an iterator, you could .toStream it and pass the substreams in, and then pass on the stream for further processing. But you'd have to be very careful about not keeping a reference to the head of the stream if you wanted to avoid keeping everything in memory.
Another way to go is to just dump everything into a vector or array and keep the whole problem in memory; you can then drop the irrelevant parts (or advance the index) as you proceed.
Finally, if you're absolutely positive that you don't need any backtracking, then you can just use the iterator as it is without worrying about "maintaining the state". That is, when you get back from the sub-method, you will already have consumed exactly the right tokens and no more, and will be free to keep parsing. For this to work without at least a one-element "next token that I didn't consume" on the return value, you need to be able to predict where the last token is (e.g. a list of unbounded length would have to end with a token that was part of the list, so
{1,2,3}
could be a list (if you go into list processing when you see{
and drop out when you hit}
), but not1,2,3 + 7
(because you'd consume+
before you realized that the list was over)).您可以构造令牌迭代器并将其传递给每个递归解析器调用,以便令牌级解析从中读取:
You could just construct the token iterator and pass it down each recursive parser call so that the token-level parsing reads from it: