Stream 或 Iterator 来生成与正则表达式匹配的所有字符串?
这是我之前的问题的后续问题。
假设我想生成与给定(简化)正则表达式匹配的所有字符串。
这只是一个编码练习,我没有任何额外的要求(例如,实际上生成了多少个字符串)。所以主要的要求是生成漂亮、干净、简单的代码。
我考虑过使用 Stream
但在阅读这个问题 我正在考虑迭代器
。你会用什么?
This is a follow-up to my previous question.
Suppose I want to generate all strings that match a given (simplified) regular expression.
It is just a coding exercise and I do not have any additional requirements (e.g. how many strings are generated actually). So the main requirement is to produce nice, clean, and simple code.
I thought about using Stream
but after reading this question I am thinking about Iterator
. What would you use?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
该问题的解决方案需要太多代码,因此无法在此处实际回答,但大纲如下。
首先,您想要解析正则表达式 - 例如,您可以为此研究解析器组合器。然后,您将拥有一个评估树,例如,
您可以通过定义生成器将其作为生成器运行,而不是将此表达式树作为匹配器运行每个术语的方法。对于某些术语(例如
ZeroOrOne(Constant("d"))
),会有多个选项,因此您可以定义一个迭代器。实现此目的的一种方法是在每个术语中存储内部状态并传入“提前”标志或“重置”标志。在“重置”时,生成器返回第一个可能的匹配项(例如""
);提前,它会转到下一个并返回该值(例如"d"
),同时消耗提前标志(留下其余的在没有标志的情况下进行评估)。如果没有更多的项目,它将对自身内部的所有内容进行重置,并为下一个项目保留完整的提前标志。你首先要进行重置运行;在每次迭代中,您都会提前输入,并在再次取出时停止。当然,一些正则表达式结构(例如
"d+"
)可以生成无限多个值,因此您可能希望以某种方式限制它们(或者在某些时候返回,例如d...d
意思是“很多”);其他的有很多可能的值(例如.
匹配任何字符,但您真的想要所有 64k 字符,还是有很多 unicode 代码点?),您可能也希望限制这些值。不管怎样,这虽然很耗时,但会产生一个可以工作的发电机。而且,顺便说一句,如果您为解析树的每个部分编写匹配例程,您还将拥有一个可用的正则表达式匹配器。
The solution to this question asks for too much code for it to be practical to answer here, but the outline goes as follows.
First, you want to parse your regular expression--you can look into parser combinators for this, for example. You'll then have an evaluation tree that looks like, for example,
Rather than running this expression tree as a matcher, you can run it as a generator by defining a generate method on each term. For some terms, (e.g.
ZeroOrOne(Constant("d"))
), there will be multiple options, so you can define an iterator. One way to do this is to store internal state in each term and pass in either an "advance" flag or a "reset" flag. On "reset", the generator returns the first possible match (e.g.""
); on advance, it goes to the next one and returns that (e.g."d"
) while consuming the advance flag (leaving the rest to evaluate with no flags). If there are no more items, it produces a reset instead for everything inside itself and leaves the advance flag intact for the next item. You start by running with a reset; on each iteration, you put an advance in, and stop when you get it out again.Of course, some regex constructs like
"d+"
can produce infinitely many values, so you'll probably want to limit them in some way (or at some point return e.g.d...d
meaning "lots"); and others have very many possible values (e.g..
matches any char, but do you really want all 64k chars, or howevermany unicode code points there are?), and you may wish to restrict those also.Anyway, this, though time-consuming, will result in a working generator. And, as an aside, you'll also have a working regex matcher, if you write a match routine for each piece of the parsed tree.