从输入导出最小正则表达式
我有一个远程“代理”,在收到字符串时返回“是”或“否”。与此代理通信的成本很高,因此我希望找到一个库,它允许我在给定正反馈和负反馈的情况下迭代构建正则表达式,同时对其构建进行智能处理。这将允许我在发送端缓存答案。
例如,假设我们用“good”查询代理并收到“yes”。最初派生的正则表达式应该是“好”。
假设我用“goop”查询并收到“yes”。我希望派生的正则表达式是“goo[dp]”,而不是“good|goop”。
等等。
我的派生正则表达式中不需要回溯或任何其他奇特的非线性时间操作。据推测,生成的正则表达式将是底层的 DFA。有谁知道任何 c/c++ 正则表达式库能够执行此操作吗?另外,为什么这是一个愚蠢的想法以及对我的实际问题的更好解决方案的原因也将很有用。
I have a remote "agent" that returns "yes" or "no" when handed a string. Communicating with this agent is expensive, so I'm hoping to find a library that will allow me to iteratively build a regular expression given positive and negative feedback, while being intelligent about its construction. This would allow me to cache answers on the sending side.
For example, suppose we query the agent with "good" and receive a "yes". The initial derived regular expression ought to be "good".
Suppose I query then with "goop" and receive a "yes". I would expect the derived regular expression to be "goo[dp]", not "good|goop".
And so forth.
I do not need backtracking or any other fancy non-linear time operations in my derived regex. Presumably the generated regex would be a DFA under the hood. Is anyone aware of any c/c++ regular expression libraries capable of doing this? Alternatively, reasons why this is a dumb idea and better solutions to my real problem would also be useful.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
您可以使用 Trie,而不是正则表达式。
然后,对于每个新字符串,您将遍历 trie 中每个字符的一个节点。我怀疑您还需要一个字符串结尾的标记字符 - 一旦到达该字符,如果节点存在,它就会保存是/否答案。
Rather than a regular expression, you could use a Trie.
Then for each new string you walk the trie one node for each character. I suspect that you would also want a marker character for end of string - once you reach this character, if the node exists, it holds the yes/no answer.
好吧,除非我在你的情况下遗漏了一些东西,否则我认为内存足够便宜,可以直接实现一个哑缓存 - 比如说,
的 unordered_map 。这不仅会更容易构建,而且可能会更快,因为您正在构建哈希映射。唯一的缺点是,如果您要使用无数不同的密钥查询远程服务,那么这可能不是最好的方法。Well, unless I'm missing something in your situation, I think that memory is cheap enough to just straight up implement a dumb cache - say, an unordered_map of
<std::string, bool>
. Not only will this be much easier to build, it will probably be faster too, since you're building a hash map. The only downside to this is if you were going to query the remote service with a bazillion different keys, then this might not be the best approach.