如何检测序列与模式的偏差?
我想检测一系列动作偏离给定模式的段落,并且无法找出一个聪明的解决方案来做到这一点,尽管问题听起来非常简单。
该模式的目标是以某种方式描述正常序列。更具体地说:“动作序列中应该或不应该包含哪些动作以及按什么顺序?”然后我想将动作序列与模式进行匹配并检测偏差及其位置。
我的第一个方法是使用正则表达式来做到这一点。下面是一个示例:
Example 1:
Pattern: A.*BC
Sequence: AGDBC (matches)
Sequence: AGEDC (does not match)
Example 2:
Pattern: ABCD
Sequence: ABD (does not match)
Sequence: ABED (does not match)
Sequence: ABCED (does not match)
Example 3:
Pattern: ABCDEF
Sequence: ABXDXF (does not match)
使用正则表达式可以很容易地检测到错误,但无法检测到错误发生的位置。我的方法是连续删除最后一个正则表达式块,直到我可以在序列中找到模式。然后我就会知道最后的正确动作,并且至少找到了第一个偏差。但这对我来说似乎不是最好的解决方案。此外,我不能所有的偏差。
我脑海中的其他想法是使用状态机、像 ANTLR 这样的订单工具。但我不知道他们是否能解决我的问题。 我想检测遗漏和委托的错误,并为用户提供创建自己的模式的可能性。你知道有什么好方法吗?
i want to detect the passages where a sequence of action deviates from a given pattern and can't figure out a clever solution to do this, although the problem sound really simple.
The objective of the pattern is to describe somehow a normal sequence. More specific: "What actions should or shouldn't be contained in an action sequence and in which order?" Then i want to match the action sequence against the pattern and detect deviations and their locations.
My first approach was to do this with regular expressions. Here is an example:
Example 1:
Pattern: A.*BC
Sequence: AGDBC (matches)
Sequence: AGEDC (does not match)
Example 2:
Pattern: ABCD
Sequence: ABD (does not match)
Sequence: ABED (does not match)
Sequence: ABCED (does not match)
Example 3:
Pattern: ABCDEF
Sequence: ABXDXF (does not match)
With regular expressions it is simple to detect a error but not where it occurs. My approach was to successively remove the last regex block until I can find the pattern in the sequence. Then i will know the last correct actions and have at least found the first deviation. But this doesn't seem the best solution to me. Further i can't all deviations.
Other soultions in my mind are working with state machines, order tools like ANTLR. But I don't know if they can solve my problem.
I want to detect errors of omission and commission and give an user the possibility to create his own pattern. Do you know a good way to do this?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
在匹配您的输入时,正则表达式引擎会提供有关不匹配位置的信息 - 但它可能无法以您可以轻松访问的方式提供该信息。
考虑例如实现该表达式的 DFA。它按顺序获取字符,将它们与期望进行匹配,并且您对序列中没有有效匹配的点感兴趣。
其他实现可能会来回变化,您可能会对所获取的任何字符的最大地址感兴趣。
在 Java 中,这可以通过提供 CharSequence 实现来完成,
访问器方法在其中跟踪最大索引。
不过我还没有尝试过。它也不能帮助您对错误进行分类。
While matching your input, a regular expression engine has information on the location of a mismatch -- it may however not provide it in a way where you can easily access it.
Consider e.g. a DFA implementing the expression. It fetches characters sequentially, matching them with expectation, and you are interested at the point in the sequence where there is no valid match.
Other implementations may go back and forth, and you would be interested in the maximum address of any character that was fetched.
In Java, this could possibly be done by supplying a CharSequence implementation to
where the accessor methods keep track of the maximum index.
I have not tried that, though. And it does not help you categorizing an error, either.
你看过马尔可夫链吗? http://en.wikipedia.org/wiki/Markov_chain - 听起来你想要意外的转换。也许还有隐藏的马尔可夫模型 http://en.wikipedia.org/wiki/Hidden_Markov_Models
have you looked at markov chains? http://en.wikipedia.org/wiki/Markov_chain - it sounds like you want unexpected transitions. maybe also hidden markov models http://en.wikipedia.org/wiki/Hidden_Markov_Models
找到正则表达式的开源实现,并添加一个钩子,以在给定的比较不匹配时返回/设置/打印/保存失败的索引。或者,编写您自己的 RE 引擎(不适合胆小的人),这样它就会完全满足您的要求!
Find an open-source implementation of regexs, and add in a hook to return/set/print/save the index at which it fails, if a given comparison does not match. Alternately, write your own RE engine (not for the faint of heart) so it'll do exactly what you want!