String/sequence pattern mining
I've been trying to find an answer to this question for a week; I'd appreciate any help.
I have a list of strings (originally a list of sequences, which can be viewed as a list of strings) and I'd like to find a pattern (itself a string) within the strings of this list. Is there a Java library I can use, or a tool (not Weka, which doesn't do this) that can help me?
Comments (2)
Sounds like you want to find the longest common subsequence (LCS) of those strings. This is a well-known algorithmic problem that is commonly solved with dynamic programming when comparing two strings. See here for various implementations in multiple languages.
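As a minimal sketch (not taken from the linked implementations), here is the classic two-string dynamic-programming LCS in Java, assuming each sequence element is a single character and Java 9+ for `List.of`. The class and method names (`Lcs`, `lcs`, `foldLcs`) are hypothetical. Folding pairwise LCS over a whole list is only a heuristic: the result is a subsequence of every string, but it is not guaranteed to be the longest one, because exact LCS over an arbitrary number of strings is NP-hard in general.

```java
import java.util.List;

class Lcs {
    // Returns one longest common subsequence of a and b using the
    // classic O(|a| * |b|) dynamic-programming table.
    static String lcs(String a, String b) {
        int n = a.length(), m = b.length();
        // dp[i][j] = length of the LCS of the suffixes a[i..] and b[j..]
        int[][] dp = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                if (a.charAt(i) == b.charAt(j)) {
                    dp[i][j] = dp[i + 1][j + 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i + 1][j], dp[i][j + 1]);
                }
            }
        }
        // Walk the table from the start to reconstruct one optimal subsequence.
        StringBuilder sb = new StringBuilder();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a.charAt(i) == b.charAt(j)) {
                sb.append(a.charAt(i));
                i++;
                j++;
            } else if (dp[i + 1][j] >= dp[i][j + 1]) {
                i++;
            } else {
                j++;
            }
        }
        return sb.toString();
    }

    // Hypothetical heuristic for a list: fold pairwise LCS left to right.
    // The result is common to all strings but may not be the longest.
    static String foldLcs(List<String> strings) {
        if (strings.isEmpty()) {
            return "";
        }
        String common = strings.get(0);
        for (int k = 1; k < strings.size(); k++) {
            common = lcs(common, strings.get(k));
        }
        return common;
    }

    public static void main(String[] args) {
        System.out.println(lcs("AGGTAB", "GXTXAYB")); // prints GTAB
        System.out.println(foldLcs(List.of("AGGTAB", "GXTXAYB", "GTAB"))); // prints GTAB
    }
}
```

For "AGGTAB" and "GXTXAYB" the only common characters are G, T, A and B, which appear in that order in both strings, so the LCS is "GTAB".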
If you want to find patterns that occur frequently in a set of sequences, then you could try "sequential pattern mining" or "sequential rule mining" algorithms.
There are several implementations of these algorithms in my SPMF Java open-source data mining library.
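To illustrate what sequential pattern mining computes, here is a simplified PrefixSpan-style sketch. It does not use or mirror SPMF's API. Assumptions: each sequence is a string of single-character items (real sequential pattern mining also allows a set of items at each position), a pattern "occurs" in a string if it is a subsequence of it (gaps allowed), and a pattern's support is the number of input strings containing it. The class and method names (`PrefixSpanSketch`, `mine`, `grow`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class PrefixSpanSketch {
    // Returns every pattern (as a string) contained as a subsequence in at
    // least minSupport input strings, mapped to its support count.
    static Map<String, Integer> mine(List<String> db, int minSupport) {
        // Projected database entry: {index of input string, start of remaining suffix}.
        List<int[]> initial = new ArrayList<>();
        for (int i = 0; i < db.size(); i++) {
            initial.add(new int[] {i, 0});
        }
        Map<String, Integer> out = new TreeMap<>();
        grow(db, "", initial, minSupport, out);
        return out;
    }

    // Depth-first pattern growth: extend prefix by each frequent symbol.
    static void grow(List<String> db, String prefix, List<int[]> projected,
                     int minSupport, Map<String, Integer> out) {
        // Count, per symbol, how many distinct strings still contain it
        // in their remaining suffix (each string has one entry here).
        Map<Character, Integer> counts = new TreeMap<>();
        for (int[] p : projected) {
            String s = db.get(p[0]);
            Set<Character> seen = new HashSet<>();
            for (int i = p[1]; i < s.length(); i++) {
                if (seen.add(s.charAt(i))) {
                    counts.merge(s.charAt(i), 1, Integer::sum);
                }
            }
        }
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            if (e.getValue() < minSupport) {
                continue; // infrequent symbols cannot start frequent extensions
            }
            char c = e.getKey();
            String pattern = prefix + c;
            out.put(pattern, e.getValue());
            // Project: keep only the suffix after the first occurrence of c.
            List<int[]> next = new ArrayList<>();
            for (int[] p : projected) {
                int idx = db.get(p[0]).indexOf(c, p[1]);
                if (idx >= 0) {
                    next.add(new int[] {p[0], idx + 1});
                }
            }
            grow(db, pattern, next, minSupport, out);
        }
    }

    public static void main(String[] args) {
        // prints {A=3, AB=3, AC=2, B=3, C=2}
        System.out.println(mine(List.of("ABC", "ACB", "AB"), 2));
    }
}
```

Projecting on the first occurrence of each symbol keeps the longest possible remaining suffix, so no occurrence of a longer pattern is missed; a library such as SPMF adds itemset handling, constraints and far better performance on real data.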