Generalized Sequential Pattern algorithm with MapReduce
I am looking for an example implementation of the Generalized Sequential Pattern (GSP) algorithm: http://en.wikipedia.org/wiki/GSP_Algorithm
While the Wikipedia article provides pseudocode, it's a bit confusing, and I would like to see some proper code (ideally Python or Java). Does anyone know a good reference?
I want to understand the algorithm first and then potentially make it work in a MapReduce setting. Judging by the counters used in the Wikipedia article, I think that could be complex.
I am doing this because I have a graph of events whose edges are constrained by time. A sequence is a chain of connected nodes in which A -> B happens between a start and a finish time, and B -> C happens X time after B finishes in the first connection. A -> B -> C would then be the sequence. A sequence cannot visit the same node more than once.
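To make the sequence definition above concrete, here is a minimal Python sketch of enumerating such chains. It is only one reading of the description: the `Edge` record, the function name `time_constrained_paths`, and the `max_gap` parameter are hypothetical, and "X time after B finishes" is assumed to mean that the next edge starts no later than `max_gap` after the previous edge finishes.

```python
from collections import defaultdict, namedtuple

# Hypothetical edge record: src -> dst is active from `start` to `finish`.
Edge = namedtuple("Edge", "src dst start finish")

def time_constrained_paths(edges, max_gap, min_len=2):
    """Yield node paths with at least `min_len` edges in which each edge
    starts within `max_gap` after the previous edge finishes (assumed rule)
    and no node is visited twice."""
    outgoing = defaultdict(list)
    for e in edges:
        outgoing[e.src].append(e)

    def extend(path, last_edge):
        if len(path) - 1 >= min_len:          # len(path) - 1 == number of edges
            yield tuple(path)
        for nxt in outgoing.get(last_edge.dst, ()):
            if nxt.dst in path:               # a sequence cannot revisit a node
                continue
            if last_edge.finish <= nxt.start <= last_edge.finish + max_gap:
                path.append(nxt.dst)
                yield from extend(path, nxt)
                path.pop()

    for e in edges:
        if e.src != e.dst:
            yield from extend([e.src, e.dst], e)

edges = [
    Edge("A", "B", 0, 5),
    Edge("B", "C", 7, 9),
    Edge("B", "D", 20, 22),   # starts too long after A -> B finishes
    Edge("C", "A", 10, 11),
]
paths = list(time_constrained_paths(edges, max_gap=3))
# paths == [('A', 'B', 'C'), ('B', 'C', 'A')]
```

The paths produced this way could serve as the input sequences for a pattern miner such as GSP, with one sequence per chain.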
Comments (2)
SPMF is a good tool that implements many algorithms.
It can save us a lot of time.
However, we still need to compare the performance of different algorithms, such as Generalized Sequential Pattern (GSP), which is an important algorithm in sequential pattern mining.
If you want Java code for GSP, PrefixSpan, SPADE, SPAM, and many others, check this website: http://www.philippe-fournier-viger.com/spmf/
Then you can check whether they can be adapted into a MapReduce algorithm.