Apriori 算法:具有频繁 (k-1) 子集意味着频繁?
我一直盯着一本描述挖掘频繁项集的 Apriori 算法的书里的下面一行,我似乎无法理解它
请注意,给定一个候选 k-项集,我们只需要检查它是否 (k- 1)-子集是频繁的,因为Apriori算法使用逐级搜索策略。
在上面,候选意味着是潜在的频繁k项集。
很明显,频繁 k 项集的 (k-1) 子集是频繁的,但即使所有 (k-1) 子集都是频繁的,我也没有看到其他含义。但也许我的阅读方式是错误的?
I have been staring at the following line in a book describing the Apriori algorithm for mining frequent itemsets, and I can not seem to grasp it
Note that given a candidate k-itemset, we only need to check if its (k-1)-subsets are frequent since the Apriori algorithm uses a level-wise search strategy.
In the above, candidate means being a potential frequent k-itemset.
It is clear that the (k-1)-subsets of frequent k-itemset are frequent, but I don't see the other implication even with all the (k-1)-subsets being frequent. But perhaps I am reading in a wrong way ?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
“很明显,频繁 k 项集的 (k-1) 子集是频繁的,但即使所有 (k-1) 子集都是频繁的,我也没有看到其他含义。”
你是对的,另一个含义是不正确的。 (k-1) 子集用于生成您需要测试频繁性或支持度(如原始论文所称)的 k 项集。您需要测试对从 (k-1) 子集生成的 k 项集的支持。
原始论文非常易读,可以在此处获取。第 4 页第 1 列有一个示例,使这里的想法非常清晰。
"It is clear that the (k-1)-subsets of frequent k-itemset are frequent, but I don't see the other implication even with all the (k-1)-subsets being frequent."
You are right, the other implication is not true. The (k-1) subsets are used to generate the k-itemsets you need to test for frequentness, or support (as the original paper calls it). You need to test for support for the k-itemsets generated from the (k-1)-subsets.
The original paper is quite readable and available here. Page 4 column 1 has an example which makes the idea here quite clear.
另一个含义是不正确的。但如果一个子集不频繁,则该项集也不会频繁。 APriori算法执行子集检查以消除一些不频繁的项集。但在此之后,还需要检查每个候选人的支持度。为此,Apriori 算法将扫描数据库。
如果你想更好地描述 Apriori,我建议查看这本书的章节:
http://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf
它用非常简单的术语解释了 Apriori、FPGrowth 和关联规则挖掘。它比原始的 Apriori 文章更容易阅读。
The other implication is not true. But if one subset is not frequent then the itemset will not be frequent. The APriori algorithm perform the subset check to eliminate some infrequent itemset. But after that, it still need to check the support of each candidate. To do that, the Apriori algorithm will scan the database.
If you want a better description of Apriori, I suggest to check this book chapter:
http://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf
It explains Apriori, FPGrowth and association rule mining in very simple terms. It is easier to read than the original Apriori article.