是否可以在mysql语句中运行apriori关联规则?
数据库:
Transacation# Items List
T1 butter
T1 jam
T2 butter
T3 bread
T3 ice cream
T4 butter
T4 jam
上表中, 是否可以在mysql语句中运行apriori关联规则?
例如,buys(T, Butter) 的支持 --> buys(T, jam) = 50%
因为有 4 笔交易且 T1、T4 满足“支持”规则。
我可以只使用一条sql语句来找出这样的结果吗?
Database:
Transacation# Items List
T1 butter
T1 jam
T2 butter
T3 bread
T3 ice cream
T4 butter
T4 jam
In the above table,
Is it possible to run apriori association rule in mysql statement?
For example, the support of buys(T, butter) --> buys(T, jam) = 50%
because there are 4 transactions and T1, T4 satisfy "support" rule.
Can i just use a sql statement to find out such result?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
是的,你可以使用SQL来查找单个项目的支持。但如果你想找到包含多个项目的项目集,那就很困难了。
例如,如果您的交易包含多个项目,并且您希望找到“果酱”与“牛奶”和“面包”一起出现的支持,那么最好使用像 Apriori 这样的算法,或者像 FPGrowth 这样更快的算法。
Yes, you can use SQL to find the support of a single item. But if you want to find itemsets containing more than one item, it would be difficult.
For example, if you had transactions containing several items and you want to find the support of "jam" with "milk" and "bread" appearing together, then it is better to use an algorithm like Apriori, or a faster algorithm like FPGrowth.
你给的样本数据我得出了66%?
有 3 笔交易涉及“黄油”,其中只有 2 笔交易涉及“果酱”。
我使用了以下测试表。
以下是我尝试的答案。内部选择查找包含“黄油”的所有交易。对于每个这样的交易,它还设置一个标志(bought_jam),说明该交易是否也包含“jam”。 (having 子句排除包含“jam”但不包含“butter”的交易)。
在外部选择中,我基本上对所有行进行计数(计数对应于包括黄油的事务数),并对果酱标志进行求和,该标志对应于包括黄油和果酱的事务数。
上面的查询给出以下结果:
让我知道这对您来说效果如何。
编辑:
以下查询将给出相同的结果,但更容易阅读。但是,如果事务表非常大,并且
item = x
选择性不强(返回大量行),则此查询几乎肯定会变慢。I arrive at 66% for the sample data you gave?
There are 3 transactions for "butter", and out of those only 2 included "jam".
I used the following test table.
The following is my attempt at an answer. The inner select find all transactions that included "butter". For each such transaction, it also sets a flag (bought_jam) saying whether that transaction also included "jam". (The having clause excludes transactions that includes "jam" but not "butter").
In the outer select, I basically count all rows (the count corresponds to the number of transactions including butter), and sums the jam flag, which corresponds to the number of transactions including both butter and jam.
The query above gives the following result:
Let me know how this works out for you.
Edit:
The following query would give the same results, but is much easier to read. However, if the transactions table is very large, and the
item = x
isn't very selective (returns lots of rows), this query would almost certainly be slower.