是否可以在mysql语句中运行apriori关联规则?

发布于 2024-10-20 13:54:44 字数 386 浏览 5 评论 0原文

数据库:

Transacation#   Items List
T1              butter
T1              jam
T2              butter
T3              bread
T3              ice cream
T4              butter
T4              jam

上表中, 是否可以在mysql语句中运行apriori关联规则?

例如,buys(T, Butter) 的支持 --> buys(T, jam) = 50%

因为有 4 笔交易且 T1、T4 满足“支持”规则。

我可以只使用一条sql语句来找出这样的结果吗?

Database:

Transacation#   Items List
T1              butter
T1              jam
T2              butter
T3              bread
T3              ice cream
T4              butter
T4              jam

In the above table,
Is it possible to run apriori association rule in mysql statement?

For example, the support of buys(T, butter) --> buys(T, jam) = 50%

because there are 4 transactions and T1, T4 satisfy "support" rule.

Can i just use a sql statement to find out such result?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

一笔一画续写前缘 2024-10-27 13:54:44

是的,你可以使用SQL来查找单个项目的支持。但如果你想找到包含多个项目的项目集,那就很困难了。

例如,如果您的交易包含多个项目,并且您希望找到“果酱”与“牛奶”和“面包”一起出现的支持,那么最好使用像 Apriori 这样的算法,或者像 FPGrowth 这样更快的算法。

Yes, you can use SQL to find the support of a single item. But if you want to find itemsets containing more than one item, it would be difficult.

For example, if you had transactions containing several items and you want to find the support of "jam" with "milk" and "bread" appearing together, then it is better to use an algorithm like Apriori, or a faster algorithm like FPGrowth.

冬天旳寂寞 2024-10-27 13:54:44

你给的样本数据我得出了66%?
有 3 笔交易涉及“黄油”,其中只有 2 笔交易涉及“果酱”。

我使用了以下测试表。

create table transactions(
   trans_no     varchar(5)  not null
  ,item         varchar(20) not null
  ,primary key(trans_no, item)
);

insert into transactions(trans_no, item)
values ('T1', 'butter')
      ,('T1', 'jam')
      ,('T2', 'butter')
      ,('T3', 'bread')
      ,('T3', 'ice cream')
      ,('T4', 'butter')
      ,('T4', 'jam');

以下是我尝试的答案。内部选择查找包含“黄油”的所有交易。对于每个这样的交易,它还设置一个标志(bought_jam),说明该交易是否也包含“jam”。 (having 子句排除包含“jam”但不包含“butter”的交易)。
在外部选择中,我基本上对所有行进行计数(计数对应于包括黄油的事务数),并对果酱标志进行求和,该标志对应于包括黄油和果酱的事务数。

select sum(bought_jam) as jams_bought
      ,count(*) as num_trans
      ,100 * sum(bought_jam) / count(*) as correlation_pct
  from (select trans_no
              ,max(case when item = 'jam' then 1 else 0 end) as bought_jam
          from transactions
         where item in('butter', 'jam')
         group 
            by trans_no
        having min(case when item = 'butter' then item end) = 'butter'
       ) butter_trans;

上面的查询给出以下结果:

+-------------+-----------+-----------------+
| jams_bought | num_trans | correlation_pct |
+-------------+-----------+-----------------+
|           2 |         3 |         66.6667 |
+-------------+-----------+-----------------+
1 row in set (0.00 sec)

让我知道这对您来说效果如何。

编辑:
以下查询将给出相同的结果,但更容易阅读。但是,如果事务表非常大,并且 item = x 选择性不强(返回大量行),则此查询几乎肯定会变慢。

select count(t2.trans_no) as jams_bought
      ,count(*) as num_trans
      ,count(t2.trans_no) / count(*) as correlation_pct
  from transactions t1
  left join transactions t2 on(t2.trans_no = t1.trans_no and t2.item = 'jam')
 where t1.item = 'butter';

I arrive at 66% for the sample data you gave?
There are 3 transactions for "butter", and out of those only 2 included "jam".

I used the following test table.

create table transactions(
   trans_no     varchar(5)  not null
  ,item         varchar(20) not null
  ,primary key(trans_no, item)
);

insert into transactions(trans_no, item)
values ('T1', 'butter')
      ,('T1', 'jam')
      ,('T2', 'butter')
      ,('T3', 'bread')
      ,('T3', 'ice cream')
      ,('T4', 'butter')
      ,('T4', 'jam');

The following is my attempt at an answer. The inner select find all transactions that included "butter". For each such transaction, it also sets a flag (bought_jam) saying whether that transaction also included "jam". (The having clause excludes transactions that includes "jam" but not "butter").
In the outer select, I basically count all rows (the count corresponds to the number of transactions including butter), and sums the jam flag, which corresponds to the number of transactions including both butter and jam.

select sum(bought_jam) as jams_bought
      ,count(*) as num_trans
      ,100 * sum(bought_jam) / count(*) as correlation_pct
  from (select trans_no
              ,max(case when item = 'jam' then 1 else 0 end) as bought_jam
          from transactions
         where item in('butter', 'jam')
         group 
            by trans_no
        having min(case when item = 'butter' then item end) = 'butter'
       ) butter_trans;

The query above gives the following result:

+-------------+-----------+-----------------+
| jams_bought | num_trans | correlation_pct |
+-------------+-----------+-----------------+
|           2 |         3 |         66.6667 |
+-------------+-----------+-----------------+
1 row in set (0.00 sec)

Let me know how this works out for you.

Edit:
The following query would give the same results, but is much easier to read. However, if the transactions table is very large, and the item = x isn't very selective (returns lots of rows), this query would almost certainly be slower.

select count(t2.trans_no) as jams_bought
      ,count(*) as num_trans
      ,count(t2.trans_no) / count(*) as correlation_pct
  from transactions t1
  left join transactions t2 on(t2.trans_no = t1.trans_no and t2.item = 'jam')
 where t1.item = 'butter';
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文