推荐系统实验

发布于 2024-08-12 13:56:57 字数 1213 浏览 4 评论 0原文

我正在阅读《集体智能编程》一书，并尝试将我所学到的知识应用到 Northwind 数据库中。尽管我对所提出的算法的理解还没有信心，但我开始了解一般概念。

使用 Northwind 数据库，我尝试使用以下伪逻辑显示“购买此产品的客户也购买了 XYZ”的列表：

查找也购买了我的商品的其他客户
查找这些客户购买的所有其他商品
根据以下条件对商品进行排名购买计数
返回上一步中的前 N 件商品

我正在处理以下查询：

declare 
    @customerid nchar(5),
    @productid int;

set @customerid = 'ALFKI';
set @productid = 59;

-- find other products from customers who
-- also purchased my productid

select top 10
    od.productid, c.categoryname, p.productname, p.unitsonorder, count(od.productid)
from
    [order details] od
        inner join orders o on o.orderid = od.orderid
        inner join products p on p.productid = od.productid
        inner join categories c on c.categoryid = p.categoryid
where
    o.customerid <> @customerid and
    od.productid <> @productid and 
    p.discontinued = 0
group by
    od.productid, c.categoryname, p.productname, p.unitsonorder
order by 5 desc,4 desc

我认为下一步是分解查询，以便我可以根据最近的购买（而不是所有历史购买）进行过滤并限制客户匹配 N 个客户，而不是所有购买我产品的客户。任何人都可以提供任何指示吗？我的方向正确吗？我应该采取完全不同的方向吗？

在这一点上，我的目标是性能而不是准确性，因为我知道我还没有经验来应用算法来充分发挥其作用。我只是想应用这个概念。一旦我对自己的理解足够满意，我打算使用更真实的客户数据针对更大的数据库测试此查询。

原文

I'm reading the book Programming Collective Intelligence and trying to apply what I've learned against the Northwind database. Though I'm not yet confident in my understanding of the algorithms presented, I'm starting to get an idea of the general concepts.

Using the Northwind database, I'm attempting to show a list of "customers who bought this also bought XYZ" using the following pseudo-logic:

Find other customers who also purchased my item
Find all other items purchased by those customers
Rank items based on purchase count
Return top N items from previous step

I'm working with the query below:

declare 
    @customerid nchar(5),
    @productid int;

set @customerid = 'ALFKI';
set @productid = 59;

-- find other products from customers who
-- also purchased my productid

select top 10
    od.productid, c.categoryname, p.productname, p.unitsonorder, count(od.productid)
from
    [order details] od
        inner join orders o on o.orderid = od.orderid
        inner join products p on p.productid = od.productid
        inner join categories c on c.categoryid = p.categoryid
where
    o.customerid <> @customerid and
    od.productid <> @productid and 
    p.discontinued = 0
group by
    od.productid, c.categoryname, p.productname, p.unitsonorder
order by 5 desc,4 desc

I think my next step would be to break the query up so that I can filter based on recent purchases (as opposed to all historical purchases) and limit customer matching to N customers, rather than ALL customers who purchased my product. Can anyone provide any pointers? Am I headed in the right direction? Should I take a different direction entirely?

At this point, my goal is performance over accuracy, as I know I do not yet have the experience to apply the algorithms to their fullest benefit. I'm merely trying to apply the concept. Once I'm satisfied I understand it well enough, I intend to test this query against a larger database with more realistic customer data.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

蒲公英的约定 2024-08-19 13:56:57

这通常被称为“市场篮子”分析或“亲和力分组”，它并不像听起来那么简单，主要是由于大量的组合。首先要考虑的是能够导致阳性检测的最少病例数是多少。例如，假设我们有 100 万个购物篮，购买产品 A 的 10 个人也在同一个购物篮中购买了产品 B。 10个案例足以说明“买了A的人也买了B”吗？因此，使用“最近”购买的商品有点棘手。这个想法是创建一个产品对表，但如果我们有 N 个产品，组合数量为 N*(N-1)，因此对于 1000 个产品的商店，我们将有 999000 个组合，因此修剪的算法是不简单。

另一件需要考虑的事情是物品的顺序和每件物品的价值。例如，购买自行车的顾客可能经常购买（添加）“LED 灯”。但是，如果将 LED 灯放入篮子中，系统是否应该在列表中提供自行车？

考虑到您正在使用 SQL Server，我建议您使用分析服务数据挖掘为此目的使用“MS关联算法”。通过使用相同的数据，您可以将您的结果与“商业解决方案”进行比较。

回复收藏 0 原文

~没有更多了~