How to implement a recommendation system?

Posted 2024-11-14 16:38:08

I have the Collective Intelligence book, but I'm not sure how to apply it in practice.

Let's say I have a PHP website with a MySQL database. Users can insert articles with a title and content into the database. For the sake of simplicity, we'll just compare titles.

  • How to Make Coffee?
  • 15 Things About Coffee.
  • The Big Question.
  • How to Sharpen A Pencil?
  • Guy Getting Hit in Balls

When we open the 'How to Make Coffee?' article, the second and fourth titles should be displayed in the Related Articles section, because they share words with it.

How can I implement this using PHP and MySQL? It's OK if I have to use Python. Thanks in advance.

风筝在阴天搁浅。 2024-11-21 16:38:08

Store a set of keywords alongside each product: essentially everything in the title except a set of stop words. When a title is displayed, find any other products that share keywords with it, giving priority to those with the most keywords in common.
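
A minimal sketch of this keyword approach, in Python (which the question says is acceptable). Assumptions: the stop-word list is a tiny hypothetical one, titles are tokenized on runs of letters and digits, and the "products" are the five example titles from the question. Related titles are ranked by how many keywords they share.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "about", "an", "and", "in", "is", "of", "the", "to"}


def keywords(title):
    """Lower-case the title, split it into runs of letters/digits, drop stop words."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return {w for w in words if w not in STOP_WORDS}


def related(title, all_titles):
    """Other titles sharing at least one keyword, most shared keywords first."""
    own = keywords(title)
    scored = []
    for other in all_titles:
        if other == title:
            continue
        shared = len(own & keywords(other))
        if shared > 0:
            scored.append((shared, other))
    # Highest overlap first; sort() is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scored]


TITLES = [
    "How to Make Coffee?",
    "15 Things About Coffee.",
    "The Big Question.",
    "How to Sharpen A Pencil?",
    "Guy Getting Hit in Balls",
]

print(related("How to Make Coffee?", TITLES))
```

With this stop-word list, 'How to Make Coffee?' picks up the second title through "coffee" and the fourth through "how". Whether words like "how" belong in the stop-word list is a judgment call that changes the results.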

You could further enhance this by assigning each keyword a score based on its scarcity (scarcer words get a higher score: a match on 'PHP', for instance, is more relevant than a match on 'programming'), or by tracking how often users navigate manually between particular products.
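
One common way to score scarcity is inverse document frequency, log(N / df), where N is the number of titles and df is how many titles contain the keyword. This is a standard technique swapped in for illustration, not necessarily what the answer had in mind. The sketch sums the weights of the shared keywords; the stop-word list and the sample titles are hypothetical.

```python
import math
import re

# Hypothetical stop-word list, as in a basic keyword matcher.
STOP_WORDS = {"a", "about", "an", "and", "in", "is", "of", "the", "to"}


def keywords(title):
    return {w for w in re.findall(r"[a-z0-9]+", title.lower()) if w not in STOP_WORDS}


def idf_weights(titles):
    """Scarcity score per keyword: log(N / df), df = number of titles containing it."""
    df = {}
    for title in titles:
        for word in keywords(title):
            df[word] = df.get(word, 0) + 1
    n = len(titles)
    return {word: math.log(n / count) for word, count in df.items()}


def related_weighted(title, titles):
    """Rank other titles by the summed weight of the keywords they share with `title`."""
    weights = idf_weights(titles)
    own = keywords(title)
    scored = []
    for other in titles:
        if other == title:
            continue
        score = sum(weights[w] for w in own & keywords(other))
        if score > 0:
            scored.append((score, other))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scored]


# Hypothetical sample titles: "php" is scarce, "programming" is common.
SAMPLE = [
    "Learning PHP Programming",
    "PHP Sessions Explained",
    "Programming in Go",
    "Programming in Rust",
    "Functional Programming Basics",
]

print(related_weighted("Learning PHP Programming", SAMPLE))
```

Here "php" appears in 2 of the 5 titles and "programming" in 4, so a shared "php" outweighs a shared "programming". A keyword found in every title gets weight log(1) = 0 and stops influencing the ranking at all.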

Either way, you'd do best to start simple and enhance it as you go. Depending on the size of your database, more advanced techniques may not be all that fruitful.

柳若烟 2024-11-21 16:38:08

You're best off using a set of tags which are parsed and stored in the db when the title is inserted, and then querying based on that.
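
A sketch of the tag-table idea, using Python's built-in sqlite3 as a stand-in for MySQL. The table names (articles, article_tags), the stop-word tokenizer and the ranking by number of shared tags are assumptions for illustration. The SELECT is plain SQL; the CREATE TABLE statements would need MySQL-specific tweaks such as AUTO_INCREMENT.

```python
import re
import sqlite3

# Hypothetical stop-word list used to turn a title into tags at insert time.
STOP_WORDS = {"a", "about", "an", "and", "in", "is", "of", "the", "to"}


def parse_tags(title):
    return {w for w in re.findall(r"[a-z0-9]+", title.lower()) if w not in STOP_WORDS}


conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE article_tags (
    article_id INTEGER NOT NULL REFERENCES articles(id),
    tag        TEXT    NOT NULL,
    PRIMARY KEY (article_id, tag)
);
CREATE INDEX idx_article_tags_tag ON article_tags (tag);
""")


def insert_article(title):
    """Store the article and its parsed tags together, once, at insert time."""
    article_id = conn.execute("INSERT INTO articles (title) VALUES (?)", (title,)).lastrowid
    conn.executemany(
        "INSERT INTO article_tags (article_id, tag) VALUES (?, ?)",
        [(article_id, tag) for tag in sorted(parse_tags(title))],
    )
    return article_id


# Self-join on the tag column: other articles sharing tags, most shared tags first.
RELATED_SQL = """
SELECT a.title, COUNT(*) AS shared
FROM article_tags t1
JOIN article_tags t2 ON t2.tag = t1.tag AND t2.article_id <> t1.article_id
JOIN articles a ON a.id = t2.article_id
WHERE t1.article_id = ?
GROUP BY a.id, a.title
ORDER BY shared DESC, a.id
"""


def related_titles(article_id):
    return [title for title, _ in conn.execute(RELATED_SQL, (article_id,))]


for t in ["How to Make Coffee?", "15 Things About Coffee.", "The Big Question.",
          "How to Sharpen A Pencil?", "Guy Getting Hit in Balls"]:
    insert_article(t)

print(related_titles(1))
```

Parsing happens once per insert, so the "related" lookup is an indexed join on exact tags rather than a scan of every title with LIKE.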

If you have to parse the title though, you'd basically be doing a LIKE query:

SELECT * FROM ENTRIES WHERE TITLE LIKE '%<keyword>%';
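
If the keyword comes from user input, bind it as a parameter and escape the LIKE wildcards % and _ so they match literally. The sketch uses sqlite3 as a stand-in for MySQL and picks ! as the escape character through the standard ESCAPE clause; the entries table and its two rows are hypothetical.

```python
import sqlite3

ESCAPE_CHAR = "!"  # assumption: any character unlikely to need searching for works


def like_pattern(keyword):
    """Escape LIKE wildcards so `keyword` matches literally, then wrap it in %...%."""
    # Escape the escape character itself first, then the two wildcards.
    escaped = (keyword.replace(ESCAPE_CHAR, ESCAPE_CHAR * 2)
                      .replace("%", ESCAPE_CHAR + "%")
                      .replace("_", ESCAPE_CHAR + "_"))
    return "%" + escaped + "%"


conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO entries (title) VALUES (?)",
                 [("100% Arabica Coffee",), ("1000 Ways to Brew",)])


def search(keyword):
    # The keyword is bound as a parameter, never concatenated into the SQL text.
    rows = conn.execute(
        "SELECT title FROM entries WHERE title LIKE ? ESCAPE '!' ORDER BY id",
        (like_pattern(keyword),),
    )
    return [title for (title,) in rows]


print(search("100%"))
```

Without escaping, the pattern %100%% would also match '1000 Ways to Brew', because the % typed by the user acts as a wildcard.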

For a more verbose answer though:

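// You need some test to see if the word is valid.
// "is" should not be considered a valid match.
// This is a simple one based on length; a
// "blacklist" (stop-word list) would be better, but that's up to you.
function isValidEntry( $word )
{
    return strlen( $word ) >= 4;
}

// To hold all relevant search strings:
$terms = array();
$postTitleWords = explode( ' ' , strtolower( 'How to Make Coffee' ) );

foreach( $postTitleWords as $index => $word )
{
    if( isValidEntry( $word ) ) $terms[] = $word;
    else
    {
        // Short words are only kept as a phrase with a neighbouring short word,
        // e.g. "how to". isset() replaces the original @ error suppression.
        $bef = isset( $postTitleWords[ $index - 1 ] ) ? $postTitleWords[ $index - 1 ] : '';
        if( $bef !== '' && !isValidEntry( $bef ) ) $terms[] = "$bef $word";
        $aft = isset( $postTitleWords[ $index + 1 ] ) ? $postTitleWords[ $index + 1 ] : '';
        if( $aft !== '' && !isValidEntry( $aft ) ) $terms[] = "$word $aft";
    }
}
$terms = array_values( array_unique( $terms ) );
if( !count( $terms ) )
{
    // This is a completely unique title!
}
else
{
    // One "lower( TITLE ) LIKE ?" clause per term; the terms are bound as
    // parameters instead of being concatenated into the SQL string.
    $search = 'SELECT * FROM ENTRIES WHERE '
            . implode( ' OR ', array_fill( 0, count( $terms ), 'lower( TITLE ) LIKE ?' ) );
    $params = array_map( function( $t ) { return '%' . $t . '%'; }, $terms );
    // Run it through PDO (the old mysql_* functions were removed in PHP 7),
    // assuming an existing PDO connection in $pdo:
    // $stmt = $pdo->prepare( $search ); $stmt->execute( $params );
}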
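
For comparison, a rough Python port of the same term-building logic, run against the question's five titles in an in-memory SQLite table (a stand-in for MySQL). It differs from the PHP in one assumed detail: it splits on runs of letters and digits rather than on spaces, so the trailing '?' in 'How to Make Coffee?' does not end up inside a search term. It also excludes the article's own row from the results.

```python
import re
import sqlite3


def is_valid_entry(word):
    # Same length-based test as the PHP version.
    return len(word) >= 4


def search_terms(title):
    """Long words become terms on their own; short words only as two-word phrases."""
    # Assumption: split on runs of letters/digits (not spaces) so punctuation is dropped.
    words = re.findall(r"[a-z0-9]+", title.lower())
    terms = []
    for i, word in enumerate(words):
        if is_valid_entry(word):
            terms.append(word)
            continue
        # The i > 0 guard matters: words[-1] would silently wrap to the last word.
        if i > 0 and not is_valid_entry(words[i - 1]):
            terms.append(words[i - 1] + " " + word)
        if i + 1 < len(words) and not is_valid_entry(words[i + 1]):
            terms.append(word + " " + words[i + 1])
    return list(dict.fromkeys(terms))  # de-duplicate, keeping first-seen order


def related_titles(conn, article_id, title):
    terms = search_terms(title)
    if not terms:
        return []  # a completely unique title
    where = " OR ".join(["lower(title) LIKE ?"] * len(terms))
    sql = "SELECT title FROM entries WHERE id <> ? AND (" + where + ") ORDER BY id"
    params = [article_id] + ["%" + t + "%" for t in terms]
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO entries (title) VALUES (?)",
    [(t,) for t in ["How to Make Coffee?", "15 Things About Coffee.", "The Big Question.",
                    "How to Sharpen A Pencil?", "Guy Getting Hit in Balls"]],
)

print(search_terms("How to Make Coffee?"))
print(related_titles(conn, 1, "How to Make Coffee?"))
```

For 'How to Make Coffee?' the terms are "how to", "make" and "coffee", which pull in the second title (via "coffee") and the fourth (via "how to"), as the question asks. Because the terms contain only letters, digits and spaces, no LIKE wildcard escaping is needed here.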

七色彩虹 2024-11-21 16:38:08

This can be achieved simply by using wildcards in SQL queries. If you have longer texts and the wildcard pattern does not seem to catch matches in the middle of the text, check whether a substring of one text matches the other. I hope this helps.
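
One way to read "check whether a substring of one text matches the other" is to look for the longest common substring of the two texts. A sketch using Python's difflib.SequenceMatcher.find_longest_match; the 8-character threshold (min_len) and the sample texts are arbitrary assumptions, and the comparison runs in application code rather than in SQL.

```python
from difflib import SequenceMatcher


def longest_shared_substring(a, b):
    """Longest run of characters appearing in both texts (compared case-insensitively)."""
    a, b = a.lower(), b.lower()
    # autojunk=False turns off difflib's "popular character" heuristic for long texts.
    match = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


def shares_substring(a, b, min_len=8):
    """Treat two texts as related if they share a substring of at least min_len characters."""
    return len(longest_shared_substring(a, b)) >= min_len


LONG_A = "A long article about how to make coffee at home with a French press."
LONG_B = "Readers asked how to make coffee without any special equipment."

print(repr(longest_shared_substring(LONG_A, LONG_B)))
```

This finds the shared phrase "how to make coffee" even though neither text contains the other. Comparing every pair of articles this way is expensive, so it suits small collections or a pre-filtered candidate list.
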
BTW, your question title asks about implementing a recommendation system, while the question description only asks about matching a field across database records. Recommendation systems are a broad topic with many interesting algorithms (e.g., collaborative filtering, content-based methods, matrix factorization, neural networks). Feel free to explore these advanced topics if your project grows to that scale.
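
To give a flavour of collaborative filtering, here is a minimal item-based sketch: two articles count as similar when the same users viewed both, measured by cosine similarity over their sets of viewers, sim(i, j) = |U_i ∩ U_j| / sqrt(|U_i| * |U_j|). The view log is hypothetical; in practice it might come from tracking which articles each user opens.

```python
import math
from collections import defaultdict


def item_similarities(views):
    """views: user -> set of article ids. Returns article -> {other article: cosine similarity}."""
    viewers = defaultdict(set)  # article id -> users who viewed it
    for user, articles in views.items():
        for article in articles:
            viewers[article].add(user)
    sims = defaultdict(dict)
    for i in viewers:
        for j in viewers:
            if i == j:
                continue
            both = len(viewers[i] & viewers[j])
            if both:
                # Cosine similarity of the two binary "viewed by" vectors.
                sims[i][j] = both / math.sqrt(len(viewers[i]) * len(viewers[j]))
    return sims


def recommend(article, sims, k=3):
    """Top-k most similar articles; ties broken by article id."""
    ranked = sorted(sims.get(article, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


# Hypothetical view log: which articles each user opened.
VIEWS = {
    "alice": {1, 2},
    "bob": {1, 2, 4},
    "carol": {1, 4},
    "dave": {3, 5},
}

SIMS = item_similarities(VIEWS)
print(recommend(1, SIMS))
```

Articles 2 and 4 are recommended for article 1 because each was co-viewed by two of article 1's three viewers; articles 3 and 5 share no viewers with it and are never suggested. Unlike the keyword approaches, this needs no text at all, only interaction data, and the pairwise loop is quadratic in the number of articles, which is fine for a sketch but not for a large catalogue.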
