Choosing an algorithm for deriving intelligence from messages

Posted on 2024-10-05 20:22:57

What I'm trying to do is find an algorithm that I can implement to generate 'intelligent' suggestions for people, by comparing the messages they send to the messages sent by their peers.

For example, Person A sends a message to Person B talking about Obj1. If Person C sends a message to Person D about Obj1, the system should notice that they are talking about the same thing, and may suggest that Person A talk to Person C.

I have implemented the collection of statistics that capture the mentions people have in common, but I do not know which algorithm to use to analyse them.

Any suggestions?
(I hope this makes enough sense)

Comments (2)

极度宠爱 2024-10-12 20:22:58

Take a look at clustering algorithms such as k-means, or at k-nearest neighbours, for a quick start.

How much data have you got? The more the better.
There are lots of approaches to this problem. For example, you can assume that all users are similar to each other to some degree, and that what you want is to find, for each user, the most similar ones. A vector space model with cosine similarity will give you quick results.
Please give some more information on what you want to achieve.
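
To make the vector space idea concrete, here is a minimal sketch using only the Python standard library. It assumes the statistics you already collect can be expressed as a count of mentions per user; the `mentions` data, the user names, and the helper names `cosine_similarity` and `most_similar` are all made up for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse mention-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user with no mentions is similar to nobody
    return dot / (norm_a * norm_b)


def most_similar(user, mentions, top_n=3):
    """Rank the other users by similarity to `user`, dropping zero scores."""
    scores = [
        (other, cosine_similarity(mentions[user], vec))
        for other, vec in mentions.items()
        if other != user
    ]
    scores = [(other, s) for other, s in scores if s > 0]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Hypothetical data: how often each person mentioned each object.
mentions = {
    "A": Counter({"Obj1": 3, "Obj2": 1}),
    "B": Counter({"Obj3": 2}),
    "C": Counter({"Obj1": 2}),
    "D": Counter({"Obj2": 4, "Obj3": 1}),
}

print(most_similar("A", mentions))
```

With this toy data, C ranks first for A because both mostly mention Obj1. Raw counts are the simplest weighting; something like TF-IDF may work better when a few objects are mentioned by almost everyone.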

风透绣罗衣 2024-10-12 20:22:58

This is exactly the same problem Twitter is battling with. You might end up with a job there if you crack this ;)

On a more serious note, one could use some crude heuristic-based measures to do something like this, but they have a high error rate, as delnan said in the comments.

NLP is a sure bet. Note that NLP also has some error rate, but it is far more accurate than any heuristic you would use. If you are using Python I would suggest this toolkit, which I use now and then - NLP.

For other languages, I am sure there are packages that will help you in this regard.

UPDATE1: If you have a way for users to tag their messages (as Stack Overflow does), you could approach this problem without NLP. You could then simply take the intersection of the tags of two messages to see whether they have anything in common, and suggest the top matches for the common items.

But there are other issues you'll have to deal with: you would have to make tags mandatory, and you need to be sure that users are actually entering correct tags, etc. Nevertheless, this greatly simplifies your problem.
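
The tag approach could look roughly like the sketch below. It assumes each message is stored as a (sender, tags) pair; that layout, the sample data, and the names `common_tags` and `suggest_by_tags` are hypothetical.

```python
from collections import Counter


def common_tags(tags_a, tags_b):
    """Tags shared by two messages, ignoring case and stray whitespace."""
    normalise = lambda tags: {t.strip().lower() for t in tags}
    return normalise(tags_a) & normalise(tags_b)


def suggest_by_tags(sender, tags, tagged_messages):
    """Rank other senders by how many tags their messages share with `tags`."""
    overlap = Counter()
    for other, other_tags in tagged_messages:
        if other == sender:
            continue
        overlap[other] += len(common_tags(tags, other_tags))
    return [(user, n) for user, n in overlap.most_common() if n > 0]


# Hypothetical message log: (sender, tags) pairs.
tagged_messages = [
    ("C", ["Obj1", "budget"]),
    ("D", ["lunch"]),
    ("C", ["obj1"]),
    ("E", ["Budget "]),
]

print(suggest_by_tags("A", ["Obj1", "Budget"], tagged_messages))
```

Normalising case and whitespace only softens the "correct tags" problem slightly; misspelt or missing tags still go unmatched.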

UPDATE2: As the question has been updated: since you are only interested in some specific keywords/phrases, this simplifies things. Take each message, split it into words, then stem each word. After stemming, intersect this set with your set of keywords; this gives you a set S1. Do the same with the second message to get a set S2. Intersect S1 and S2. If the intersection is non-empty, bingo! Some theme is common to message1 and message2. Otherwise, nothing is.
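
The steps in UPDATE2 can be sketched as follows. Note the assumptions: `crude_stem` is a deliberately naive suffix stripper standing in for a real stemmer from an NLP toolkit, the keywords are stemmed the same way so both sides compare like with like, and only single-word keywords are handled. All names and sample messages are made up for illustration.

```python
import re


def crude_stem(word):
    """Very rough stand-in for a real stemmer: strip a few common suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def themes(message, keywords):
    """Stemmed words of `message` that are also (stemmed) keywords: S1 or S2."""
    words = re.findall(r"[a-z0-9]+", message.lower())
    stems = {crude_stem(w) for w in words}
    return stems & {crude_stem(k.lower()) for k in keywords}


def common_themes(message1, message2, keywords):
    """Intersect S1 and S2; an empty set means no shared theme."""
    return themes(message1, keywords) & themes(message2, keywords)


keywords = {"printer", "server"}  # hypothetical keywords of interest
m1 = "The printers are jammed again"
m2 = "Who ordered a new printer?"
m3 = "Lunch at noon?"

print(common_themes(m1, m2, keywords))
print(common_themes(m1, m3, keywords))
```

Multi-word phrases would need an extra step, for example matching on n-grams of stems rather than on single words.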
