Intelligent web features and algorithms (People you may follow, Similar to you, ...)

Posted on 2024-10-07 08:14:26

I have 3 main questions about the algorithms behind the intelligent web (Web 2.0).

Here is the book I'm reading: http://www.amazon.com/Algorithms-Intelligent-Web-Haralambos-Marmanis/dp/1933988665. I want to learn these algorithms in more depth.

1. People you may follow (Twitter)

How do they determine the results most relevant to me? Data mining? Which algorithms?

2. How you're connected (LinkedIn)

The algorithm seems to work simply. It draws the path between two nodes, say between me and another person C: Me -> A, B -> A's connections -> C. I assume it isn't a brute-force search or some similar graph algorithm :)

3. Similar to you (Twitter, Facebook)
This algorithm is similar to 1. Does it simply take the highest count of friends in common (Facebook) or of followers (Twitter), or do they implement some other algorithm? I think the latter, because running the loop

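 def most_similar(me, contacts, friends):
     counts = {}
     for person in contacts:
         # friends maps each person to a set of their friends
         counts[person] = len(friends[me] & friends[person])
     return max(counts, key=counts.get)  # contact with the most friends in common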

on every page refresh would be silly.

4. Did you mean (Google)
I know they may implement it with a phonetic algorithm (http://en.wikipedia.org/wiki/Phonetic_algorithm), for example Soundex (http://en.wikipedia.org/wiki/Soundex). Here is a talk by Douglas Merrill, Google VP of Engineering and CIO: http://www.youtube.com/watch?v=syKY8CrHkck#t=22m03s

What about the first 3 questions? Any ideas are welcome!

Thanks

余生再见 2024-10-14 08:14:26

People who you may follow

You can use a factor-based calculation:

factorA = getFactorA(); // say double(0.3)
factorB = getFactorB(); // say double(0.6)
factorC = getFactorC(); // say double(0.8)

result = (factorA+factorB+factorC) / 3 // double(0.5666666666666667)
// if result is more than 0.5, you show this person

So in the case of Twitter, "People who you may follow" could be based on the following factors (User A is the user viewing the feature; there may be more or fewer factors):

  • Similarity between the frequent keywords in User A's and User B's tweets
  • Similarity between the profile descriptions of both users
  • Proximity of User A's and User B's locations
  • Whether the people User A follows also follow User B

So where does the list of candidates come from? Probably from a combination of people with a large number of followers (celebrities, alpha geeks, famous products/services, etc.) and the people followed by those whom User A follows.

Basically, a certain amount of data mining has to be done here: reading the tweets and bios and running the calculations. This can be done in a daily or weekly cron job when the server load is lowest (or perhaps 24/7 on a separate server).
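A minimal sketch of the factor averaging described above, assuming every factor is normalized to the range 0 to 1. The profile fields, the Jaccard overlap used for keywords, and the helper names are illustrative assumptions, not Twitter's actual method:

```python
def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def follow_score(viewer, candidate, following):
    """Average three hypothetical factors, each in [0, 1]."""
    keyword_factor = jaccard(viewer["keywords"], candidate["keywords"])
    location_factor = 1.0 if viewer["location"] == candidate["location"] else 0.0
    # Share of the viewer's followees who already follow the candidate.
    followees = following.get(viewer["name"], set())
    if followees:
        social_factor = sum(
            candidate["name"] in following.get(f, set()) for f in followees
        ) / len(followees)
    else:
        social_factor = 0.0
    return (keyword_factor + location_factor + social_factor) / 3


alice = {"name": "alice", "keywords": {"python", "graphs"}, "location": "Berlin"}
bob = {"name": "bob", "keywords": {"python", "graphs", "ml"}, "location": "Berlin"}
following = {"alice": {"carol", "dave"}, "carol": {"bob"}, "dave": set()}

score = follow_score(alice, bob, following)
show_bob = score > 0.5  # same 0.5 cut-off as in the snippet above
```

A plain average treats all factors as equally important; a real system would more likely weight them differently.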

How are you connected

This is probably a clever trick to make you feel that loads of brute-force work has been done to determine the path. However, after some superficial research, I find that it is simple:

Say you are User A; User B is your connection; and User C is a connection of User B.

In order to visit User C, you need to visit User B's profile first. When you visit User B's profile, the website has already saved the information indicating that User A is at User B's profile. So when you visit User C from User B, the website can immediately tell you 'User A -> User B -> User C', ignoring all other possible paths.

This is the maximum depth: at User C, User A cannot go on to look at User C's connections until User C is User A's connection.

Source: observing LinkedIn

Similar to you

It's exactly the same as #1 (People who you may follow), except that the algorithm reads in a different list of people: the people whom you follow.

Did you mean

Well, you got it right there, except that Google probably uses more than just Soundex. Language translation, word replacement, and many other algorithms are involved in Google's case. I can't comment much on this because it probably gets very complex and I am not an expert in language processing.

If we research Google's infrastructure a little more, we find that Google has servers dedicated to spelling and translation services. You can get more information on the Google platform at http://en.wikipedia.org/wiki/Google_platform.
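For reference, the Soundex step mentioned in the question can be sketched as below. This follows the standard American Soundex rules; the `did_you_mean` helper and the tiny vocabulary are hypothetical, and nothing here claims to reflect what Google actually runs:

```python
SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for ch in letters
}


def soundex(word):
    """Return the 4-character American Soundex code of a word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated code after it counts
    return (code + "000")[:4]


def did_you_mean(query, vocabulary):
    """Suggest vocabulary words that sound like the query (hypothetical helper)."""
    target = soundex(query)
    return [
        w for w in vocabulary
        if w.lower() != query.lower() and soundex(w) == target
    ]


suggestions = did_you_mean("Rupert", ["Robert", "Rubin", "Ashcraft"])
```

Soundex only groups words that sound alike in English, so on its own it misses typos that change the sound of a word.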

Conclusion

The key to computationally intensive algorithms is caching. Once you cache the result, you don't have to compute it on every page load. Google does it, Stack Overflow does it (on most of the pages with lists of questions), and, not surprisingly, Twitter does too!
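A minimal in-process illustration of that idea, using Python's `functools.lru_cache`. The `suggestions_for` function is a hypothetical stand-in for an expensive recommendation query; a real site would more likely use a shared cache with an expiry time, which `lru_cache` does not provide:

```python
from functools import lru_cache

calls = 0  # counts how often the expensive path actually runs


@lru_cache(maxsize=1024)
def suggestions_for(user_id):
    """Stand-in for an expensive recommendation query (hypothetical)."""
    global calls
    calls += 1
    return tuple(f"user{user_id + offset}" for offset in (1, 2, 3))


first = suggestions_for(42)   # computed
second = suggestions_for(42)  # served from the cache, no recomputation
```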

Basically, algorithms are defined by developers. You may use others' algorithms, but ultimately, you can also create your own.

夏了南城 2024-10-14 08:14:26

People you may follow

Could be one of many types of recommendation algorithms, maybe collaborative filtering?
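One possible shape of user-based collaborative filtering, as a rough sketch: users whose follow sets overlap with yours vote for the accounts they follow and you don't. The Jaccard similarity and the sample data are assumptions made for illustration:

```python
from collections import Counter


def recommend(user, follows, top_n=3):
    """User-based collaborative filtering over follow sets."""
    mine = follows[user]
    scores = Counter()
    for other, theirs in follows.items():
        if other == user:
            continue
        union = mine | theirs
        similarity = len(mine & theirs) / len(union) if union else 0.0
        if similarity == 0.0:
            continue
        # Each account the similar user follows gets a vote weighted by similarity.
        for account in theirs - mine - {user}:
            scores[account] += similarity
    return [account for account, _ in scores.most_common(top_n)]


follows = {
    "ann": {"nasa", "python"},
    "ben": {"nasa", "python", "rust"},
    "cat": {"nasa", "golang"},
    "dan": {"cooking"},
}
picks = recommend("ann", follows)
```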

How you are connected

This is just a shortest-path algorithm on the social graph. Assuming the connections have no weights, it will simply use breadth-first search.
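A short sketch of breadth-first search on an unweighted graph, reusing the Me/A/B/C names from the question. The adjacency-list layout is an assumption made for illustration:

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    if start == goal:
        return [start]
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in parent:
                continue  # already reached by a path that is at least as short
            parent[neighbor] = node
            if neighbor == goal:
                # Walk the parent links back to the start.
                path = [goal]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


graph = {
    "Me": ["A", "B"],
    "A": ["Me", "C"],
    "B": ["Me"],
    "C": ["A"],
}
path = shortest_path(graph, "Me", "C")
```

Because BFS explores the graph level by level, the first time it reaches the goal it has found a path with the fewest hops.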

Similar to you

Simply a re-arrangement of the data set using the same algorithm as People you may follow.

Check out the book Programming Collective Intelligence for a good introduction to the types of algorithms used for People you may follow and Similar to you. It has great Python code available too.

甜味拾荒者 2024-10-14 08:14:26

  1. People You may follow
    From Twitter blog - "suggestions are based on several factors, including people you follow and the people they follow" http://blog.twitter.com/2010/07/discovering-who-to-follow.html
    So if you follow A and B and they both follow C, then Twitter will suggest C to you...
  2. How you’re connected feature
    I think you have answered this one.
  3. Similar to you
    As above, and as you say, the results are probably cached, so it's only done once per session or maybe even less frequently...

Hope that helps,
Chris

独自←快乐 2024-10-14 08:14:26

I don't use Twitter, but with that in mind:

1). On the surface, this isn't that difficult: For each person I follow, see who they follow. Then for each of the people they follow, see who they follow, etc. The deeper you go, of course, the more number crunching it takes.

You can take this a bit further if you can also efficiently extract the reverse: for those I follow, who else follows them?

For both ways, what's unsaid is a way to weight the tweeters to see if they're someone I'd really want to follow: a liberal follower may also follow a conservative tweeter, but that doesn't mean I'd want to follow the conservative (see #3).
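The two-hop traversal from 1) could be sketched like this. Counting how many of my followees follow a candidate serves as a crude weight; the data layout and function name are assumptions made for illustration:

```python
from collections import Counter


def two_hop_suggestions(user, follows):
    """Count how many of my followees follow each account I don't follow yet."""
    mine = follows.get(user, set())
    counts = Counter()
    for followee in mine:
        for candidate in follows.get(followee, set()):
            if candidate != user and candidate not in mine:
                counts[candidate] += 1
    return counts.most_common()


follows = {
    "me": {"a", "b"},
    "a": {"c", "d"},
    "b": {"c", "me"},
}
ranked = two_hop_suggestions("me", follows)
```

Going one level deeper means repeating the inner loop for each candidate, which is where the number crunching grows quickly.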

2). Not sure, thinking about it...

3). Assuming the bio and tweets are the only thing to go on, the hard parts are:

  • Deciding what attributes should exist (political affiliation, topic types, etc.)
  • Cleaning each 140-character tweet for data mining.

Once you have the right set of attributes, then two different algorithms come to mind:

  • K-means clustering, to decide which attributes I tend to discriminate on.
  • N-nearest neighbors, to find the N tweeters most similar to you given the attributes I tend to give weight to.
  • EDIT: Actually, a decision tree is probably a FAR better way to do all of this...
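As a rough sketch of the nearest-neighbor idea, assuming each tweeter has already been reduced to a numeric attribute vector. The attribute order, the sample profiles, and the choice of cosine similarity are all hypothetical:

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length attribute vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(target, profiles, n=2):
    """Return the n profile names most similar to the target vector."""
    ranked = sorted(
        profiles,
        key=lambda name: cosine(target, profiles[name]),
        reverse=True,
    )
    return ranked[:n]


# Hypothetical attribute order: [politics, programming, sports]
profiles = {
    "dev_news": [0.0, 1.0, 0.0],
    "pundit": [1.0, 0.0, 0.0],
    "coder_fan": [0.0, 0.75, 0.25],
}
me = [0.0, 1.0, 0.5]
similar = nearest_neighbors(me, profiles)
```

Sorting every profile is fine for a toy data set; at Twitter's scale this would need an index or an approximate search instead.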

This is all speculative, but it sounds fun if one were getting paid to do this.
