Intelligent web features and algorithms (People you may follow, Similar to you, ...)
I have 3 main questions about the algorithms behind the intelligent web (web 2.0).
This is the book I'm reading: http://www.amazon.com/Algorithms-Intelligent-Web-Haralambos-Marmanis/dp/1933988665 and I want to learn the algorithms in more depth.
1. People you may follow (Twitter)
How can one determine the results closest to my requests? Data mining? Which algorithms?
2. How you're connected feature (LinkedIn)
The algorithm simply works like this: it draws the path between two nodes, say between Me and another person, C: Me -> A, B -> A connections -> C. It does not look like a brute-force algorithm; or is it some other graph algorithm? :)
3. Similar to you (Twitter, Facebook)
This algorithm is similar to 1. Does it simply take the max(count) of friends in common (Facebook) or the max(count) of followers on Twitter? Or do they implement some other algorithm? I think the latter is true, because running the loop
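    def most_similar(me, contacts, friends):
        # friends maps each person to the set of their friends
        counts = {}
        for person in contacts:
            counts[person] = len(friends[me] & friends[person])  # friends in common
        return max(counts, key=counts.get)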
on every page refresh would be silly.
4. Did you mean (Google)
I know that they may implement it with a phonetic algorithm http://en.wikipedia.org/wiki/Phonetic_algorithm, or simply Soundex http://en.wikipedia.org/wiki/Soundex, and here is a talk by Douglas Merrill, Google VP of Engineering and CIO: http://www.youtube.com/watch?v=syKY8CrHkck#t=22m03s
What about the first 3 questions? Any ideas are welcome!
Thanks
Comments (4)
People who you may follow
You can use factor-based calculations:
So, in the case of Twitter, "People who you may follow" can be based on the following factors (User A is the user viewing this "People who you may follow" feature; there may be more or fewer factors):
So where do they draw the "People who you may follow" candidates from? The list probably comes from a combination of people with a large number of followers (probably celebrities, alpha geeks, famous products/services, etc.) and the people followed by [people whom User A is following].
Basically, there is a certain amount of data mining to be done here: reading the tweets and bios, then running the calculations. This can be done in a daily or weekly cron job when the server load is lowest for the day (or maybe 24/7 on a separate server).
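As a rough illustration of where such a candidate list could come from, here is a minimal sketch that merges popular accounts with the accounts followed by the people User A follows. The function name `candidate_pool`, the `min_followers` threshold, and the sample data are all assumptions made for the example, not anything Twitter has documented.

```python
def candidate_pool(user, following, follower_counts, min_followers=1000):
    """Return accounts that `user` might be shown as suggestions."""
    already = following.get(user, set())
    # Accounts popular enough to suggest to anyone (the threshold is an assumption).
    popular = {acct for acct, n in follower_counts.items() if n >= min_followers}
    # Accounts followed by the people `user` already follows.
    second_degree = set()
    for followee in already:
        second_degree |= following.get(followee, set())
    # Never suggest the user themselves or someone they already follow.
    return (popular | second_degree) - already - {user}


following = {
    "alice": {"bob", "carol"},
    "bob": {"dave"},
    "carol": {"dave", "erin"},
}
follower_counts = {"celebrity": 50000, "bob": 12, "dave": 2}
print(sorted(candidate_pool("alice", following, follower_counts)))
```

A batch job could run something like this for every user and store the result, so that page loads only read the stored list.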
How are you connected
This is probably a clever trick that makes you feel as if loads of brute-force work went into determining the path. However, after some superficial research, I found that it is simple:
Say you are User A; User B is your connection; and User C is a connection of User B.
In order for you to visit User C, you need to visit User B's profile first. When you visit User B's profile, the website already saves the information indicating that User A is at User B's profile. So when you visit User C from User B, the website immediately tells you 'User A -> User B -> User C', ignoring all other possible paths.
This is the maximum depth: at User C, User A cannot go on to look at User C's connections until User C becomes User A's connection.
Source: observing LinkedIn
Similar to you
It's the exact same thing as #1 (People you may follow), except that the algorithm reads in a different list of people. The list of people that the algorithm reads in is the people whom you follow.
Did you mean
Well, you got it right there, except that Google probably uses more than just Soundex. There are language translation, word replacement, and many other algorithms involved in Google's case. I can't comment much on this because it would probably get very complex, and I am not an expert in language processing.
If we research a little more into Google's infrastructure, we can find that Google has servers dedicated to Spelling and Translation services. You can get more information on Google platform at http://en.wikipedia.org/wiki/Google_platform.
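For reference, the Soundex code mentioned above is easy to sketch. The following is a minimal version of classic American Soundex; the helper name `soundex` is illustrative, and this only shows how phonetic matching works, not how Google's "Did you mean" is actually built.

```python
# Map consonants to Soundex digits; vowels, h, w and y have no digit.
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character American Soundex code of `word`."""
    word = "".join(ch for ch in word.lower() if ch.isascii() and ch.isalpha())
    if not word:
        return ""
    digits = []
    prev = CODES.get(word[0], "")  # the first letter's code counts for adjacency
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (word[0].upper() + "".join(digits) + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))
```

Words that sound alike get the same code, so a misspelled query could be matched against a dictionary indexed by code.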
Conclusion
The key to computationally intensive algorithms is caching. Once you cache the result, you don't have to recompute it on every page load. Google does it, Stack Overflow does it (on most of the pages with a list of questions), and, not surprisingly, Twitter does too!
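To make the caching idea concrete, here is a minimal sketch of a time-limited cache. The class name `TTLCache`, the one-hour lifetime, and the `expensive_suggestions` function are assumptions for illustration; a real site would more likely use a shared cache server.

```python
import time


class TTLCache:
    """Keep computed values for `ttl_seconds`, then recompute on demand."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expiry time, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive work
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value


calls = []


def expensive_suggestions():
    calls.append(1)  # record each real computation
    return ["carol", "dave"]


cache = TTLCache(ttl_seconds=3600)
first = cache.get_or_compute("alice", expensive_suggestions)
second = cache.get_or_compute("alice", expensive_suggestions)
print(first == second, len(calls))
```

The second call returns the stored list without running the computation again, which is the effect the answer describes.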
Basically, algorithms are defined by developers. You may use others' algorithms, but ultimately, you can also create your own.
People you may follow
Could be one of many types of recommendation algorithms, maybe collaborative filtering?
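As one possible reading of "collaborative filtering", here is a minimal user-based sketch: users whose follow lists overlap with yours vote for the accounts they follow, weighted by Jaccard similarity. The names `jaccard` and `recommend` and the sample data are assumptions; this is one of many recommendation schemes, not a known Twitter implementation.

```python
from collections import defaultdict


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, following, top_n=3):
    mine = following.get(user, set())
    scores = defaultdict(float)
    for other, theirs in following.items():
        if other == user:
            continue
        weight = jaccard(mine, theirs)
        if weight == 0.0:
            continue  # nothing in common, so their follows carry no weight
        for acct in theirs - mine - {user}:
            scores[acct] += weight
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [acct for acct, _ in ranked[:top_n]]


following = {
    "alice": {"x", "y"},
    "bob": {"x", "y", "z"},
    "carol": {"y", "w"},
    "dave": {"q"},
}
print(recommend("alice", following))
```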
How you are connected
This is just a shortest path algorithm on the social graph. Assuming there is no weight to the connections, it will simply use breadth-first.
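A minimal sketch of that breadth-first search, assuming the social graph is an unweighted adjacency list held in memory (the function name `shortest_path` and the sample graph are made up for the example):

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal, or None if unreachable."""
    if start == goal:
        return [start]
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in parents:
                continue
            parents[neighbor] = node
            if neighbor == goal:
                # Walk the parent links back to the start.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


graph = {
    "me": ["a", "b"],
    "a": ["me", "c"],
    "b": ["me"],
    "c": ["a"],
}
print(shortest_path(graph, "me", "c"))
```

Because breadth-first search explores the graph level by level, the first time it reaches the goal it has found a path with the fewest hops.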
Similar to you
Simply a re-arrangement of the data set using the same algorithm as People you may follow.
Check out the book Programming Collective Intelligence for a good introduction to the types of algorithms used for People you may follow and Similar to you. It has great Python code available too.
From Twitter blog - "suggestions are based on several factors, including people you follow and the people they follow" http://blog.twitter.com/2010/07/discovering-who-to-follow.html
So if you follow A and B and they both follow C, then Twitter will suggest C to you...
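The A, B, C example can be sketched by counting how many of the people you follow also follow each candidate. The function name `suggest` and the sample data are assumptions, and the real service presumably combines several other factors.

```python
from collections import Counter


def suggest(user, following, top_n=5):
    """Rank accounts by how many of `user`'s followees follow them."""
    mine = following.get(user, set())
    votes = Counter()
    for followee in mine:
        for acct in following.get(followee, set()):
            if acct != user and acct not in mine:
                votes[acct] += 1
    # Most votes first; ties broken alphabetically for stable output.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


following = {
    "you": {"A", "B"},
    "A": {"C", "D"},
    "B": {"C"},
}
print(suggest("you", following))
```

Here C ranks first because both A and B follow C, while D is followed only by A.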
I think you have answered this one.
As above and, as you say, the results are probably cached, so it's only done once per session or maybe even less frequently...
Hope that helps,
Chris
I don't use twitter; but with that in mind:
1). On the surface, this isn't that difficult: For each person I follow, see who they follow. Then for each of the people they follow, see who they follow, etc. The deeper you go, of course, the more number crunching it takes.
You can take this a bit further if you can also efficiently extract the reverse: for those I follow, who else follows them?
For both ways, what's unsaid is a way to weight the tweeters to see if they're someone I'd really want to follow: a liberal follower may also follow a conservative tweeter, but that doesn't mean I'd want to follow the conservative (see #3).
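The reverse lookup could be sketched by inverting the follow graph once and then counting, for each other user, how many of my followees they also follow. That overlap count is only a crude weight and says nothing about whether I would actually like their content; the names `build_followers` and `co_followers` and the data are hypothetical.

```python
from collections import Counter, defaultdict


def build_followers(following):
    """Invert 'who follows whom' into 'who is followed by whom'."""
    followers = defaultdict(set)
    for user, followees in following.items():
        for followee in followees:
            followers[followee].add(user)
    return followers


def co_followers(user, following):
    """Rank other users by how many of `user`'s followees they also follow."""
    followers = build_followers(following)
    overlap = Counter()
    for followee in following.get(user, set()):
        for other in followers[followee]:
            if other != user:
                overlap[other] += 1
    return sorted(overlap.items(), key=lambda kv: (-kv[1], kv[0]))


following = {
    "me": {"x", "y"},
    "p": {"x", "y", "z"},
    "q": {"y"},
    "r": {"w"},
}
print(co_followers("me", following))
```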
2). Not sure, thinking about it...
3). Assuming the bio and tweets are the only thing to go on, the hard parts are:
Once you have the right set of attributes, then two different algorithms come to mind:
This is all speculative, but it sounds fun if one were getting paid to do this.