Determining the trending topics in a specific collection of tweets
I'm writing a Java application that has to determine the Trending Topics (TTs) in a specific collection of tweets obtained through Twitter Search. While searching the web, I found that a topic is considered trending when it gets a large number of mentions within a specific time window, that is, at the current moment. So there must be some decay calculation so that the topics change often. However, I have another question:
How does Twitter determine which specific terms in a tweet should become TTs? For example, I've observed that most TTs are hashtags or proper nouns. Does this make sense? Or do they analyse all words and determine their frequency?
I hope someone can help me! Thanks!
Comments (2)
I don't think anyone outside Twitter knows. Hashtags do seem to play a big part, but other factors are in play as well. I think mining the whole text would take more time than necessary and would produce too many false positives.
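If you want to start from hashtags only, here is a minimal sketch of counting them across a collection of tweets. This is not Twitter's actual method, just one simple way to get candidate terms. The class name `HashtagCounter` and its `count` method are made up for illustration, and the pattern only matches ASCII letters, digits, and underscores after the `#`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not how Twitter computes its trends.
class HashtagCounter {
    // '#' followed by one or more ASCII letters, digits, or underscores.
    private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");

    // Returns how many times each hashtag (lower-cased, without '#') occurs.
    static Map<String, Integer> count(List<String> tweets) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tweet : tweets) {
            Matcher m = HASHTAG.matcher(tweet);
            while (m.find()) {
                // Fold case so that #Java and #java count as the same tag.
                String tag = m.group(1).toLowerCase(Locale.ROOT);
                counts.merge(tag, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Sorting the resulting map by value would give a first list of candidates. Proper nouns would need extra work, such as a named-entity recognizer, which this sketch does not attempt.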
Here is an interesting article from Mashable:
http://www.sparkmediasolutions.com/pdfs/SMS_Twitter_Trending.pdf
-Ralph Winters
You may be interested in meme tracking, which, as I recall, does interesting things with proper nouns but basically identifies topics in a stream as they become more and less popular:
Also see Eddi: interactive topic-based browsing of social status streams.
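As for the decay calculation mentioned in the question, one common approach (an assumption on my part, not something taken from the work above or from Twitter) is an exponentially decayed counter: each mention adds 1 to a term's score, and the score halves after every half-life of silence. The class `DecayedTrendScorer` and its parameters are hypothetical names for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of exponential decay scoring; names are illustrative only.
class DecayedTrendScorer {
    private final double halfLifeSeconds;
    // Score of each term as of the last time it was mentioned.
    private final Map<String, Double> scores = new HashMap<>();
    // Timestamp (in seconds) of each term's most recent mention.
    private final Map<String, Long> lastSeen = new HashMap<>();

    DecayedTrendScorer(double halfLifeSeconds) {
        this.halfLifeSeconds = halfLifeSeconds;
    }

    // Records one mention. Assumes timestamps for a term never go backwards.
    void mention(String term, long timeSeconds) {
        double decayed = scoreAt(term, timeSeconds);
        scores.put(term, decayed + 1.0);
        lastSeen.put(term, timeSeconds);
    }

    // Returns the term's score decayed to the given time (0.0 if never seen).
    double scoreAt(String term, long timeSeconds) {
        Double score = scores.get(term);
        if (score == null) {
            return 0.0;
        }
        double elapsed = timeSeconds - lastSeen.get(term);
        // The score is multiplied by 0.5 for every half-life that has elapsed.
        return score * Math.pow(0.5, elapsed / halfLifeSeconds);
    }
}
```

With a short half-life, terms that stop being mentioned drop out of the top of the list quickly, which gives the "topics change often" behaviour. The half-life value is something you would have to tune for your own data.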