I want to search Twitter for a term (say, #google) and generate a tag cloud of the words used in the matching tweets, broken down by time: for example, a one-hour window that advances 10 minutes at a time, showing how different words become more or less common over the course of the day.
I would appreciate any help with this: sources for the data, code (R is the only language I am comfortable with), and ideas for visualization. Questions:
-
How do I get the information?
In R, I found that the twitteR package has the searchTwitter command, but I don't know how large an "n" I can request from it. Also, it doesn't return the date each tweet was posted.
I see here that I could get up to 1500 tweets, but this requires me to do the parsing manually (which leads me to step 2). Also, for my purposes I would need tens of thousands of tweets. Is it even possible to get them retrospectively (for example, by requesting older posts each time through the API URL)? If not, there is the more general question of how to build a personal archive of tweets on a home computer (a question that might be better left to another SO thread, although any insights from people here would be very interesting to read).
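One possible way to build such a personal archive is to poll the search on a schedule and merge each batch into a file on disk, dropping tweets already stored. The sketch below assumes each batch has already been converted to a data frame with `id`, `created`, and `text` columns; the helper name `append_tweets`, the column names, and the sample data are all hypothetical, and no Twitter API call is made here.

```r
# Hypothetical helper: merge a new batch of tweets into a CSV archive on disk.
# Assumes each batch is a data frame with columns id, created, text.
append_tweets <- function(store_path, batch) {
  batch$id <- as.character(batch$id)  # keep IDs as text, not numbers
  if (file.exists(store_path)) {
    old <- read.csv(store_path, stringsAsFactors = FALSE,
                    colClasses = "character")
    batch <- rbind(old, batch[, names(old)])
  }
  batch <- batch[!duplicated(batch$id), ]  # overlapping polls return repeats
  write.csv(batch, store_path, row.names = FALSE)
  invisible(nrow(batch))                   # number of tweets now stored
}

# Usage with made-up data: two polls that overlap on tweet "2".
store <- tempfile(fileext = ".csv")
poll1 <- data.frame(id = c("1", "2"),
                    created = c("2010-02-15T10:00:00Z", "2010-02-15T10:05:00Z"),
                    text = c("first #google tweet", "second #google tweet"),
                    stringsAsFactors = FALSE)
poll2 <- data.frame(id = c("2", "3"),
                    created = c("2010-02-15T10:05:00Z", "2010-02-15T10:12:00Z"),
                    text = c("second #google tweet", "third #google tweet"),
                    stringsAsFactors = FALSE)
append_tweets(store, poll1)
n_stored <- append_tweets(store, poll2)
```

A flat CSV is only the simplest choice; for tens of thousands of tweets a database would probably scale better, but the deduplicate-on-ID idea stays the same.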
-
How do I parse the information (in R)? I know the RCurl and twitteR packages have functions that could help, but I don't know which ones or how to use them. Any suggestions would be helpful.
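As a rough illustration of the parsing step, the base R sketch below turns timestamp strings into POSIXct values and splits tweet text into words. It assumes timestamps arrive as ISO 8601 UTC strings, which may not match what your data source returns; the function names `parse_created` and `tokenize` and the sample tweets are made up for the example.

```r
# Assumption: timestamps arrive as ISO 8601 UTC strings; adjust `format`
# to whatever your data source actually returns.
parse_created <- function(x) {
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
}

# Split tweet text into lower-case words, dropping URLs and punctuation.
tokenize <- function(text) {
  text <- tolower(text)
  text <- gsub("https?://[^[:space:]]+", " ", text)  # strip links
  text <- gsub("[^[:alnum:]#@']+", " ", text)        # keep hashtags, mentions
  words <- strsplit(text, " +")
  lapply(words, function(w) w[nzchar(w)])            # drop empty strings
}

created <- parse_created(c("2010-02-15T10:00:00Z", "2010-02-15T10:25:30Z"))
tokens  <- tokenize(c("Loving the new #google doodle! http://example.com/x",
                      "Is @google down?"))
```

Here `tokens` is a list with one character vector per tweet, which keeps each tweet's words tied to its timestamp for later time-based grouping.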
-
How do I analyse it? How do I remove all the "uninteresting" words? I found that the "tm" package in R has this example:
reuters <- tm_map(reuters, removeWords, stopwords("english"))
Would this do the trick? Should I do something else or more?
Also, I imagine I would want to do this after splitting my dataset by time, which will require some POSIX-like date-time functions (I am not sure which ones are needed here, or how to use them).
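The moving-window idea (a one-hour window advancing 10 minutes at a time) can be sketched in base R with POSIXct arithmetic. This is only one way to do it: the short `stop_words` vector is an illustrative stand-in for a real list such as tm's `stopwords("english")`, and `window_counts` and the sample data are hypothetical.

```r
# Illustrative stop list only; tm's stopwords("english") is far longer.
stop_words <- c("the", "a", "is", "and", "to", "of", "in", "rt")

# Count word frequencies in overlapping time windows.
# `times` is POSIXct; `tokens` is a list of character vectors, one per tweet.
# `width` and `step` are in seconds (defaults: 1 hour, 10 minutes).
window_counts <- function(times, tokens, width = 3600, step = 600) {
  starts <- seq(min(times), max(times), by = step)
  counts <- lapply(starts, function(s) {
    in_win <- times >= s & times < s + width       # tweets inside this window
    words  <- as.character(unlist(tokens[in_win]))
    words  <- words[!words %in% stop_words]        # drop uninteresting words
    sort(table(words), decreasing = TRUE)
  })
  names(counts) <- format(starts, "%H:%M", tz = "UTC")
  counts
}

# Made-up example: four tweets over about an hour.
times <- as.POSIXct(c("2010-02-15 10:00:00", "2010-02-15 10:20:00",
                      "2010-02-15 10:50:00", "2010-02-15 11:05:00"), tz = "UTC")
tokens <- list(c("the", "#google", "doodle"),
               c("#google", "is", "down"),
               c("doodle", "rt", "#google"),
               c("search", "down"))
res <- window_counts(times, tokens)
```

Each element of `res` is a frequency table for one window, named by the window's start time, so `res[["10:00"]]` holds the counts for 10:00 to 11:00.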
-
And lastly, there is the question of visualization. How do I create a tag cloud of the words? I found a solution for this here; any other suggestions or recommendations?
I realize I am asking a huge question here, but I tried to break it into as many straightforward questions as possible. Any help is welcome!
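For a sense of what a tag cloud involves, here is a minimal base-graphics sketch: it scales each word's text size linearly with its count and scatters the words at random positions. It is not a real layout algorithm (words may overlap), and `cloud_sizes`, `plot_cloud`, and the sample counts are hypothetical names and data for illustration.

```r
# Map word counts linearly onto text sizes between min_cex and max_cex.
cloud_sizes <- function(counts, min_cex = 0.8, max_cex = 4) {
  rng <- range(counts)
  if (rng[1] == rng[2]) {
    return(rep(mean(c(min_cex, max_cex)), length(counts)))  # all equal
  }
  min_cex + (counts - rng[1]) / (rng[2] - rng[1]) * (max_cex - min_cex)
}

# Scatter the words on a blank canvas; a rough stand-in for a real
# layout algorithm, so words may overlap.
plot_cloud <- function(counts, seed = 1) {
  set.seed(seed)  # reproducible placement
  plot.new()
  plot.window(xlim = c(0, 1), ylim = c(0, 1))
  text(runif(length(counts)), runif(length(counts)),
       labels = names(counts), cex = cloud_sizes(counts))
}

counts <- c("#google" = 3, doodle = 2, down = 1)
sizes  <- cloud_sizes(counts)
# plot_cloud(counts)  # draws to the active graphics device
```

Calling `plot_cloud()` once per time window and saving each plot would give a sequence of clouds to compare across the day.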
Best,
Tal
Comments (4)
www.wordle.net
Using the openNLP package, you could POS-tag the tweets (POS = part of speech) and then extract just the nouns, verbs, or adjectives for visualization in a word cloud.
As for the plotting piece: I made a word cloud here: http://trends.techcrunch.com/2009/09/25/describe-yourself-in-3-or-4-words/ using the snippets package; my code is in there. I manually pulled out certain words. Check it out and let me know if you have more specific questions.
I note that this is an old question, and there are several solutions available via web search, but here's one answer (via http://blog.ouseful.info/2012/02/15/generating-twitter-wordclouds-in-r-prompted-by-an-open-learning-blogpost/):
I would like to answer your question about making a big word cloud.
What I did is
The result:
This Square Cloud consists of about 9000 tweets.
Source: People voice about Lynas Malaysia through Twitter Analysis with R CloudStat
Hope it helps!