Plotting a word cloud of Twitter search results by date? (using R)



I wish to search Twitter for a word (let's say #google), and then be able to generate a tag cloud of the words used in the tweets, but according to dates (for example, having a moving window of an hour that moves by 10 minutes each time, showing me how different words come to be used more often throughout the day).

I would appreciate any help on how to go about doing this regarding: resources for the information, code for the programming (R is the only language I am apt at using) and ideas for visualization. Questions:

  1. How do I get the information?

    In R, I found that the twitteR package has the searchTwitter command, but I don't know how big an "n" I can get from it. Also, it doesn't return the date on which each tweet originated.

    I see here that I can get up to 1500 tweets, but this requires me to do the parsing manually (which leads me to step 2). Also, for my purposes I would need tens of thousands of tweets. Is it even possible to get them retrospectively? (for example, by requesting older posts each time through the API URL?) If not, there is the more general question of how to create a personal archive of tweets on a home computer (a question which might be better left to another SO thread, although any insights from people here would be very interesting for me to read); see the first sketch after this list.

  2. How to parse the information (in R)? I know that R has functions from the RCurl and twitteR packages that could help, but I don't know which, or how to use them. Any suggestions would be of help.

  3. How to analyse? How do I remove all the "not interesting" words? I found that the "tm" package in R has this example:

    reuters <- tm_map(reuters, removeWords, stopwords("english"))

    Would this do the trick? Should I do something else/more?

    Also, I imagine I would like to do that after cutting my dataset according to time, which will require some POSIX-like date functions (I am not exactly sure which would be needed here, or how to use them); see the second sketch after this list.

  4. And lastly, there is the question of visualization. How do I create a tag cloud of the words? I found a solution for this here (r-bloggers.com/creating-tag-cloud-using-r-and-flash-javascript-swfobject/); any other suggestions/recommendations?
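Regarding questions 1 and 2, here is a minimal sketch of retrospective grabbing and local storage, assuming the twitteR package is authenticated and that its searchTwitter() supports the maxID paging argument (it does in later versions); grabTweets, the batch sizes, and the file name are hypothetical. Note that twListToDF() flattens the results into a data frame with a created timestamp column, which also answers the missing-dates concern:

require(twitteR)

grabTweets <- function(term, total = 10000, batch = 1500) {
  statuses <- list()
  max.id <- NULL
  while (length(statuses) < total) {
    got <- searchTwitter(term, n = batch, maxID = max.id)
    if (length(got) == 0) break            # nothing older left to fetch
    statuses <- c(statuses, got)
    max.id <- got[[length(got)]]$id        # oldest id seen; maxID is inclusive,
  }                                        # so the boundary tweet may repeat
  tw.df <- twListToDF(statuses)            # one row per tweet, incl. 'created'
  tw.df <- tw.df[!duplicated(tw.df$id), ]  # drop the repeated boundary tweets
  saveRDS(tw.df, "tweets_archive.rds")     # a simple personal archive on disk
  tw.df
}
#For example: tw.df <- grabTweets("#google", total = 20000)

Whether the Search API will actually serve that many older tweets is a separate limitation of the API itself; persisting each day's grab to disk, as above, is the usual workaround.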
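And for the time slicing in question 3, a minimal sketch, assuming tw.df is the data frame from the sketch above (its created column is POSIXct, so plain arithmetic works in seconds); windowTexts and its defaults are hypothetical names:

windowTexts <- function(tw.df, width.min = 60, step.min = 10) {
  starts <- seq(min(tw.df$created),
                max(tw.df$created) - width.min * 60,
                by = step.min * 60)        # one window start every step.min minutes
  lapply(starts, function(s)
    tw.df$text[tw.df$created >= s & tw.df$created < s + width.min * 60])
}
#Each element is the text of one hour-long window; feed each element to your
#corpus/wordcloud pipeline to watch the vocabulary drift through the day.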

I believe I am asking a huge question here, but I have tried to break it into as many straightforward questions as possible. Any help will be welcomed!

Best,

Tal


Comments (4)

尹雨沫 2024-09-10 18:57:44
夜空下最亮的亮点 2024-09-10 18:57:44


As for the plotting piece: I did a word cloud here: http://trends.techcrunch.com/2009/09/25/describe-yourself-in-3-or-4-words/ using the snippets package; my code is in there. I manually pulled out certain words. Check it out and let me know if you have more specific questions.

毁我热情 2024-09-10 18:57:44


I note that this is an old question, and there are several solutions available via web search, but here's one answer (via http://blog.ouseful.info/2012/02/15/generating-twitter-wordclouds-in-r-prompted-by-an-open-learning-blogpost/):

require(twitteR)
searchTerm='#dev8d'
#Grab the tweets
rdmTweets <- searchTwitter(searchTerm, n=500)
#Use a handy helper function to put the tweets into a dataframe
tw.df=twListToDF(rdmTweets)

##Note: there are some handy, basic Twitter related functions here:
##https://github.com/matteoredaelli/twitter-r-utils
#For example:
RemoveAtPeople <- function(tweet) {
  gsub("@\\w+", "", tweet)
}
#Then for example, remove @d names
tweets <- as.vector(sapply(tw.df$text, RemoveAtPeople))

##Wordcloud - scripts available from various sources; I used:
#http://rdatamining.wordpress.com/2011/11/09/using-text-mining-to-find-out-what-rdatamining-tweets-are-about/
#Call with eg: tw.c=generateCorpus(tw.df$text)
generateCorpus= function(df,my.stopwords=c()){
  #Install the textmining library
  require(tm)
  #The following is cribbed and seems to do what it says on the can
  tw.corpus= Corpus(VectorSource(df))
  # remove punctuation
  tw.corpus = tm_map(tw.corpus, removePunctuation)
  #normalise case
  tw.corpus = tm_map(tw.corpus, content_transformer(tolower)) # content_transformer() is needed in newer versions of tm
  # remove stopwords
  tw.corpus = tm_map(tw.corpus, removeWords, stopwords('english'))
  tw.corpus = tm_map(tw.corpus, removeWords, my.stopwords)

  tw.corpus
}

wordcloud.generate=function(corpus,min.freq=3){
  require(wordcloud)
  doc.m = TermDocumentMatrix(corpus, control = list(minWordLength = 1))
  dm = as.matrix(doc.m)
  # calculate the frequency of words
  v = sort(rowSums(dm), decreasing=TRUE)
  d = data.frame(word=names(v), freq=v)
  #Generate the wordcloud
  wc=wordcloud(d$word, d$freq, min.freq=min.freq)
  wc
}

print(wordcloud.generate(generateCorpus(tweets,'dev8d'),7))

##Generate an image file of the wordcloud
png('test.png', width=600,height=600)
wordcloud.generate(generateCorpus(tweets,'dev8d'),7)
dev.off()

#We could make it even easier if we hide away the tweet grabbing code. eg:
tweets.grabber=function(searchTerm,num=500){
  require(twitteR)
  rdmTweets = searchTwitter(searchTerm, n=num)
  tw.df=twListToDF(rdmTweets)
  as.vector(sapply(tw.df$text, RemoveAtPeople))
}
#Then we could do something like:
tweets=tweets.grabber('ukgc12')
wordcloud.generate(generateCorpus(tweets),3)
画卷フ 2024-09-10 18:57:44


I would like to answer your question about making a big word cloud.
What I did:

  1. Use s0.tweet <- searchTwitter(KEYWORD, n=1500) once a day, for 7 days or more.

  2. Combine them with this command:

rdmTweets = c(s0.tweet,s1.tweet,s2.tweet,s3.tweet,s4.tweet,s5.tweet,s6.tweet,s7.tweet)

The result:

[word cloud image: Lynas Square Cloud]

This Square Cloud consists of about 9000 tweets.

Source: People voice about Lynas Malaysia through Twitter Analysis with R CloudStat

Hope it helps!
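One small caveat to add (my note, not part of the original answer): daily grabs of the same search term will usually overlap, so it is worth de-duplicating by tweet id after combining, along these lines:

rdmTweets <- c(s0.tweet, s1.tweet, s2.tweet, s3.tweet,
               s4.tweet, s5.tweet, s6.tweet, s7.tweet)
tw.df <- twListToDF(rdmTweets)             # flatten to a data frame
tw.df <- tw.df[!duplicated(tw.df$id), ]    # keep each tweet only once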
