R sentiment function

Posted on 2025-01-29 21:21:33

I have a problem with this piece of code. I can't find a bug, but the results are clearly incorrect. After calling the function:


    data = sentimentfun(My_tweettext, positive_war, negative_war,
                        .progress='text')

I get this:

[Screenshot of the result]

The result is a data frame of the downloaded tweets (cleaning has already been done), in which every second score returned by the sentiment function is the maximum value. For example, for 3 identical tweets I get a score of 0 twice and a score of 4771 once.
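
One pattern that could produce output like this (a guess, not confirmed against the actual data) is passing a data frame instead of a character vector. An apply-style loop over a data frame visits its columns, so a whole column of tweets gets scored as if it were a single tweet, and the few resulting scores are then recycled down the rows. As far as I know `laply()` converts its input to a list first, so it would behave the same way. A minimal base-R sketch with made-up data; `toy_df`, `toy_pos` and `count_hits` are hypothetical names:

```r
# Hypothetical toy data: an id column and a text column,
# similar in shape to what read.csv() returns.
toy_df <- data.frame(
  id   = 1:3,
  text = c("good good day", "bad day", "good news"),
  stringsAsFactors = FALSE
)
toy_pos <- c("good")

# Counts how many words of the input appear in the word list.
count_hits <- function(x, words) {
  tokens <- unlist(strsplit(tolower(as.character(x)), "\\s+"))
  sum(tokens %in% words)
}

# Iterating over the data frame visits columns, not tweets:
# one score per column, with the text column scored as one big "tweet".
by_column <- sapply(toy_df, count_hits, words = toy_pos)

# Iterating over the text column visits one tweet at a time.
by_tweet <- sapply(toy_df$text, count_hits, words = toy_pos, USE.NAMES = FALSE)
```

Checking `str(My_tweettext)` before calling the function would show whether this applies.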

Could someone smarter than me look at this code, check it for correctness, and suggest how I can get correct results? I want to use the "tweettext" that I have already obtained.


    sentimentfun = function(tweettext, pos, neg, .progress='none')
    {
      # Parameters
      # tweettext: vector of text to score
      # pos: vector of words of positive sentiment
      # neg: vector of words of negative sentiment
      # .progress: passed to laply() to control the progress bar
      
      scores = laply(tweettext,
                     function(singletweet, pos, neg)
                     {
                       singletweet = gsub("[[:punct:]]", "", singletweet)
                       singletweet = gsub("[[:cntrl:]]", "", singletweet)
                       singletweet = gsub("\\d+", "", singletweet)
    
                       tryTolower = function(x)
                       {
                         y = NA
                         try_error = tryCatch(tolower(x), error=function(e)e)
                         if (!inherits(try_error, "error"))
                           y = tolower(x)
                         return(y)
                       }
                       singletweet = sapply(singletweet, tryTolower)
                       word.list = str_split(singletweet, "\\s+")
                       words = unlist(word.list)
                       pos.matches = match(words, pos)
                       neg.matches = match(words, neg)
                       pos.matches = !is.na(pos.matches)
                       neg.matches = !is.na(neg.matches)
                       score = sum(pos.matches) - sum(neg.matches)
                       return(score)
                     }, pos, neg, .progress=.progress )
      sentiment.df = data.frame(text=tweettext, score=scores)
      return(sentiment.df)
    }
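
For comparison, the same scoring logic can be sketched in base R with an explicit input check, so that a data frame passed by mistake fails loudly instead of producing odd scores. This is only an illustration under the assumption that the input should be a character vector of tweets; `score_sentiment` and `score_one` are hypothetical names, and the progress bar is left out.

```r
# Hypothetical rewrite of the scoring step using base R only.
score_sentiment <- function(tweettext, pos, neg) {
  # Accept a character vector only; a data frame must be reduced
  # to its text column before calling this function.
  if (is.factor(tweettext)) tweettext <- as.character(tweettext)
  stopifnot(is.character(tweettext))

  score_one <- function(singletweet) {
    # Same cleaning steps as in the function above.
    cleaned <- gsub("[[:punct:]]", "", singletweet)
    cleaned <- gsub("[[:cntrl:]]", "", cleaned)
    cleaned <- gsub("\\d+", "", cleaned)
    # tolower() can fail on badly encoded text; fall back to NA in that case.
    lowered <- tryCatch(tolower(cleaned), error = function(e) NA_character_)
    words <- unlist(strsplit(lowered, "\\s+"))
    # Positive hits minus negative hits.
    sum(words %in% pos) - sum(words %in% neg)
  }

  # One numeric score per tweet.
  scores <- vapply(tweettext, score_one, numeric(1), USE.NAMES = FALSE)
  data.frame(text = tweettext, score = scores, stringsAsFactors = FALSE)
}

# Usage with made-up input:
res <- score_sentiment(c("Good day!", "bad, bad news 123"),
                       pos = c("good"), neg = c("bad"))
```

With this shape, each row of `res` holds one tweet and its own score.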

Sorry if this question is stupid, but I need this function to get data for my research.

Edit:
I use Windows 10, and my RStudio version is 1.4.1103.

Here is a folder with data

tweettext: (Trudeau_tweettext.csv)
pos: (positive-words.txt)
neg: (negative-words.txt)

    library(stringr)
    library(plyr)
    library(dplyr)
    library(tm)
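
How the files are read determines what reaches the function, so here is a hedged loading sketch. It assumes the word lists are plain text with one word per line and comment lines starting with `;`, and that the CSV has a text column, here hypothetically named `text`. The temporary files are stand-ins so the sketch runs on its own; they are not the real data.

```r
# Hypothetical stand-ins for positive-words.txt and Trudeau_tweettext.csv,
# written to temporary files so the sketch is self-contained.
pos_file <- tempfile(fileext = ".txt")
writeLines(c("; comment header", "", "good", "great"), pos_file)

csv_file <- tempfile(fileext = ".csv")
write.csv(data.frame(text = c("Good day", "Great news")),
          csv_file, row.names = FALSE)

# scan() drops everything after the comment character and returns
# a plain character vector of words.
positive_words <- scan(pos_file, what = "character",
                       comment.char = ";", quiet = TRUE)

# read.csv() returns a data frame; extract the text column so that
# a character vector, not the data frame, is passed on for scoring.
tweets_df <- read.csv(csv_file, stringsAsFactors = FALSE)
tweet_vector <- tweets_df$text
```

The real column name may differ; `names(tweets_df)` lists what is available.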

I wish you all a lovely day (or night)!
