R sentiment function

Posted 2025-01-29 21:21:33 · word count 2637 · 1 view · 0 comments

I have a problem with this piece of code. I can't find a bug, but the results are clearly incorrect. After calling the function:


    data = sentimentfun(My_tweettext, positive_war, negative_war,
                        .progress='text')

I get this (screenshot of result):

The result is a data frame with the downloaded tweets (cleaning has already been done), but the scores are clearly wrong: every second result of the sentiment function is the maximum value. For example, for 3 identical tweets we get score = 0 twice and score = 4771 once.

Could someone smarter than me look at this code, check it for correctness, and suggest how I can get the correct results? I want to use the "tweettext" that I have already obtained.


    sentimentfun = function(tweettext, pos, neg, .progress='none')
    {
      # Parameters
      # tweettext: vector of text to score
      # pos: vector of words of positive sentiment
      # neg: vector of words of negative sentiment
      # .progress: passed to laply() to control the progress bar
      
      scores = laply(tweettext,
                     function(singletweet, pos, neg)
                     {
                       # strip punctuation, control characters and digits
                       singletweet = gsub("[[:punct:]]", "", singletweet)
                       singletweet = gsub("[[:cntrl:]]", "", singletweet)
                       singletweet = gsub("\\d+", "", singletweet)
    
                       # tolower() that returns NA instead of failing
                       tryTolower = function(x)
                       {
                         y = NA
                         try_error = tryCatch(tolower(x), error=function(e)e)
                         if (!inherits(try_error, "error"))
                           y = tolower(x)
                         return(y)
                       }
                       singletweet = sapply(singletweet, tryTolower)
                       # split into words and look them up in both lists
                       word.list = str_split(singletweet, "\\s+")
                       words = unlist(word.list)
                       pos.matches = match(words, pos)
                       neg.matches = match(words, neg)
                       pos.matches = !is.na(pos.matches)
                       neg.matches = !is.na(neg.matches)
                       # score = positive hits minus negative hits
                       score = sum(pos.matches) - sum(neg.matches)
                       return(score)
                     }, pos, neg, .progress=.progress )
      sentiment.df = data.frame(text=tweettext, score=scores)
      return(sentiment.df)
    }
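
For comparison, here is a minimal base-R sketch of the same scoring logic. It is not the code from the post: `score_sentiment`, the toy tweets, and the two tiny word lists are made up for illustration, and it assumes the tweets sit in a plain character vector with one tweet per element. One possible (unconfirmed) explanation for a pattern like 0, 0, 4771 is that `My_tweettext` is a data frame rather than a vector, in which case `laply()` would likely walk over its columns instead of its tweets, so the sketch rejects data frames explicitly.

```r
# Hypothetical helper (not from the original post): scores each tweet separately.
score_sentiment <- function(tweettext, pos, neg) {
  # Assumption: one tweet per element of a character vector.
  # A data frame would be iterated column by column, not tweet by tweet.
  if (is.data.frame(tweettext)) {
    stop("tweettext is a data frame; pass a single column, e.g. df$text")
  }
  tweettext <- as.character(tweettext)

  score_one <- function(singletweet) {
    # Same cleaning steps as in the post: punctuation, control chars, digits.
    singletweet <- gsub("[[:punct:]]", "", singletweet)
    singletweet <- gsub("[[:cntrl:]]", "", singletweet)
    singletweet <- gsub("\\d+", "", singletweet)
    # Lower-case defensively; fall back to NA if tolower() fails.
    lowered <- tryCatch(tolower(singletweet), error = function(e) NA_character_)
    words <- strsplit(lowered, "\\s+")[[1]]
    # Positive hits minus negative hits for this one tweet.
    sum(words %in% pos) - sum(words %in% neg)
  }

  scores <- vapply(tweettext, score_one, numeric(1), USE.NAMES = FALSE)
  data.frame(text = tweettext, score = scores, stringsAsFactors = FALSE)
}

# Toy data, made up for illustration only.
tweets <- c("Good great day!", "Bad, awful news 123", "Good great day!")
pos_words <- c("good", "great")
neg_words <- c("bad", "awful")

result <- score_sentiment(tweets, pos_words, neg_words)
print(result)
```

With this shape, identical tweets always receive identical scores, which makes it easy to check whether the input or the scoring is at fault.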

Sorry if this question is stupid, but I need this function to get data for my research.

Edit: I use Windows 10, and my RStudio version is 1.4.1103.

Here is a folder with the data:

tweettext: (Trudeau_tweettext.csv)
pos: (positive-words.txt)
neg: (negative-words.txt)

    library(stringr)
    library(plyr)
    library(dplyr)
    library(tm)
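
How the three inputs are loaded matters as much as the scoring itself, so here is a hedged sketch of one way to get them into the shape the function expects. Everything specific in it is an assumption: the stand-in files are written to a temporary directory so the example runs anywhere, the column name `text` is a guess about the CSV layout, and the rule of skipping lines that start with `;` assumes the word lists carry comment lines of that form.

```r
# Stand-in files so the sketch is self-contained; in practice these would be
# Trudeau_tweettext.csv and positive-words.txt from the post.
tmp_dir <- tempdir()
csv_path <- file.path(tmp_dir, "tweets_example.csv")
pos_path <- file.path(tmp_dir, "positive_example.txt")

write.csv(data.frame(id = 1:2, text = c("Great day", "Bad news")),
          csv_path, row.names = FALSE)
writeLines(c("; comment line", "", "great", "good"), pos_path)

# stringsAsFactors = FALSE keeps the text as character rather than factor.
tweets_df <- read.csv(csv_path, stringsAsFactors = FALSE)

# Assumption: the tweet text lives in a column called "text".
# Extracting the column yields a character vector, one tweet per element.
my_tweettext <- tweets_df$text

# Read one word per line; drop blank lines and lines starting with ";".
pos_lines <- readLines(pos_path)
positive_words <- pos_lines[nzchar(pos_lines) & !grepl("^;", pos_lines)]

str(my_tweettext)
str(positive_words)
```

Checking the inputs with `str()` before scoring shows at a glance whether a whole data frame or a single text column is being passed on.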

I wish you all a lovely day (or night)!

Author: 淡淡離愁欲言轉身
