Naive Bayes classifier does not learn
I am new to the programming language R.
I want to set up a Naïve Bayes classifier, which classifies descriptions of campaigns as 0 or 1 (depending on whether the campaign was successful or not).
The data set can be found here.
My code is the following:
library(tidyverse)
library(tidymodels)
library(textrecipes)
library(discrim)

# fix the seed so that the split and the folds are reproducible
set.seed(1234)

df <- read_csv("data/kickstarter.csv.gz")

# create categorical from numerical data
df$state <- as.factor(df$state)

# do not use the whole data frame
df <- df %>% slice(1:1e5)
df <- filter(df, nchar(blurb) >= 15)

# split into training and test set
df_split <- initial_split(df)
df_train <- training(df_split)
df_test <- testing(df_split)

# create folds for cross validation
folds <- vfold_cv(df_train)

# pre-process texts (the recipe is defined on the training data only)
rec <- recipe(state ~ blurb, data = df_train) %>%
  step_tokenize(blurb) %>%
  step_tokenfilter(blurb, max_tokens = 1e3)

# transform to numerical data
rec <- rec %>% step_tfidf(blurb)

# specify model
nb_spec <- naive_Bayes() %>%
  set_mode("classification") %>%
  set_engine("naivebayes")

# create workflow
nb_wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(nb_spec)

# fit & do cross validation
nb_rs <- fit_resamples(
  nb_wf,
  folds,
  control = control_resamples(save_pred = TRUE)
)

# look at accuracy
nb_rs_metrics <- collect_metrics(nb_rs)
nb_rs_metrics
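Because `save_pred = TRUE` keeps the out-of-fold predictions, the overall accuracy can be broken down by class, for example to see whether the model assigns almost everything to one class. Below is a minimal sketch with yardstick. The `preds` table is made up for illustration (hypothetical values, not results from this data set); with the code above it would presumably be replaced by `collect_predictions(nb_rs)`, which should contain the columns `state` and `.pred_class`.

```r
library(yardstick)

# Hypothetical out-of-fold predictions (made-up values for illustration).
# In the workflow above these would come from collect_predictions(nb_rs).
preds <- data.frame(
  state = factor(c(0, 0, 0, 1, 1, 1, 1, 1), levels = c(0, 1)),
  .pred_class = factor(c(1, 1, 1, 1, 1, 1, 1, 0), levels = c(0, 1))
)

# Confusion matrix: rows are predictions, columns are the true classes
cm <- conf_mat(preds, truth = state, estimate = .pred_class)
cm

# Share of predictions assigned to each class
pred_share <- prop.table(table(preds$.pred_class))
pred_share
```

If one class receives nearly all predictions, as in this toy table, the accuracy mostly reflects the class balance rather than anything learned from the text.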
It turns out that the accuracy of the classifier is only 0.52, and I have no idea how to approach this problem. Does anyone have an idea where my mistake could be?
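To judge whether 0.52 is actually poor, it may help to compare it with the accuracy of always predicting the most frequent class. A minimal sketch in base R follows; `state` is a small made-up vector standing in for `df_train$state` (an assumption for illustration, not the real labels).

```r
# Hypothetical labels standing in for df_train$state (made-up values)
state <- factor(c(0, 1, 1, 0, 1, 0, 1, 1))

# Share of each class in the data
class_share <- prop.table(table(state))
class_share

# Accuracy of a classifier that always predicts the majority class
baseline_accuracy <- max(class_share)
baseline_accuracy
```

A cross-validated accuracy close to this baseline would suggest that the model extracts little signal from the blurbs, while an accuracy below it would point to a problem in the setup.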
Thanks in advance!