Naive Bayes classifier does not learn

Posted on 2025-01-11 14:53:32

I am new to the R programming language.

I want to build a naive Bayes classifier that classifies the descriptions (blurbs) of Kickstarter campaigns as 0 or 1, depending on whether the campaign was successful.
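For comparison, the textbook naive Bayes model for short texts is the multinomial one. It scores each class by its prior probability times the smoothed within-class frequency of every word in the blurb. The block below is a minimal base-R sketch on made-up toy data. It is not the tidymodels pipeline used in this question, and it is not what the naivebayes engine does with tf-idf columns. The helper names `tokenize()`, `train_mnb()` and `predict_mnb()`, the simplified tokenizer, and the Laplace smoothing parameter `alpha = 1` are all illustrative assumptions.

```r
# Minimal multinomial naive Bayes for short texts, base R only.
# tokenize(), train_mnb() and predict_mnb() are hypothetical helpers for illustration.

# Simplified tokenizer: lowercase, split on runs of non-alphanumeric characters
tokenize <- function(text) {
  words <- strsplit(tolower(text), "[^a-z0-9]+")
  lapply(words, function(w) w[nzchar(w)])
}

# Estimate log class priors and Laplace-smoothed log word probabilities per class
train_mnb <- function(text, y, alpha = 1) {
  y <- as.character(y)
  tokens <- tokenize(text)
  vocab <- sort(unique(unlist(tokens)))
  classes <- sort(unique(y))
  log_lik <- vapply(classes, function(cl) {
    counts <- as.numeric(table(factor(unlist(tokens[y == cl]), levels = vocab)))
    log((counts + alpha) / (sum(counts) + alpha * length(vocab)))
  }, numeric(length(vocab)))
  rownames(log_lik) <- vocab
  log_prior <- log(as.numeric(table(factor(y, levels = classes))) / length(y))
  names(log_prior) <- classes
  list(vocab = vocab, classes = classes, log_lik = log_lik, log_prior = log_prior)
}

# Predict the class with the highest log posterior (up to a shared constant);
# repeated words select repeated rows, so word counts are taken into account
predict_mnb <- function(model, text) {
  vapply(tokenize(text), function(w) {
    w <- w[w %in% model$vocab]  # words never seen in training are ignored
    scores <- model$log_prior + colSums(model$log_lik[w, , drop = FALSE])
    model$classes[which.max(scores)]
  }, character(1))
}

# Made-up toy data: 1 = successful campaign, 0 = failed campaign
blurbs <- c("amazing new board game for families",
            "fun card game with friends",
            "a strategy board game",
            "help me fund my album",
            "support my debut album recording",
            "record an album of folk songs")
state <- c(1, 1, 1, 0, 0, 0)

model <- train_mnb(blurbs, state)
predict_mnb(model, c("a new card game", "my folk album"))  # "1" "0"
```

In this toy example, each test blurb shares words only with the class it is assigned to, so the result can be checked by hand.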

The data set can be found here.

My code is the following:

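library(tidyverse)
library(tidymodels)
library(textrecipes)
library(discrim)

df <- read_csv("data/kickstarter.csv.gz")

# convert the numeric 0/1 outcome to a factor (classification models need a factor outcome)
df$state <- as.factor(df$state)

# do not use the whole data frame: keep the first 100,000 rows,
# then drop very short blurbs (fewer than 15 characters)
df <- df %>% slice(1:1e5)
df <- filter(df, nchar(blurb) >= 15)

# split into training and test set (3/4 vs. 1/4 by default), stratified by outcome
set.seed(1234)
df_split <- initial_split(df, strata = state)
df_train <- training(df_split)
df_test <- testing(df_split)

# create folds for cross-validation (10 by default), stratified by outcome
set.seed(1234)
folds <- vfold_cv(df_train, strata = state)

# pre-process texts; recipe() only uses this data to look up variable names and types,
# the steps are estimated on the analysis set of each fold during resampling
rec <- recipe(state ~ blurb, data = df_train) %>%
  step_tokenize(blurb) %>%
  step_tokenfilter(blurb, max_tokens = 1e3)  # keep the 1,000 most frequent tokens

# transform tokens to numerical tf-idf features
rec <- rec %>% step_tfidf(blurb)

# specify model
nb_spec <- naive_Bayes() %>%
  set_mode("classification") %>%
  set_engine("naivebayes")

# create workflow
nb_wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(nb_spec)

# fit & do cross-validation
nb_rs <- fit_resamples(
  nb_wf,
  folds,
  control = control_resamples(save_pred = TRUE)
)

# look at accuracy (collect_metrics() averages each metric over the folds)
nb_rs_metrics <- collect_metrics(nb_rs)
nb_rs_metrics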
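To see what the recipe actually passes to the model, the following base-R sketch mimics the three text steps (tokenize, keep the `max_tokens` most frequent tokens, tf-idf) on four made-up blurbs. It assumes a simplified lowercase tokenizer and a textbook tf-idf: the term count divided by the number of tokens in the blurb, times log(number of blurbs / number of blurbs containing the term). textrecipes' `step_tokenize()` and `step_tfidf()` use their own defaults, so the exact numbers will differ. What carries over is the shape: most entries are zero. The naivebayes package models each numeric predictor with a continuous class-conditional density (a Gaussian, or a kernel density estimate when `usekernel = TRUE`). Such densities describe columns that are almost all zeros poorly, which may be one reason the model barely beats chance.

```r
# Toy reproduction of tokenize, keep top-N tokens, tf-idf; base R only.
# Simplified tokenizer and textbook tf-idf (assumptions); textrecipes' defaults differ in detail.

blurbs <- c("amazing new board game for families",
            "fun card game with friends",
            "help me fund my album",
            "support my debut album recording")

# Tokenize: lowercase, split on runs of non-alphanumeric characters
tokens <- lapply(strsplit(tolower(blurbs), "[^a-z0-9]+"), function(w) w[nzchar(w)])

# Keep the max_tokens most frequent tokens (the idea behind step_tokenfilter())
max_tokens <- 8
freq <- sort(table(unlist(tokens)), decreasing = TRUE)
keep <- names(freq)[seq_len(min(max_tokens, length(freq)))]

# Document-term count matrix over the kept tokens (one row per blurb)
dtm <- t(vapply(tokens, function(w) as.numeric(table(factor(w, levels = keep))),
                numeric(length(keep))))
colnames(dtm) <- keep

# tf = count / number of tokens in the blurb; idf = log(N / document frequency)
tf <- dtm / lengths(tokens)
idf <- log(nrow(dtm) / colSums(dtm > 0))
tfidf <- sweep(tf, 2, idf, `*`)

round(tfidf, 2)
mean(tfidf == 0)  # share of zero entries: 21 of 32 here
```

With the real recipe, a blurb of a few dozen words can have at most a few dozen nonzero entries out of 1,000 columns, so the fraction of zeros is far higher than in this toy example.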

It turns out that the cross-validated accuracy of the classifier is only 0.52, and I have no idea how to approach this problem. Does anyone have an idea where my mistake could be?
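An accuracy of 0.52 is only meaningful relative to the accuracy of always predicting the most common class. Below is a minimal base-R check, assuming a factor outcome such as `df_train$state`. The helper `majority_accuracy()` is hypothetical, and the toy vector is made up to show that an accuracy of 0.52 can coincide with this baseline.

```r
# Majority-class ("null model") accuracy: the accuracy of always predicting
# the most common class. A useful classifier should beat this number.
majority_accuracy <- function(y) {
  max(prop.table(table(y)))
}

# Made-up outcome vector for illustration: 52 successes, 48 failures
y <- factor(c(rep(1, 52), rep(0, 48)))
majority_accuracy(y)  # 0.52
```

If `majority_accuracy(df_train$state)` is also close to 0.52, the model is doing no better than guessing. In tidymodels, the same baseline can be obtained by resampling parsnip's `null_model()` (with `set_engine("parsnip")` and `set_mode("classification")`) in place of `nb_spec`. The default classification metrics reported by `collect_metrics()` also include `roc_auc`, which does not depend on a probability cutoff.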

Thanks in advance!
