Naive Bayes classifier does not learn

I am new to the programming language R.

I want to set up a Naïve Bayes classifier, which classifies descriptions of campaigns as 0 or 1 (depending on whether the campaign was successful or not).

The data set can be found here.

My code is the following:

library(tidyverse)
library(tidymodels)
library(textrecipes)
library(discrim)

df <- read_csv("data/kickstarter.csv.gz")

# create categorical from numerical data
df$state <- as.factor(df$state)

# do not use the whole data frame
df <- df %>% slice(1:1e5)
df <- filter(df, nchar(blurb) >= 15)

# split into training and test set
df_split <- initial_split(df)
df_train <- training(df_split)
df_test <- testing(df_split)

# create folds for cross validation
folds <- vfold_cv(df_train)

# pre-process texts (the recipe is defined on the training set only)
rec <- recipe(state ~ blurb, data = df_train) %>%
  step_tokenize(blurb) %>%
  step_tokenfilter(blurb, max_tokens = 1e3)

# transform to numerical data
rec <- rec %>% step_tfidf(blurb)

# specify model
nb_spec <- naive_Bayes() %>%
  set_mode("classification") %>%
  set_engine("naivebayes")

# create workflow
nb_wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(nb_spec)

# fit & do cross validation
nb_rs <- fit_resamples(
  nb_wf,
  folds,
  control = control_resamples(save_pred = TRUE)
)

# look at accuracy
nb_rs_metrics <- collect_metrics(nb_rs)
nb_rs_metrics

It turns out that the accuracy of the classifier is only 0.52, and I have no idea how to approach this problem. Does anyone have an idea where my mistake could be?

Thanks in advance!
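
An accuracy of 0.52 is hard to judge on its own. One possible first diagnostic step is to compare it with the accuracy of always guessing the most frequent class, and to look at the confusion matrix. The following is a minimal base R sketch of that check, not a fix for the workflow above. The vectors truth and pred are made-up toy data; with the code above, the real values could presumably be taken from collect_predictions(nb_rs), using the columns state and .pred_class.

```r
# Toy stand-ins for the true labels and the predicted classes (made-up data).
# In the workflow above these would come from collect_predictions(nb_rs).
truth <- factor(c(1, 0, 1, 1, 0, 1, 0, 1, 1, 0), levels = c(0, 1))
pred  <- factor(c(1, 1, 1, 1, 1, 1, 1, 1, 0, 1), levels = c(0, 1))

# Share of predictions that match the true label
accuracy <- mean(pred == truth)

# Accuracy of always predicting the most frequent class (majority baseline)
baseline <- as.numeric(max(table(truth))) / length(truth)

# Confusion matrix: rows are predicted classes, columns are true classes
confusion <- table(predicted = pred, actual = truth)

print(accuracy)
print(baseline)
print(confusion)
```

In this toy example the model scores below the majority baseline and predicts class 1 almost everywhere. If the real predictions show a similar pattern, the model is collapsing onto one class rather than learning from the blurbs.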
