Email classifier using spaCy: version-related error when trying to implement BOW

Posted on 2025-01-11 21:26:24

I'm trying to create a TextCategorizer with exclusive classes and the "bow" architecture, but it throws the error below, apparently because of a version issue. My Python version is 3.8 and my spaCy version is 3.2.3. Could someone please help me resolve this?

######## Main method ########

def main():

    # Load dataset
    data = pd.read_csv(data_path, sep='\t')
    observations = len(data.index)
    # print("Dataset Size: {}".format(observations))

    # Create an empty spacy model
    nlp = spacy.blank("en")

    # Create the TextCategorizer with exclusive classes and "bow" architecture
    text_cat = nlp.create_pipe(
                  "textcat",
                  config={
                    "exclusive_classes": True,
                    "architecture": "bow"})

    # Adding the TextCategorizer to the created empty model
    nlp.add_pipe(text_cat)

    # Add labels to text classifier
    text_cat.add_label("ham")
    text_cat.add_label("spam")

    # Split data into train and test datasets
    x_train, x_test, y_train, y_test = train_test_split(
        data['text'], data['label'], test_size=0.33, random_state=7)

    # Create the train and test data for the spacy model
    train_lables = [{'cats': {'ham': label == 'ham',
                              'spam': label == 'spam'}}  for label in y_train]
    test_lables = [{'cats': {'ham': label == 'ham',
                          'spam': label == 'spam'}}  for label in y_test]

    # Spacy model data
    train_data = list(zip(x_train, train_lables))
    test_data = list(zip(x_test, test_lables))

    # Model configurations
    optimizer = nlp.begin_training()
    batch_size = 5
    epochs = 10

    # Training the model
    train_model(nlp, train_data, optimizer, batch_size, epochs)

    # Sample predictions
    # print(train_data[0])
    # sample_test = nlp(train_data[0][0])
    # print(sample_test.cats)

    # Train and test accuracy
    train_predictions = get_predictions(nlp, x_train)
    test_predictions = get_predictions(nlp, x_test)
    train_accuracy = accuracy_score(y_train, train_predictions)
    test_accuracy = accuracy_score(y_test, test_predictions)

    print("Train accuracy: {}".format(train_accuracy))
    print("Test accuracy: {}".format(test_accuracy))

    # Creating the confusion matrix graphs
    cf_train_matrix = confusion_matrix(y_train, train_predictions)
    plt.figure(figsize=(10,8))
    sns.heatmap(cf_train_matrix, annot=True, fmt='d')

    cf_test_matrix = confusion_matrix(y_test, test_predictions)
    plt.figure(figsize=(10,8))
    sns.heatmap(cf_test_matrix, annot=True, fmt='d')


if __name__ == "__main__":
    main()

Below is the error

---------------------------------------------------------------------------
ConfigValidationError                     Traceback (most recent call last)
<ipython-input-6-a77bb5692b25> in <module>
     72 
     73 if __name__ == "__main__":
---> 74     main()

<ipython-input-6-a77bb5692b25> in main()
     12 
     13     # Create the TextCategorizer with exclusive classes and "bow" architecture
---> 14     text_cat = nlp.add_pipe(
     15                   "textcat",
     16                   config={

~\anaconda3\lib\site-packages\spacy\language.py in add_pipe(self, factory_name, name, before, after, first, last, source, config, raw_config, validate)
    790                     lang_code=self.lang,
    791                 )
--> 792             pipe_component = self.create_pipe(
    793                 factory_name,
    794                 name=name,

~\anaconda3\lib\site-packages\spacy\language.py in create_pipe(self, factory_name, name, config, raw_config, validate)
    672         # We're calling the internal _fill here to avoid constructing the
    673         # registered functions twice
--> 674         resolved = registry.resolve(cfg, validate=validate)
    675         filled = registry.fill({"cfg": cfg[factory_name]}, validate=validate)["cfg"]
    676         filled = Config(filled)

~\anaconda3\lib\site-packages\thinc\config.py in resolve(cls, config, schema, overrides, validate)
    727         validate: bool = True,
    728     ) -> Dict[str, Any]:
--> 729         resolved, _ = cls._make(
    730             config, schema=schema, overrides=overrides, validate=validate, resolve=True
    731         )

~\anaconda3\lib\site-packages\thinc\config.py in _make(cls, config, schema, overrides, resolve, validate)
    776         if not is_interpolated:
    777             config = Config(orig_config).interpolate()
--> 778         filled, _, resolved = cls._fill(
    779             config, schema, validate=validate, overrides=overrides, resolve=resolve
    780         )

~\anaconda3\lib\site-packages\thinc\config.py in _fill(cls, config, schema, validate, resolve, parent, overrides)
    831                     schema.__fields__[key] = copy_model_field(field, Any)
    832                 promise_schema = cls.make_promise_schema(value, resolve=resolve)
--> 833                 filled[key], validation[v_key], final[key] = cls._fill(
    834                     value,
    835                     promise_schema,

~\anaconda3\lib\site-packages\thinc\config.py in _fill(cls, config, schema, validate, resolve, parent, overrides)
    897                 result = schema.parse_obj(validation)
    898             except ValidationError as e:
--> 899                 raise ConfigValidationError(
    900                     config=config, errors=e.errors(), parent=parent
    901                 ) from None

ConfigValidationError: 

Config validation error

textcat -> architecture        extra fields not permitted
textcat -> exclusive_classes   extra fields not permitted

{'nlp': <spacy.lang.en.English object at 0x000001B90CD4BF70>, 'name': 'textcat', 'architecture': 'bow', 'exclusive_classes': True, 'model': {'@architectures': 'spacy.TextCatEnsemble.v2', 'linear_model': {'@architectures': 'spacy.TextCatBOW.v2', 'exclusive_classes': True, 'ngram_size': 1, 'no_output_layer': False}, 'tok2vec': {'@architectures': 'spacy.Tok2Vec.v2', 'embed': {'@architectures': 'spacy.MultiHashEmbed.v2', 'width': 64, 'rows': [2000, 2000, 1000, 1000, 1000, 1000], 'attrs': ['ORTH', 'LOWER', 'PREFIX', 'SUFFIX', 'SHAPE', 'ID'], 'include_static_vectors': False}, 'encode': {'@architectures': 'spacy.MaxoutWindowEncoder.v2', 'width': 64, 'window_size': 1, 'maxout_pieces': 3, 'depth': 2}}}, 'scorer': {'@scorers': 'spacy.textcat_scorer.v1'}, 'threshold': 0.5, '@factories': 'textcat'}

My spaCy version

print(spacy.__version__)

3.2.3

My Python version

import sys
print(sys.version)

3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]

Trying to downgrade the spaCy version

!conda install -c conda-forge spacy = 2.1.8
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Solving environment: ...working... 

Building graph of deps:   0%|          | 0/5 [00:00<?, ?it/s]
Examining spacy=2.1.8:   0%|          | 0/5 [00:00<?, ?it/s] 
Examining python=3.8:  20%|##        | 1/5 [00:00<00:00,  4.80it/s]
Examining python=3.8:  40%|####      | 2/5 [00:00<00:00,  9.60it/s]
Examining @/win-64::__cuda==11.6=0:  40%|####      | 2/5 [00:01<00:00,  9.60it/s]
Examining @/win-64::__cuda==11.6=0:  60%|######    | 3/5 [00:01<00:01,  1.97it/s]
Examining @/win-64::__win==0=0:  60%|######    | 3/5 [00:01<00:01,  1.97it/s]    
Examining @/win-64::__archspec==1=x86_64:  80%|########  | 4/5 [00:01<00:00,  1.97it/s]
                                                                                       

Determining conflicts:   0%|          | 0/5 [00:00<?, ?it/s]
Examining conflict for spacy python:   0%|          | 0/5 [00:00<?, ?it/s]
                                                                          

UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:

Specifications:

  - spacy=2.1.8 -> python[version='>=3.6,<3.7.0a0|>=3.7,<3.8.0a0']

Your python: python=3.8

Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed


If python is on the left-most side of the chain, that's the version you've asked for.
When python appears to the right, that indicates that the thing on the left is somehow
not available for the python version you are constrained to. Note that conda will not
change your python version to a different minor version unless you explicitly specify
that.

Please feel free to comment or ask.
Thank you.


Comments (1)

我一向站在原地 2025-01-18 21:26:24

As I understand that error message, it is telling you that the spaCy version you want to install (2.1.8) is incompatible with the Python version you have (3.8.8). It needs Python 3.6 or 3.7.

So either create an environment with Python 3.6 or 3.7 (it's quite easy to specify the Python version when creating a new environment in conda) or use a newer version of spaCy. Have you already checked whether the code works if you just use the newest version of spaCy?
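
To make the constraint in the conda message concrete, here is a small sketch that checks an interpreter version against the range conda printed for spacy=2.1.8 (`>=3.6,<3.8`, i.e. 3.6.x or 3.7.x). The helper name `python_supported` and the constant are made up for illustration; they are not part of conda or spaCy.

```python
import sys

# Bounds copied from the conda message for spacy=2.1.8:
# python >=3.6,<3.7.0a0 | >=3.7,<3.8.0a0, i.e. the half-open range [3.6, 3.8).
SPACY_2_1_8_PYTHON = ((3, 6), (3, 8))


def python_supported(version, bounds=SPACY_2_1_8_PYTHON):
    """Return True if the (major, minor) part of `version` lies in [lower, upper)."""
    lower, upper = bounds
    return lower <= tuple(version[:2]) < upper


# Check the running interpreter, then the version from the question.
print(python_supported(sys.version_info))
print(python_supported((3, 8, 8)))
```

The second call uses the 3.8.8 version from the question, which falls outside the range; that is why conda refuses to install spaCy 2.1.8 into that environment.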

Is there a specific reason why you are using this spaCy version? If you are using methods that are no longer supported, it might make more sense to update your code to the newer spaCy methods. Especially if you are doing this to learn spaCy, it is counterproductive to learn methods that are no longer supported. Sadly, a lot of tutorials neither update their code nor specify which versions they use, and then stay online for years.
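
As an illustration of what updating to the newer API could look like: the traceback in the question rejects `architecture` and `exclusive_classes` as extra fields, and its config dump shows the architecture living under a `model` block (`'@architectures': 'spacy.TextCatBOW.v2'`). The following is a minimal sketch under that reading, assuming spaCy 3.x is installed. The two training sentences are invented placeholders, not the asker's dataset, and this is not a tested replacement for the whole script.

```python
import spacy
from spacy.training import Example

# Assumption: spaCy 3.x. Start from a blank English pipeline as in the question.
nlp = spacy.blank("en")

# In v3 the architecture is selected through the "model" block; these keys
# mirror the TextCatBOW.v2 entry printed in the question's config dump.
config = {
    "model": {
        "@architectures": "spacy.TextCatBOW.v2",
        "exclusive_classes": True,
        "ngram_size": 1,
        "no_output_layer": False,
    }
}

# add_pipe takes the factory name and returns the component.
textcat = nlp.add_pipe("textcat", config=config)
textcat.add_label("ham")
textcat.add_label("spam")

# Placeholder data (invented for this sketch), in the same "cats" layout
# the question builds from its DataFrame.
train_data = [
    ("Win a free prize now", {"cats": {"ham": 0.0, "spam": 1.0}}),
    ("Are we still meeting for lunch?", {"cats": {"ham": 1.0, "spam": 0.0}}),
]
examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in train_data]

# nlp.initialize replaces begin_training in v3 and returns the optimizer.
optimizer = nlp.initialize(lambda: examples)
for _ in range(10):
    losses = {}
    nlp.update(examples, sgd=optimizer, losses=losses)

doc = nlp("Free prize inside")
print(doc.cats)
```

The main differences from the v2-style code are that `nlp.add_pipe` receives the string name of the component, the model settings go under `config["model"]`, and training batches are `Example` objects rather than `(text, annotations)` tuples.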
