How to create a confusion matrix for an NER spaCy model?

Posted 2025-02-07 09:21:08

I want to build a confusion matrix for my model, but I'm not sure how to go about it or which variables to use. My code has two functions, one for training and one for testing, so I'm not sure whether I should build the confusion matrix for both sets of results or only for the test results.
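
One common approach for NER is a token-level confusion matrix: every token gets its gold entity label ("O" if it is outside any entity) and its predicted label, and the matrix counts each (gold, predicted) pair. It is normally computed on the held-out test set, since that is what measures generalization; a training-set matrix mainly shows how well the model fits data it has already seen. A minimal pure-Python sketch, assuming the gold and predicted labels are already aligned token by token (`token_confusion_matrix` is a hypothetical helper, not a spaCy API):

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def token_confusion_matrix(
    gold: Sequence[str],
    pred: Sequence[str],
    labels: Optional[List[str]] = None,
) -> Tuple[List[str], List[List[int]]]:
    """Count (gold, predicted) label pairs for token-aligned sequences.

    Rows are gold labels, columns are predicted labels; "O" marks tokens
    outside any entity. Every label in gold/pred must appear in `labels`.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be aligned token by token")
    if labels is None:
        # Stable order: "O" first (if present), then entity labels alphabetically
        found = set(gold) | set(pred)
        labels = (["O"] if "O" in found else []) + sorted(found - {"O"})
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    # Tally each (gold, predicted) pair into its row/column cell
    for (g, p), count in Counter(zip(gold, pred)).items():
        matrix[index[g]][index[p]] += count
    return labels, matrix
```

Note that this counts tokens, not entities, so a three-token entity contributes three cells; the `ents_p`/`ents_r`/`ents_f` scores returned by `nlp.evaluate` are computed over whole entity spans instead.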

Here is the training function:

import random

import mlflow
from spacy.training import Example
from spacy.util import minibatch
from tqdm import tqdm


def training(training_data, nlp, batch_size, iteration_index):
    # Batch up the (text, annotations) pairs using spaCy's minibatch
    losses = {}
    batches = minibatch(training_data, size=batch_size)
    for batch in batches:
        examples = [
            Example.from_dict(nlp.make_doc(text), annotations)
            for text, annotations in batch
        ]
        # Update the model once per batch rather than once per example
        nlp.update(examples, losses=losses, drop=0.3)
    mlflow.log_metrics(losses, step=iteration_index)
    return nlp

This function allows me to test the model

def testing(testing_data, nlp, iteration_index):
    testing_examples = []
    for text, annotations in testing_data:
        doc = nlp.make_doc(text)
        testing_examples.append(Example.from_dict(doc, annotations))

    # nlp.evaluate runs the enabled pipes on the examples and scores the predictions
    scorer_example = nlp.evaluate(testing_examples)
    # ents_per_type is a nested dict; mlflow.log_metrics only accepts flat numeric values
    del scorer_example["ents_per_type"]
    mlflow.log_metrics(scorer_example, step=iteration_index)
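
Deleting `ents_per_type` discards the per-label precision, recall and F-score, which are the closest thing `nlp.evaluate` already offers to a per-class breakdown. A sketch of an alternative that flattens the nested dict into metric names instead of deleting it (`flatten_scores` is a hypothetical helper; it assumes the entity labels only use characters MLflow accepts in metric names):

```python
from collections.abc import Mapping
from typing import Dict


def flatten_scores(scores: Mapping, prefix: str = "") -> Dict[str, float]:
    """Flatten nested evaluation scores into a {metric_name: float} dict.

    {"ents_per_type": {"PER": {"p": 0.9}}} becomes {"ents_per_type.PER.p": 0.9};
    values that are not numbers (e.g. None) are skipped.
    """
    flat: Dict[str, float] = {}
    for key, value in scores.items():
        name = f"{prefix}{key}"
        if isinstance(value, Mapping):
            # Recurse into nested dicts, joining the keys with "."
            flat.update(flatten_scores(value, prefix=f"{name}."))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            flat[name] = float(value)
    return flat
```

With this helper, the `del` line in `testing` could be dropped and the last call written as `mlflow.log_metrics(flatten_scores(scorer_example), step=iteration_index)`.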

And here is the loop that ties them together:

with nlp.select_pipes(disable=unaffected_pipes):  # disable the pipes training should not touch
    with mlflow.start_run(experiment_id=experiment_id, run_name=model_name):
        mlflow.set_tag("model_flavor", model_name)
        mlflow.log_param("BATCH_SIZE", BATCH_SIZE)
        mlflow.log_param("TRAINING_ITERATION", TRAINING_ITERATION)
        mlflow.log_param("LABELS", CLASSES)
        mlflow.log_param("PIPE_NAMES", nlp.pipe_names)

        # Training for many iterations
        for iteration in tqdm(range(TRAINING_ITERATION)):
            
            # Shuffle the examples before every iteration
            random.shuffle(train_data)

            nlp = training(train_data, nlp, BATCH_SIZE, iteration)
            testing(test_data, nlp, iteration)

        # Save results in MLflow
        mlflow.log_artifact(local_path="./ner.ipynb")
        mlflow.spacy.log_model(spacy_model=nlp, artifact_path=str(train_name))
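
Because `nlp.evaluate` only returns aggregate scores, the predictions themselves have to be collected separately, for example once after the final iteration. A minimal sketch, assuming the annotations use spaCy's `{"entities": [(start, end, label)]}` character-offset format and that the gold spans line up with token boundaries (`collect_token_labels` is a hypothetical helper; it only relies on `nlp.pipe` and the `idx`, `text` and `ent_type_` token attributes):

```python
from typing import Any, List, Sequence, Tuple


def collect_token_labels(
    nlp: Any, testing_data: Sequence[Tuple[str, dict]]
) -> Tuple[List[str], List[str]]:
    """Return aligned gold and predicted entity labels for every test token."""
    texts = [text for text, _ in testing_data]
    gold: List[str] = []
    pred: List[str] = []
    for doc, (_, annotations) in zip(nlp.pipe(texts), testing_data):
        spans = annotations.get("entities", [])
        for token in doc:
            start, end = token.idx, token.idx + len(token.text)
            # Gold label: the annotated span that fully contains this token, else "O"
            gold.append(next((lab for s, e, lab in spans if s <= start and end <= e), "O"))
            # Predicted label: spaCy uses "" for tokens outside any entity
            pred.append(token.ent_type_ or "O")
    return gold, pred
```

The returned lists can be passed to any confusion-matrix routine, for instance `sklearn.metrics.confusion_matrix(gold, pred, labels=labels)`, where rows are gold labels and columns are predictions. Running the same helper on `train_data` gives a point of comparison: a training matrix that is much cleaner than the test matrix suggests overfitting.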

I would prefer to use Plotly, since I have already used it in another project.
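
A sketch of a Plotly heatmap for such a matrix, assuming `labels` is the list of label names and `matrix` is a square list of counts with gold labels as rows and predictions as columns (`plot_confusion_matrix` and `row_normalize` are hypothetical helpers; the heatmap `texttemplate` option needs Plotly 5.5 or later):

```python
from typing import List, Sequence


def row_normalize(matrix: Sequence[Sequence[int]]) -> List[List[float]]:
    """Divide each row by its total so cells show the share of each gold label."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


def plot_confusion_matrix(labels, matrix, normalize=False):
    """Build a Plotly heatmap with gold labels on the y axis, predictions on x."""
    import plotly.graph_objects as go  # imported here so the helpers load without Plotly

    z = row_normalize(matrix) if normalize else [list(row) for row in matrix]
    fig = go.Figure(
        data=go.Heatmap(
            z=z,
            x=list(labels),
            y=list(labels),
            texttemplate="%{z:.2f}" if normalize else "%{z}",  # print values in cells
            colorscale="Blues",
        )
    )
    fig.update_layout(
        xaxis_title="Predicted label",
        yaxis_title="Gold label",
        yaxis_autorange="reversed",  # first label at the top, like a printed matrix
    )
    return fig
```

`plot_confusion_matrix(labels, matrix, normalize=True).show()` displays row-normalized values, which are easier to read when one label (usually "O") is far more frequent than the others. Recent MLflow versions can also store the figure with `mlflow.log_figure(fig, "confusion_matrix.html")`.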
