How do I create a confusion matrix for a spaCy NER model?

Posted 2025-02-07 09:21:08

I want to build a confusion matrix for my model, but I'm not sure how to go about it or which variables to use. My code has two functions, one for training and one for testing, so I'm not sure whether I should build the confusion matrix for both sets of results or only for the test results.

Here is the function that does the training:

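import mlflow
from spacy.training import Example
from spacy.util import minibatch

def training(training_data, nlp, batch_size, iteration_index):
    # Accumulates the loss of each pipe over all batches of this iteration
    losses = {}
    # Batch up the examples using spaCy's minibatch
    batches = minibatch(training_data, size=batch_size)
    for batch in batches:
        examples = []
        for text, annotations in batch:
            doc = nlp.make_doc(text)
            examples.append(Example.from_dict(doc, annotations))
        # Update the model once per batch, so that batch_size has an effect
        nlp.update(examples, losses=losses, drop=0.3)
    mlflow.log_metrics(losses, step=iteration_index)
    return nlp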

This function tests the model:

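def testing(testing_data, nlp, iteration_index):
    # Build Example objects that carry the gold annotations
    testing_examples = []
    for text, annotations in testing_data:
        doc = nlp.make_doc(text)
        testing_examples.append(Example.from_dict(doc, annotations))

    # nlp.evaluate runs the pipeline on the examples and returns a dict of scores
    scorer_example = nlp.evaluate(testing_examples)
    # "ents_per_type" is a nested dict, which mlflow.log_metrics cannot log
    del scorer_example["ents_per_type"]
    mlflow.log_metrics(scorer_example, step=iteration_index)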
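
For reference, one possible way to get from predictions like these to a confusion matrix is to count (gold label, predicted label) pairs per token. The sketch below is plain Python and is not part of the code above: the helper name `token_confusion_matrix`, the `"O"` label for non-entity tokens, and the toy data are all assumptions made for illustration. With spaCy v3, the per-token labels could presumably be read as `token.ent_type_ or "O"` from `example.reference` (gold) and from `nlp(text)` (predicted), assuming both docs share the same tokenization. A confusion matrix is usually computed on the held-out test data rather than on the training data.

```python
from collections import Counter


def token_confusion_matrix(gold_docs, pred_docs, outside="O"):
    """Count (gold label, predicted label) pairs over aligned token sequences.

    Each element of gold_docs / pred_docs is a list with one label per token;
    `outside` marks tokens that belong to no entity.
    """
    counts = Counter()
    for gold, pred in zip(gold_docs, pred_docs):
        if len(gold) != len(pred):
            raise ValueError("gold and predicted sequences must have the same length")
        counts.update(zip(gold, pred))

    # Fixed label order: entity labels alphabetically, the outside label last.
    seen = {label for pair in counts for label in pair}
    labels = sorted(seen - {outside}) + [outside]

    # matrix[i][j] = number of tokens with gold label labels[i]
    # that were predicted as labels[j].
    matrix = [[counts[(g, p)] for p in labels] for g in labels]
    return labels, matrix


# Toy data (made up for illustration): two sentences, one label per token.
gold = [["PERSON", "O", "ORG"], ["O", "GPE", "O"]]
pred = [["PERSON", "O", "O"], ["O", "ORG", "O"]]
labels, matrix = token_confusion_matrix(gold, pred)
```

Rows are gold labels and columns are predicted labels, so off-diagonal cells show which entity types get confused with each other or missed entirely (predicted as `"O"`).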

And the loop that ties the two together:

with nlp.disable_pipes(*unaffected_pipes):
    with mlflow.start_run(experiment_id=experiment_id, run_name=model_name):
        mlflow.set_tag("model_flavor", model_name)
        mlflow.log_param("BATCH_SIZE", BATCH_SIZE)
        mlflow.log_param("TRAINING_ITERATION", TRAINING_ITERATION)
        mlflow.log_param("LABELS", CLASSES)
        mlflow.log_param("PIPE_NAMES", nlp.pipe_names)

        # Training for many iterations
        for iteration in tqdm(range(TRAINING_ITERATION)):
            
            # shuffling examples before every iteration
            random.shuffle(train_data)

            nlp = training(train_data, nlp, BATCH_SIZE, iteration)
            testing(test_data, nlp, iteration)

        # Save results in MLflow
        mlflow.log_artifact(local_path='./ner.ipynb')
        mlflow.spacy.log_model(spacy_model=nlp, artifact_path=str(train_name))

Preferably, I would like to use Plotly, since I have already used it in another project.
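
As a rough idea of what the Plotly side might look like, the sketch below builds a heatmap description as a plain dict and hands it to `plotly.graph_objects.Figure`. The function names `confusion_heatmap_spec` and `confusion_heatmap`, the label list, and the counts are hypothetical and made up for illustration; only the heatmap trace properties (`z`, `x`, `y`, `colorscale`) and the layout keys are standard Plotly fields.

```python
def confusion_heatmap_spec(matrix, labels, title="NER confusion matrix"):
    """Describe a confusion-matrix heatmap as a plain Plotly figure dict."""
    if len(matrix) != len(labels) or any(len(row) != len(labels) for row in matrix):
        raise ValueError("matrix must be square and match the number of labels")
    return {
        "data": [{
            "type": "heatmap",
            "z": matrix,      # rows: gold labels, columns: predicted labels
            "x": labels,
            "y": labels,
            "colorscale": "Blues",
        }],
        "layout": {
            "title": {"text": title},
            "xaxis": {"title": {"text": "Predicted label"}},
            # Reverse the y axis so the first label is drawn at the top.
            "yaxis": {"title": {"text": "Gold label"}, "autorange": "reversed"},
        },
    }


def confusion_heatmap(matrix, labels, title="NER confusion matrix"):
    """Turn the spec into a Plotly figure (plotly is imported only when called)."""
    import plotly.graph_objects as go
    return go.Figure(confusion_heatmap_spec(matrix, labels, title))


# Made-up counts for three labels, for illustration only.
labels = ["ORG", "PERSON", "O"]
matrix = [[5, 1, 2], [0, 7, 1], [1, 0, 40]]
spec = confusion_heatmap_spec(matrix, labels)
```

Calling `confusion_heatmap(matrix, labels).show()` would display the figure. Since the run is already tracked in MLflow, the figure could also be saved with `fig.write_html("confusion_matrix.html")` and attached to the run with `mlflow.log_artifact("confusion_matrix.html")`.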
