Unicode error during model inference in a SageMaker notebook

I am running inference on a model trained in a SageMaker notebook, and I am getting a Unicode error when I pass the input.

Before deploying, I tried the following and it worked: process the text with input_fn and then pass its output to predict_fn for prediction. But I am facing this issue when I call the deployed SageMaker endpoint. How can I resolve it?
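
A minimal sketch of that pre-deployment check, assuming the handler functions from train_nmf.py (shown below) are importable and that model_dir is a placeholder path to the directory holding nmf_model.pkl and tf_idf.pkl:

payload = json.dumps({"data": input_text})
features = input_fn(payload, model_dir)    # deserialize + TF-IDF transform
nmf_model = model_fn(model_dir)            # load the fitted NMF model
topics = predict_fn(features, nmf_model)   # per-topic weights for the input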

input_text = "BACKGROUND: COVID-19 is associated with pulmonary embolism (PE) in adults."
deployment.predict(json.dumps({"data":input_text}))

Error

Traceback (most recent call last):
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_functions.py", line 93, in wrapper
    return fn(*args, **kwargs)
  File "/opt/ml/code/train_nmf.py", line 311, in input_fn
    input_data = json.loads(serialized_input_data)
  File "/miniconda3/lib/python3.7/json/__init__.py", line 343, in loads
    s = s.decode(detect_encoding(s), 'surrogatepass')

Training in SageMaker Notebook

from sagemaker.sklearn.estimator import SKLearn

script_path = 'train_nmf.py'

sklearn = SKLearn(
    entry_point=script_path,
    instance_type="ml.m4.xlarge",
    framework_version="0.23-1",
    py_version="py3",
    role=role,
    sagemaker_session=sagemaker_session,
    output_path=output_data_uri,
    code_location=training_desc_uri,
    source_dir='/home/ec2-user/SageMaker/src')
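
The deployment object used in the predict call above comes from this estimator. A minimal sketch of that fit/deploy step, assuming a single training channel and the same instance type (the channel name, data URI, and instance settings are placeholders rather than the exact call used):

# Assumed fit/deploy step; training_data_uri and the instance settings are placeholders.
sklearn.fit({"train": training_data_uri})
deployment = sklearn.deploy(
    initial_instance_count=1,
    instance_type="ml.m4.xlarge")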

Train NMF Code

import os
import numpy as np
import pandas as pd
import joblib
import json
CONTENT_TYPE_JSON = "application/json" 

def process_text(text):
    text = [each.lower() for each in text]
    return text

def model_fn(model_dir):
    # SageMaker automatically downloads model.tar.gz from S3 and extracts it
    # inside the container; 'model_dir' points to the root of the extracted archive.
    model = joblib.load(os.path.join(model_dir, "nmf_model.pkl"))
    return model

def predict_fn(input_data, model):
    # Run inference: project the TF-IDF features onto the NMF topics.
    predicted_topics = model.transform(input_data)
    return predicted_topics

def input_fn(serialized_input_data, model_dir, content_type=CONTENT_TYPE_JSON):
    # Deserialize the JSON request, apply process_text, and vectorize the
    # text with the fitted TF-IDF model stored next to the NMF model.
    input_data = json.loads(serialized_input_data)
    input_text_processed = pd.Series(input_data).apply(process_text)
    tf_idf_model = joblib.load(os.path.join(model_dir, "tf_idf.pkl"))
    processed_sample_text = tf_idf_model.transform(input_text_processed)
    return processed_sample_text

def output_fn(prediction_output, model_dir, accept=CONTENT_TYPE_JSON):
    # Serialize the per-topic weights plus the dominant topic for each input as JSON.
    if accept == CONTENT_TYPE_JSON:
        topic_keywords = joblib.load(
            os.path.join(model_dir, "topic_keywords.pkl")
        )
        pred_dominant_topic = np.argmax(prediction_output, axis=1)
        pred_df = pd.DataFrame(prediction_output, columns=topic_keywords)
        pred_df["dominant_topic"] = pred_dominant_topic
        return json.dumps(pred_df.to_dict("records")), accept
    raise Exception('Unsupported content type')
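
On the client side, the JSON produced by output_fn is meant to be parsed back into per-document records once the endpoint call succeeds; a rough sketch (variable names are illustrative):

response = deployment.predict(json.dumps({"data": input_text}))
records = json.loads(response)  # one dict per row: topic weights plus dominant_topic
dominant_topics = [r["dominant_topic"] for r in records]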
    
