Unicode error during model inference in a SageMaker notebook
I am running inference on a model trained in a SageMaker notebook, and I get a Unicode error when passing the input.
Before deploying, I tested the handlers directly and it worked: I processed the text with input_fn and then passed its output to predict_fn for prediction. But I run into a problem when I call predict on the deployed SageMaker endpoint. How can I resolve this?
input_text = "BACKGROUND: COVID-19 is associated with pulmonary embolism (PE) in adults."
deployment.predict(json.dumps({"data":input_text}))
Error
Traceback (most recent call last):
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_functions.py", line 93, in wrapper
    return fn(*args, **kwargs)
  File "/opt/ml/code/train_nmf.py", line 311, in input_fn
    input_data = json.loads(serialized_input_data)
  File "/miniconda3/lib/python3.7/json/__init__.py", line 343, in loads
    s = s.decode(detect_encoding(s), 'surrogatepass')
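The traceback stops inside `bytes.decode`, which suggests (an assumption, since the final exception line is not shown) that `input_fn` received bytes that are not UTF-8 JSON text. The sketch below reproduces that kind of failure locally with only the standard library. `binary_payload` and `describe_parse` are hypothetical names used for illustration; the payload is a stand-in for a binary request body, not the actual bytes sent to the endpoint.

```python
import json

# Hypothetical stand-in for a request body that is not UTF-8 JSON text.
# The leading byte 0x93 is not a valid UTF-8 start byte.
binary_payload = b"\x93NUMPY\x01\x00 not json"

# What input_fn expects to receive: UTF-8 encoded JSON.
json_payload = json.dumps({"data": "some text"}).encode("utf-8")


def describe_parse(payload: bytes) -> str:
    """Report whether json.loads can decode and parse the payload."""
    try:
        json.loads(payload)
        return "ok"
    except UnicodeDecodeError:
        # Raised while json.loads decodes the bytes, before any JSON parsing.
        return "UnicodeDecodeError"
    except json.JSONDecodeError:
        # Raised when the bytes decode fine but are not valid JSON.
        return "JSONDecodeError"


print(describe_parse(json_payload))
print(describe_parse(binary_payload))
```

If the endpoint shows the same behaviour, it may be worth checking which serializer the predictor applies to the request before it reaches `input_fn`, because a JSON string passed to `predict` could be re-encoded into a binary format on the way.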
Training in Sagemaker Notebook
from sagemaker.sklearn.estimator import SKLearn

script_path = 'train_nmf.py'

sklearn = SKLearn(
    entry_point=script_path,
    instance_type="ml.m4.xlarge",
    framework_version="0.23-1",
    py_version="py3",
    role=role,
    sagemaker_session=sagemaker_session,
    output_path=output_data_uri,
    code_location=training_desc_uri,
    source_dir='/home/ec2-user/SageMaker/src')
Train NMF Code
import os
import numpy as np
import pandas as pd
import joblib
import json

CONTENT_TYPE_JSON = "application/json"


def process_text(text):
    # Lowercase every item of the input.
    text = [each.lower() for each in text]
    return text


def model_fn(model_dir):
    # SageMaker automatically loads model.tar.gz from S3 and mounts
    # its folders inside the Docker container. 'model_dir' points to
    # the root of the extracted tar.gz file.
    model = joblib.load(os.path.join(model_dir, "nmf_model.pkl"))
    return model


def predict_fn(input_data, model):
    # Run inference: project the TF-IDF features onto the NMF topics.
    predicted_topics = model.transform(input_data)
    return predicted_topics


def input_fn(serialized_input_data, model_dir, content_type=CONTENT_TYPE_JSON):
    # Parse the JSON request body, preprocess it, and vectorize it with TF-IDF.
    input_data = json.loads(serialized_input_data)
    input_text_processed = pd.Series(input_data).apply(process_text)
    tf_idf_model = joblib.load(os.path.join(model_dir, "tf_idf.pkl"))
    processed_sample_text = tf_idf_model.transform(input_text_processed)
    return processed_sample_text


def output_fn(prediction_output, model_dir, accept=CONTENT_TYPE_JSON):
    if accept == CONTENT_TYPE_JSON:
        topic_keywords = joblib.load(
            os.path.join(model_dir, "topic_keywords.pkl")
        )
        # The dominant topic is the column with the highest weight per row.
        pred_dominant_topic = np.argmax(prediction_output, axis=1)
        pred_df = pd.DataFrame(prediction_output, columns=topic_keywords)
        pred_df["dominant_topic"] = pred_dominant_topic
        return json.dumps(pred_df.to_dict("records")), accept
    raise Exception('Unsupported content type')
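One way to make the parsing step in `input_fn` easier to debug is to decode the request body explicitly before calling `json.loads`. The following is a minimal sketch, not part of the script above: `parse_request_body` is a hypothetical helper, and it assumes the request wraps the text in a `{"data": ...}` object as in the `predict` call shown earlier.

```python
import json

CONTENT_TYPE_JSON = "application/json"


def parse_request_body(serialized_input_data, content_type=CONTENT_TYPE_JSON):
    """Hypothetical helper: turn a JSON request body into a list of texts."""
    if content_type != CONTENT_TYPE_JSON:
        raise ValueError(f"Unsupported content type: {content_type}")
    # Decode explicitly so a non-UTF-8 body fails with a clear message.
    if isinstance(serialized_input_data, (bytes, bytearray)):
        try:
            serialized_input_data = bytes(serialized_input_data).decode("utf-8")
        except UnicodeDecodeError as exc:
            raise ValueError("Request body is not UTF-8 encoded JSON") from exc
    payload = json.loads(serialized_input_data)
    # Assumes the {"data": ...} wrapper used in the predict call in this post.
    texts = payload["data"] if isinstance(payload, dict) else payload
    # Always return a list so downstream code sees one document per item.
    return [texts] if isinstance(texts, str) else list(texts)


print(parse_request_body(b'{"data": "BACKGROUND: COVID-19 ..."}'))
```

Returning a list of whole documents, rather than a bare string, keeps each text as a single item when it is later wrapped in a `pd.Series`.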