Is there a way to include a custom regression metric in the ModelQualityMonitor in AWS SageMaker?

Posted 2025-01-24 15:41:56

I have successfully initialized a ModelQualityMonitor object.
Then I created a monitoring schedule using the CreateMonitoringSchedule API. In the background, SageMaker runs two processing jobs that merge the ground truth data with the collected endpoint data, then analyze it and compute the predefined regression metrics:
https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-metrics.html

Unfortunately, MAPE (Mean Absolute Percentage Error) is missing from these metrics, and I would like to add it in the future (also in CloudWatch).
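For reference, MAPE over n pairs of ground-truth values y_i and predictions ŷ_i is commonly defined as below. This is the standard textbook form, not something taken from the SageMaker documentation, and it is undefined whenever a ground-truth value y_i is zero.

```latex
\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|
```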

SageMaker provides the following functionalities:

Preprocessing and Postprocessing:
In addition to using the built-in mechanisms, you can extend the code with preprocessing and postprocessing scripts.
Bring Your Own Containers:
Amazon SageMaker Model Monitor provides a prebuilt container with the ability to analyze the data captured from endpoints for tabular datasets. If you would like to bring your own container, Model Monitor provides extension points which you can leverage.
CloudWatch Metrics for Bring Your Own Containers

Those points are documented on this site: https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-custom-monitoring-schedules.html
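To make the third option more concrete, here is a minimal sketch of how a custom container might write a MAPE value as a CloudWatch-style metric record. It assumes the container is expected to append JSON records modeled on the CloudWatch PutMetricData schema to a metrics directory. The directory, the file name `cloudwatch_metrics.jsonl`, the dimension names, and both helper functions are illustrative assumptions and should be checked against the linked documentation.

```python
import json
import os
from datetime import datetime, timezone


def build_metric_record(name, value, endpoint, schedule, timestamp=None):
    """Build one PutMetricData-style record for a custom metric.

    The field and dimension names are assumptions modeled on the
    CloudWatch PutMetricData schema, not a confirmed Model Monitor contract.
    """
    ts = timestamp or datetime.now(timezone.utc)
    return {
        "MetricName": name,
        "Timestamp": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "Dimensions": [
            {"Name": "Endpoint", "Value": endpoint},
            {"Name": "MonitoringSchedule", "Value": schedule},
        ],
        "Value": float(value),
    }


def append_metric(record, metrics_dir, filename="cloudwatch_metrics.jsonl"):
    """Append the record as one JSON line; the file name is hypothetical."""
    os.makedirs(metrics_dir, exist_ok=True)
    path = os.path.join(metrics_dir, filename)
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return path
```

Inside the container, the analysis code would call `append_metric(build_metric_record("mape", value, endpoint, schedule), metrics_dir)` once per monitoring run, with `metrics_dir` set to whatever location the documentation specifies.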

How exactly can I achieve my goal of including MAPE using the options above?

Here is a code snippet of my current implementation:

from sagemaker.model_monitor.model_monitoring import ModelQualityMonitor
from sagemaker.model_monitor import EndpointInput
from sagemaker.model_monitor.dataset_format import DatasetFormat

# Create the model quality monitoring object
MQM = ModelQualityMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    volume_size_in_gb=20,
    max_runtime_in_seconds=1800,
    sagemaker_session=sagemaker_session,
)

# suggest a baseline
job = MQM.suggest_baseline(
    job_name=baseline_job_name,
    baseline_dataset="./baseline.csv",
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri=baseline_results_uri,
    problem_type="Regression",
    inference_attribute="predicted_price",
    ground_truth_attribute="price",
)
job.wait(logs=False)
baseline_job = MQM.latest_baselining_job

# create a monitoring schedule
endpointInput = EndpointInput(
    endpoint_name="dev-TestEndpoint",
    destination="/opt/ml/processing/input_data",
    inference_attribute="$.data.predicted_price"
)
MQM.create_monitoring_schedule(
    monitor_schedule_name="DS-Schedule",
    endpoint_input=endpointInput,
    output_s3_uri=baseline_results_uri,
    constraints=baseline_job.suggested_constraints(),
    problem_type="Regression",
    ground_truth_input=ground_truth_upload_path,
    schedule_cron_expression="cron(0 * ? * * *)", # hourly
    enable_cloudwatch_metrics=True
)

暮倦 2025-01-31 15:41:56

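Amazon SageMaker Model Monitor supports only the predefined metrics listed in its documentation out of the box.
If you need to include a different metric in your case (Mean Absolute Percentage Error), you have to rely on the BYOC (Bring Your Own Containers) approach. Note that with this approach you cannot simply "add" a metric to the available list; unfortunately, you will have to implement the entire suite of metrics yourself. I understand this is not ideal for customers, and I encourage you to reach out to your AWS account manager to create a request for adding MAPE (Mean Absolute Percentage Error) in the long run. I have also noted this down and will relay it back to the team.

In the meantime, the Bring Your Own Containers documentation linked in the question explains how to do BYOC.

I work for AWS, but my opinions are my own.

Thanks,
Raghu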
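
Since the BYOC route means reimplementing the whole metric suite, the core of such a container's analysis step could look roughly like the sketch below. It assumes the ground truth and predictions have already been merged into two equally long lists, and it computes MAE, MSE, RMSE, and R2 (to my understanding the regression metrics the built-in monitor reports) plus MAPE. The function name `regression_metrics` and the choice to skip rows with a zero ground truth are illustrative assumptions, not part of any SageMaker API.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, R2 and MAPE for paired ground truth and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n

    # R2 compares the residual sum of squares with the variance of the ground truth.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot > 0 else float("nan")

    # MAPE is undefined where the ground truth is zero; those rows are skipped
    # here, which is an assumption rather than a prescribed behavior.
    pct = [abs(e / t) for e, t in zip(errors, y_true) if t != 0]
    mape = 100.0 * sum(pct) / len(pct) if pct else float("nan")

    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse), "r2": r2, "mape": mape}
```

For example, `regression_metrics([100, 200, 400], [110, 180, 400])` gives an MAE of 10 and a MAPE of about 6.67 percent. How zero-valued ground truth should be treated depends on the use case and is worth deciding explicitly before alerting on the metric.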
