Error "No worker is available to serve request: model" when calling a SageMaker endpoint during increased load

Posted 2025-01-11 11:00:33

I have a custom container that takes a request, does some feature extraction, and then passes the enhanced request on to a classifier endpoint. During feature extraction, another endpoint is called to generate text embeddings. I am using the HuggingFace estimator for my embedding model.

It had been working fine, but there was an increase in requests, and it looks like the embedding endpoint timed out somehow.

I am looking at adding automatic scaling to the endpoint, but I want to make sure I understand what is happening and that it properly addresses the issue. Unfortunately, searching for this error message does not pull up much. The instance metrics are not showing the endpoint as overloaded - CPU utilization was at most ~30%. Would auto scaling address the no-worker issue, or is this something different? I was receiving a few hundred requests per minute at the time.

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1518, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/opt/program/predictor.py", line 56, in transformation
    result = preprocessor.transform(data)
  File "/opt/program/preprocessor.py", line 189, in transform
    response = embed_predictor.predict(data=json.dumps(payload))
  File "/usr/local/lib/python3.7/site-packages/sagemaker/predictor.py", line 136, in predict
    response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 705, in _make_api_call
    raise error_class(parsed_response, operation_name)


botocore.errorfactory.ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (503) from primary with message "{
  "code": 503,
  "type": "ServiceUnavailableException",
  "message": "No worker is available to serve request: model"
}

Comments (1)

不再见 2025-01-18 11:00:33

I would suggest confirming that MemoryUtilization is not being overwhelmed and that there are no specific errors in CloudWatch Logs as well.
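
If it helps, here is a minimal sketch of pulling the endpoint's MemoryUtilization from CloudWatch with boto3. The endpoint name "my-embedding-endpoint" and variant name "AllTraffic" are placeholders; substitute your own.

from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

# Instance-level endpoint metrics (CPUUtilization, MemoryUtilization, DiskUtilization)
# are published under the /aws/sagemaker/Endpoints namespace.
stats = cloudwatch.get_metric_statistics(
    Namespace="/aws/sagemaker/Endpoints",
    MetricName="MemoryUtilization",
    Dimensions=[
        {"Name": "EndpointName", "Value": "my-embedding-endpoint"},  # placeholder
        {"Name": "VariantName", "Value": "AllTraffic"},              # placeholder
    ],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=60,
    Statistics=["Average", "Maximum"],
)

# Print per-minute memory utilization, oldest first.
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"], point["Maximum"])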

If MemoryUtilization is overwhelmed, you can test configuring Auto Scaling in order to distribute the request load across multiple instances. That being said, while I am not sure of the details of your custom container, I also recommend confirming that the container itself can handle multiple concurrent requests (i.e., has multiple workers available to serve requests).
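
As a rough illustration of the Auto Scaling piece, here is a sketch that registers the endpoint variant as a scalable target and attaches a target-tracking policy on invocations per instance (since CPU never went above ~30%, scaling on invocations tends to be more representative than scaling on CPU). The endpoint name, variant name, capacities, and target value are placeholders to adjust.

import boto3

autoscaling = boto3.client("application-autoscaling")

# Placeholder endpoint and production-variant names.
resource_id = "endpoint/my-embedding-endpoint/variant/AllTraffic"

# Let SageMaker scale the variant's instance count between 1 and 4 instances.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Track invocations per instance per minute; 100 is an assumed starting point,
# tune it against the latency you observe at a few hundred requests per minute.
autoscaling.put_scaling_policy(
    PolicyName="embedding-endpoint-invocations-scaling",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 100.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 300,
    },
)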
