Error "No worker is available to serve request: model" when invoking a SageMaker endpoint during increased load
I have a custom container that takes a request, does some feature extraction and then passes on the enhanced request to a classifier endpoint. During feature extraction another endpoint is being called for generating text embeddings. I am using the HuggingFace estimator for my embedding model.
It has been working fine, but there was an increase in requests and it looks like the embedding endpoint timed out somehow.
I am looking at adding automatic scaling to the endpoint, but I want to make sure I understand what is happening and that it properly addresses the issue. Unfortunately, searching for this error message does not pull up much. The instance metrics are not showing the endpoint to be overloaded: CPU utilization peaked at around 30%. Would auto scaling address the no-worker issue, or is this something different? I was receiving a few hundred requests per minute at the time.
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 2073, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1518, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1516, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1502, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
File "/opt/program/predictor.py", line 56, in transformation
result = preprocessor.transform(data)
File "/opt/program/preprocessor.py", line 189, in transform
response = embed_predictor.predict(data=json.dumps(payload))
File "/usr/local/lib/python3.7/site-packages/sagemaker/predictor.py", line 136, in predict
response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 386, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 705, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (503) from primary with message "{
"code": 503,
"type": "ServiceUnavailableException",
"message": "No worker is available to serve request: model"
}
Comments (1)
I would suggest confirming that MemoryUtilization is not being overwhelmed and that there is no specific error in CloudWatch Logs as well. If MemoryUtilization is overwhelmed, you can test configuring Auto Scaling in order to distribute the request load across multiple instances. That being said, while I am not sure of the details of your custom container, I also recommend confirming that the container itself can handle multiple concurrent requests (i.e., that it has multiple workers available to serve requests).
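As a rough illustration only, here is a minimal sketch of what configuring Auto Scaling for the embedding endpoint could look like with boto3's Application Auto Scaling client. The endpoint name, variant name, capacities, and target value below are placeholders, not values from the original setup:

import boto3

# Application Auto Scaling manages scaling for SageMaker endpoint variants.
autoscaling = boto3.client("application-autoscaling")

# Placeholder endpoint/variant names; "AllTraffic" is the default variant name.
resource_id = "endpoint/my-embedding-endpoint/variant/AllTraffic"

# Register the variant's instance count as a scalable target.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Scale out based on average invocations per instance per minute.
autoscaling.put_scaling_policy(
    PolicyName="embedding-invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 100.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)

On the worker side, the "No worker is available to serve request: model" message appears to come from the model server inside the embedding container rather than from the instance running out of CPU, which would be consistent with the ~30% CPU utilization you observed. If that is the case, it may also be worth checking how many model server workers the container starts (for example, the SageMaker inference toolkit reads a SAGEMAKER_MODEL_SERVER_WORKERS environment variable, if your container version supports it), since adding instances alone may not help if each instance can only serve one request at a time.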