Is it possible to do local-mode development on a CPU-only machine with HF/SageMaker?
I'm trying to develop locally with sagemaker.huggingface.HuggingFace before moving to SageMaker for the actual training. I set up

HF_estimator = HuggingFace(entry_point='train.py', instance_type='local', ...)

and called HF_estimator.fit(). In train.py I'm simply printing and exiting to see whether it works. However, I ran into this:

ValueError: Unsupported processor: cpu. You may need to upgrade your SDK version (pip install -U sagemaker) for newer processors. Supported processor(s): gpu.

Is it possible to bypass this for local development?
This error happens at the point where the SDK tries to look up an eligible container image and finds that, unlike other frameworks such as base PyTorch, HF only offers CUDA-enabled DLC images.

Maybe (I haven't checked, but would be interested to know) you could actually run the GPU image locally in Docker without issue? You could try explicitly specifying the image_uri parameter of your Estimator with a GPU image and hope it runs okay (for the supported combinations, you can refer to the SageMaker SDK config file at src/sagemaker/image_uri_config/huggingface.json).
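A minimal sketch of that idea, assuming you substitute a real CUDA-enabled HuggingFace DLC URI from the SDK's huggingface.json config for the placeholder argument (the role ARN and image URI here are illustrative, not real values):

```python
# Sketch: force a specific DLC image instead of letting the SDK look one up.
# Passing image_uri skips the processor lookup that raises the
# "Unsupported processor: cpu" error.
def build_local_hf_estimator(image_uri, role):
    from sagemaker.huggingface import HuggingFace  # requires the sagemaker SDK

    return HuggingFace(
        entry_point='train.py',
        role=role,                # your IAM role ARN
        image_uri=image_uri,      # GPU DLC URI from image_uri_config/huggingface.json
        instance_type='local',    # run in local Docker rather than on SageMaker
        instance_count=1,
    )
```

Whether the CUDA image actually starts and runs on a CPU-only Docker host is the open question here.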
Alternatively, you could probably just use the PyTorch framework for your local development (or TensorFlow, if you're using HuggingFace TF) and include a requirements.txt file in your script bundle to install the HF libraries at the version(s) you need. This would result in your local test environment being slightly different from the true training job environment, but hopefully close enough to be useful for debugging initial functional issues in your code before launching the actual training attempts on SageMaker.
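A sketch of this alternative (the version pins, source_dir layout, and role ARN are illustrative assumptions; adjust to the releases you actually need): place a requirements.txt next to train.py and use the base PyTorch estimator, which does ship CPU images.

```python
# Sketch: local dev on the base PyTorch container, installing HF libraries
# via requirements.txt. Version pins below are illustrative only.
REQUIREMENTS_TXT = """\
transformers==4.6.1
datasets==1.6.2
"""

def build_local_pytorch_estimator(role):
    from sagemaker.pytorch import PyTorch  # requires the sagemaker SDK

    # source_dir should contain train.py plus the requirements.txt above;
    # the training toolkit pip-installs requirements.txt before running the script.
    return PyTorch(
        entry_point='train.py',
        source_dir='src',           # hypothetical directory layout
        role=role,                  # your IAM role ARN
        framework_version='1.8.1',  # pick a version that has a CPU image
        py_version='py3',
        instance_type='local',
        instance_count=1,
    )
```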