Is it possible to run local development on a CPU-only machine with HF/SageMaker?

Posted on 2025-02-07 03:48:20


I'm trying to develop locally with sagemaker.huggingface.HuggingFace before moving to SageMaker for the actual training. I set up:

HF_estimator = HuggingFace(entry_point='train.py', instance_type='local' ...)

And called HF_estimator.fit()

In train.py I'm simply printing and exiting to see whether it works. However, I ran into this:

ValueError: Unsupported processor: cpu. You may need to upgrade your SDK version (pip install -U sagemaker) for newer processors. Supported processor(s): gpu.

Is it possible to bypass this for local development?


1 answer

Answered by 奶气 on 2025-02-14 03:48:20


This error happens at the point the SDK tries to look up an eligible container image and finds that (unlike other frameworks like base PyTorch), HF only offers CUDA-enabled DLC images.
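To make the failure mode concrete, here is a minimal sketch (not the SDK's actual code, and with simplified names) of the lookup logic that produces the error: the `local` instance type resolves to the `cpu` processor on a machine without a GPU, and the HuggingFace image config only lists `gpu` as supported.

```python
# Hedged sketch of the SDK's image-lookup behavior; function and variable
# names here are illustrative, not the real sagemaker internals.

def processor_for_instance_type(instance_type: str, has_gpu: bool = False) -> str:
    """Roughly how a processor is inferred from an instance type."""
    if instance_type in ("local", "local_gpu"):
        return "gpu" if (instance_type == "local_gpu" or has_gpu) else "cpu"
    # e.g. "ml.p3.2xlarge" -> family "p3" -> GPU; "ml.c5.xlarge" -> CPU
    family = instance_type.split(".")[1]
    return "gpu" if family[0] in ("g", "p") else "cpu"

# What the HuggingFace image config effectively lists (CUDA-only DLCs):
SUPPORTED_PROCESSORS = ["gpu"]

def lookup_image(instance_type: str) -> str:
    processor = processor_for_instance_type(instance_type)
    if processor not in SUPPORTED_PROCESSORS:
        raise ValueError(
            f"Unsupported processor: {processor}. "
            f"Supported processor(s): {', '.join(SUPPORTED_PROCESSORS)}."
        )
    return f"huggingface-training-{processor}"
```

So `instance_type="local"` on a CPU-only box hits the `ValueError` before any container is even pulled, which is why the workarounds below all revolve around supplying or substituting an image explicitly.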

Maybe (I haven't checked but would be interested to know), you could actually run the GPU image locally in Docker without issue? You could try explicitly specifying the image_uri parameter of your Estimator with the GPU image and hoping it runs okay:

train_image_uri = sagemaker.image_uris.retrieve(
    framework="huggingface",
    region=your_region,  # e.g. "us-east-1"
    instance_type="ml.p3.2xlarge",  # -> GPU image
    py_version="py38",
    version="4.17",
    base_framework_version="pytorch1.10",
    image_scope="training",
)
estimator = HuggingFace(
    image_uri=train_image_uri,
    instance_type="local",
    ...
)

(For supported combinations, you can refer to the SageMaker SDK config file.)

Alternatively, you could probably just use the PyTorch framework for your local development (or TensorFlow, if you're using HuggingFace TF) - and include a requirements.txt file in your script bundle to install HF libraries at the version(s) you need. For example:

# requirements.txt in the same source_dir folder as your train.py script

transformers[sklearn,sentencepiece]==4.17.0
datasets==1.18.4
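Putting that alternative together, a sketch of the local-mode estimator might look like the following. Note the role ARN, `source_dir` path, and framework/Python versions are placeholders you'd adjust to your project; this assumes Docker is available locally for SageMaker local mode.

```python
# Hedged sketch: use the base PyTorch estimator for CPU-only local dev,
# with the HF libraries installed via the requirements.txt shown above.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",
    source_dir="src",  # folder containing train.py and requirements.txt
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder ARN
    framework_version="1.10",
    py_version="py38",
    instance_type="local",  # PyTorch DLCs include CPU images, so this resolves
    instance_count=1,
)
estimator.fit()
```

When you move to real training, you'd switch back to the HuggingFace estimator (or keep PyTorch with the requirements file) and change `instance_type` to a GPU instance.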

This would result in your local test environment being slightly different from the true training job environment, but hopefully close enough to be useful for debugging initial functional issues in your code before you run the actual training attempts on SageMaker.
