Load already-downloaded spaCy language models into Docker containers without downloading them again

Posted 2025-02-03 07:11:00

I'd like to run multiple spacy language models on various docker containers. I don't want the docker image to contain the line RUN python -m spacy download en_core_web_lg, as other processes might have different language models.

My question is: Is it possible to download multiple spaCy language models locally (i.e. en_core_web_lg, en_core_web_md, ...), and then load these models into the Python spaCy environment when the docker container spawns?

This process might have the following steps:

  1. Spawn docker container and bind a volume "language_models/" to the container which contains a number of spacy models.
  2. Run some spacy command such as python -m spacy download --local ./language_models/en_core_web_lg which points at the language model which you want the environment to have.

The hope is that, since the language model already exists on the shared volume, the download/import time is significantly reduced for each new container. Each container also would not have unnecessary language models on it, and the Docker image would not be specific to any language models at all.
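
As a rough sketch of what this preparation could look like on the host (this is an assumption about the setup, not part of the original question; it presumes the pipeline packages have already been installed on the host once):

import spacy

# One-time, host-side step: serialize each installed pipeline package into
# the directory that will later be bind-mounted into the containers.
for model in ("en_core_web_lg", "en_core_web_md"):
    nlp = spacy.load(model)
    nlp.to_disk(f"language_models/{model}")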

2 Comments

九厘米的零° 2025-02-10 07:11:01

Thanks for the comment @polm23! I had an additional layer of complexity since the SpaCy model was ultimately used to train a Rasa model. The solution I've opted for is to save models locally using:

import spacy
model = "en_core_web_lg"  # name of the installed pipeline package to export
nlp = spacy.load(model)
nlp.to_disk(f'language_models/{model}')

Then make the specific model directory visible to the Docker container via a mounted volume. In Rasa, at least, you can point at the language model using a local path:

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: "../../language_models/MODEL_NAME"
recipe: default.v1
独留℉清风醉 2025-02-10 07:11:00

There are two ways to do this.

The easier one is to mount a volume in Docker with the model directory and specify it as a path. spaCy lets you call spacy.load("some/path"), so no pip install is required.
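
A minimal sketch of that approach (the mount path and image name below are assumptions, not something from the answer):

import spacy

# Assumes the container was started with a bind mount along the lines of:
#   docker run -v /host/language_models:/language_models my-image
# and that each subdirectory was produced with nlp.to_disk().
nlp = spacy.load("/language_models/en_core_web_lg")
doc = nlp("Pipelines saved with to_disk can be loaded straight from a path.")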

If you really need to use pip to install something, you can also download the zipped models and pip install the archive file. However, by default that involves making a copy of the package, which reduces the benefit. If you unzip the model download and mount that, you can use pip install -e (editable mode), which is usually used for development. I wouldn't recommend this, but if you are using import en_core_web_sm or similar and have difficulty refactoring, it might be what you want.
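
For illustration only (paths and version are hypothetical): pip install /language_models/en_core_web_lg-3.8.0.tar.gz installs from the mounted archive but copies the files into site-packages, whereas pip install -e /language_models/en_core_web_lg-3.8.0/ on the unzipped package directory keeps the files on the mounted volume.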
