Can I make conda solve this environment faster?

Posted 2025-01-12 09:26:44

I am using geopandas in a module that runs through GitLab CI... and the environment solving step takes forever: around 30 minutes of solving for 2 minutes of actually running the job.

In each CI job:

  1. a container with the ad hoc image is started
  2. a conda environment is created with the dependencies needed for the package
  3. the package is installed and a script is run

Of course, I could build a dedicated image for this job and pay the cost of solving only once, but then the dependencies would be frozen... and that is not the behavior I want.

As recommended in the geopandas documentation, I use the conda-forge channel.

Here is the environment file:

```yaml
name: my_package
channels:
  - conda-forge
dependencies:
  - conda-forge::python
  - conda-forge::numpy
  - conda-forge::pandas
  - conda-forge::geopandas
  - conda-forge::geopy
  - conda-forge::pyarrow
  - conda-forge::scikit-learn
  - conda-forge::matplotlib
  - conda-forge::coverage
  - conda-forge::shapely
  - conda-forge::intake
  - conda-forge::pytest
  - conda-forge::sphinx
  - conda-forge::pysmb
  - conda-forge::xlrd
  - conda-forge::openpyxl
  - conda-forge::sphinx_rtd_theme
```

Any idea on how to speed up environment solving?

Comments (3)

萌逼全场 2025-01-19 09:26:44

I agree with @Olsgaard's suggestion that it's worth considering a redesign of the CI workflow to decouple image generation from the testing phase. However, that doesn't technically "speed up environment solving," which is what was asked.

For faster solves:

  1. Use Mamba, as @FlyingTeller mentioned. It speeds up solving by using libsolv, a compiled SAT solver, instead of conda's largely Python-based classic solver.

  2. At least pin the Python version, e.g., python=3.9. Also consider adding minimum versions for DAG "hubs" such as numpy, pandas, etc. This vastly reduces the solution space.
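A minimal sketch of the pinning advice, applied to the environment file from the question. It assumes the file is called `environment.yml`, and the version numbers are hypothetical placeholders, not tested recommendations. It pins Python exactly and puts lower bounds on a few hub packages (numpy, pandas, geopandas, shapely, pyarrow). The file is written from a shell step so it can sit inside a CI script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a pinned variant of the question's environment file.
# Hypothetical versions: replace them with the ones you actually test against.
# An exact python pin plus lower bounds on "hub" packages shrinks the search space.
cat > environment.yml <<'EOF'
name: my_package
channels:
  - conda-forge
dependencies:
  - conda-forge::python=3.12
  - conda-forge::numpy>=1.26
  - conda-forge::pandas>=2.1
  - conda-forge::geopandas>=0.14
  - conda-forge::shapely>=2.0
  - conda-forge::pyarrow>=14
  - conda-forge::geopy
  - conda-forge::scikit-learn
  - conda-forge::matplotlib
  - conda-forge::coverage
  - conda-forge::intake
  - conda-forge::pytest
  - conda-forge::sphinx
  - conda-forge::pysmb
  - conda-forge::xlrd
  - conda-forge::openpyxl
  - conda-forge::sphinx_rtd_theme
EOF
```

The solve itself would then be `mamba env create -f environment.yml`. If you stay with plain conda, conda 23.10 and later already uses the libmamba solver by default. Older versions can opt in with `conda install -n base conda-libmamba-solver` followed by `conda config --set solver libmamba`.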

深者入戏 2025-01-19 09:26:44

There are a few ways to solve this. What you could do is have the CI pipeline run 3 steps:

  • step a: load custom image and install dependencies
  • step b: create a new image with the new dependencies
  • step c: run your tests

As long as steps b and c run in parallel, the image creation won't hold up your tests. And because step a starts from the most recently built image, it only has to update an environment that is already nearly current, so it runs much faster than a solve from scratch. You can add logic in step b to make sure it only builds a new image when needed.
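The "only build a new image when needed" logic in step b could look roughly like the following. This is a hypothetical sketch, not part of the original answer. It assumes the image tag combines the environment file's content hash with the ISO year-week, so a new image is built whenever the file changes and at least once a week. That keeps dependencies from being frozen indefinitely. It also assumes `docker manifest inspect` can reach your registry, for example with docker-in-docker after a `docker login`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: derive an image tag from the environment file.
# Same file contents in the same ISO week -> same tag; a file change or a new week -> new tag.
env_image_key() {
  local env_file="$1"
  local week="${2:-$(date -u +%G-%V)}"  # ISO year-week; second arg overrides it (e.g. for testing)
  local file_hash
  file_hash="$(sha256sum "$env_file" | cut -c1-12)"
  printf '%s-%s\n' "$week" "$file_hash"
}

# Hypothetical helper: succeed (exit 0) when the registry has no image with this tag yet,
# i.e. when step b should build and push one. If the registry cannot be queried,
# it errs on the side of rebuilding.
image_needs_build() {
  local image="$1"
  ! docker manifest inspect "$image" >/dev/null 2>&1
}
```

In step b this might be used as `image="$CI_REGISTRY_IMAGE/env:$(env_image_key environment.yml)"` followed by `if image_needs_build "$image"; then docker build -t "$image" . && docker push "$image"; fi`. Steps a and c can compute the same key to pick the matching image.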

韶华倾负 2025-01-19 09:26:44

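Use Mambaforge, or rather Miniforge3 now that Mambaforge is deprecated (Miniforge3 also ships mamba). Install it from the .sh file and use mamba to install the conda packages. That should bring your setup time down to a few minutes or less.

https://github.com/conda-forge/miniforge

Something like:

```shell
# Miniforge3 installers include mamba; the older Mambaforge installers are deprecated.
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
# -b: batch mode (no prompts), -p: install prefix; suitable for a non-interactive CI job.
bash "Miniforge3-$(uname)-$(uname -m).sh" -b -p "$HOME/miniforge3"
```

Then something like:

```shell
# A batch install does not edit ~/.bashrc, so call mamba by its full path.
"$HOME/miniforge3/bin/mamba" env create -f env_config.yml
```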