Best practices for Python packages on an Azure Artifacts feed
I've developed some Python packages that I upload to Azure DevOps Artifacts with a DevOps pipeline.
It works well, but the pipeline stores on Artifacts not only my packages but also the dependencies declared in the setup.cfg file!
They are ordinary dependencies, pandas and the like, but is it really best practice to store a copy of these libraries on Artifacts? My instinct says no...
How can I prevent this behaviour?
These are my pipeline and my cfg file:
pipeline
trigger:
  tags:
    include:
      - 'v*.*'
  branches:
    include:
      - main
      - dev-release

pool:
  vmImage: 'ubuntu-latest'

stages:
  - stage: 'Stage_Test'
    variables:
      - group: UtilsDev
    jobs:
      - job: 'Job_Test'
        steps:
          - task: UsePythonVersion@0
            inputs:
              versionSpec: '$(pythonVersion)'
            displayName: 'Use Python $(pythonVersion)'
          - script: |
              python -m pip install --upgrade pip
            displayName: 'Upgrade PIP'
          - script: |
              pip install pytest pytest-azurepipelines
            displayName: 'Install test dependencies'
          - script: |
              pytest
            displayName: 'Execution of PyTest'

  - stage: 'Stage_Build'
    variables:
      - group: UtilsDev
    jobs:
      - job: 'Job_Build'
        steps:
          - task: UsePythonVersion@0
            inputs:
              versionSpec: '$(pythonVersion)'
            displayName: 'Use Python $(pythonVersion)'
          - script: |
              python -m pip install --upgrade pip
            displayName: 'Upgrade PIP'
          - script: |
              pip install build wheel
            displayName: 'Install build dependencies'
          - script: |
              python -m build
            displayName: 'Artifact creation'
          - publish: '$(System.DefaultWorkingDirectory)'
            artifact: package

  - stage: 'Stage_Deploy_DEV'
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/dev-release'))
    variables:
      - group: UtilsDev
    jobs:
      - deployment: Build_Deploy
        displayName: Build Deploy
        environment: [OMIT]-artifacts-dev
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: package
                - task: UsePythonVersion@0
                  inputs:
                    versionSpec: '$(pythonVersion)'
                  displayName: 'Use Python $(pythonVersion)'
                - script: |
                    pip install twine
                  displayName: 'Install build dependencies'
                - task: TwineAuthenticate@1
                  displayName: 'Twine authentication'
                  inputs:
                    pythonUploadServiceConnection: 'PythonPackageUploadDEV'
                - script: |
                    python -m twine upload --skip-existing --verbose -r $(feedName) --config-file $(PYPIRC_PATH) dist/*
                  workingDirectory: '$(Pipeline.Workspace)/package'
                  displayName: 'Artifact upload'

  - stage: 'Stage_Deploy_PROD'
    dependsOn: 'Stage_Build'
    condition: and(succeeded(), or(eq(variables['Build.SourceBranch'], 'refs/heads/main'), startsWith(variables['Build.SourceBranch'], 'refs/tags/v')))
    variables:
      - group: UtilsProd
    jobs:
      - job: 'Approval_PROD_Release'
        pool: server
        steps:
          - task: ManualValidation@0
            timeoutInMinutes: 1440 # task times out in 1 day
            inputs:
              notifyUsers: |
                [USER]@[OMIT].com
              instructions: 'Please validate the build configuration and resume'
              onTimeout: 'resume'
      - deployment: Build_Deploy
        displayName: Build Deploy
        environment: [OMIT]-artifacts-prod
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: package
                - task: UsePythonVersion@0
                  inputs:
                    versionSpec: '$(pythonVersion)'
                  displayName: 'Use Python $(pythonVersion)'
                - script: |
                    pip install twine
                  displayName: 'Install build dependencies'
                - task: TwineAuthenticate@1
                  displayName: 'Twine authentication'
                  inputs:
                    pythonUploadServiceConnection: 'PythonPackageUploadPROD'
                - script: |
                    python -m twine upload --skip-existing --verbose -r $(feedName) --config-file $(PYPIRC_PATH) dist/*
                  workingDirectory: '$(Pipeline.Workspace)/package'
                  displayName: 'Artifact upload'
setup file
[metadata]
name = [OMIT]_azure
version = 0.2
author = [USER]
author_email = [USER]@[OMIT].com
description = A package containing utilities for interacting with Azure
long_description = file: README.md
long_description_content_type = text/markdown
project_urls =
classifiers =
    Programming Language :: Python :: 3
    License :: OSI Approved :: MIT License
    Operating System :: OS Independent

[options]
package_dir =
    = src
packages = find:
python_requires = >=3.7
install_requires =
    azure-storage-file-datalake>="12.6.0"
    pyspark>="3.2.1"
    openpyxl>="3.0.9"
    pandas>="1.4.2"
    pyarrow>="8.0.0"
    fsspec>="2022.3.0"
    adlfs>="2022.4.0"
    [OMIT]-utils>="0.4"

[options.packages.find]
where = src
I've noticed that the pipeline shows this behaviour only in the production stage (Stage_Deploy_PROD), not in the dev-release one (Stage_Deploy_DEV), and that far more dependencies end up stored than the 8 specified in the setup.cfg file.
Has anyone dealt with this before?
Thanks in advance!!
1 Answer
According to the Azure Artifacts documentation on upstream sources, once you've enabled an upstream source, every time you install a package from the public registry, Azure Artifacts saves a copy of that package in your feed.
One reason the feed ends up holding more packages than your setup.cfg lists is that when you install a package, its own dependencies are downloaded along with it. Take PySpark as an example: PySpark requires Py4J, so downloading PySpark pulls in Py4J as well.
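You can reproduce the transitive-resolution part locally. A minimal sketch (the scratch directory name is just an example):

    # Ask pip to fetch PySpark into a scratch directory; pip resolves the
    # package's declared dependencies (such as Py4J) and downloads them too.
    pip download pyspark -d ./downloaded

    # One archive per resolved package: you should see a pyspark sdist
    # sitting next to a py4j wheel.
    ls ./downloaded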

This matches my test result: when I downloaded only PySpark in the pipeline, Py4J was also downloaded and a copy of it was saved to the Artifacts feed.
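If you'd rather not have these copies saved, the behaviour is controlled by the feed, not by your pipeline YAML: either remove the PyPI upstream source in the feed's settings, or keep the upstream and have the pipeline resolve public packages directly from pypi.org so the install never goes through the feed. A minimal sketch of the latter (the package list is illustrative):

    # Point pip at the public PyPI index instead of the Artifacts feed,
    # so nothing is proxied through the feed's upstream and cached there.
    pip install --index-url https://pypi.org/simple/ pandas pyspark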
