How to correctly include a data folder in a Python package
I'm building a small Python package that I deploy to our internal PyPI server so that it can be installed easily with pip. I'm using setup.py to build the tar.gz archive to upload there.
I need to include some additional data. More specifically, I use nltk in my project and I want to ship the package with specific nltk data already downloaded, since it doesn't make sense to me to make the people using my package responsible for downloading it themselves. So I have the following structure:
├── setup.py
├── src
│   ├── __init__.py
│   ├── my_pkg
│   │   ├── __init__.py
│   │   ├── my_module.py
│   │   └── resources
│   │       └── nltk_data
│   │           └── ... too many subfolders and files
I would like the whole nltk_data subfolder to end up in the same place once the package is installed. I managed to get package_data={'my_pkg': ['./resources/file.dat']} working for one file, but I don't know how to do the same for a complex directory structure with many subfolders, sub-subfolders, files with different extensions, etc. Is there any way to do this?
My setup.py is quite simple (I omitted things such as the description or URL for simplicity):
from setuptools import setup, find_packages

setup(
    name='some-cool-name',
    version="1.0.0",
    classifiers=[],
    packages=find_packages(where='src'),
    package_dir={'': 'src'},
    package_data={'my_pkg': []},
    include_package_data=True,
    py_modules=[],
    python_requires='>=3.8',
    install_requires=['nltk==3.6.5']
)
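One possible way to fill the empty package_data list for a deep tree is to generate the relative paths when setup.py runs. The following is only a minimal sketch under the layout shown in the question. collect_data_files is a hypothetical helper written for illustration, not part of setuptools.

```python
from pathlib import Path


def collect_data_files(package_root, data_dir):
    """List every file under package_root/data_dir, relative to package_root.

    Hypothetical helper: package_data expects paths relative to the package
    directory, so the result can be passed to it directly.
    """
    root = Path(package_root)
    return sorted(
        path.relative_to(root).as_posix()  # forward slashes on every platform
        for path in (root / data_dir).rglob("*")
        if path.is_file()  # skip directories, keep files of any extension
    )


# Possible use in setup.py (paths assumed from the layout in the question):
# package_data={'my_pkg': collect_data_files('src/my_pkg', 'resources/nltk_data')}
```

Because the list is built from whatever is on disk, new subfolders or file types under nltk_data would be picked up without editing setup.py.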
Comments (2)
You can simply specify the relative path to the data you want to include. You need to put an __init__.py file in both subfolders though, but then it should work.
To use the data in your script, use importlib.resources (for example importlib.resources.read_text) to open your desired file.

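Reading a packaged file along the lines of this answer might look like the sketch below. read_packaged_text is a hypothetical wrapper, and the package and file names in the usage comment are assumed from the question's layout.

```python
import importlib.resources


def read_packaged_text(package, filename):
    """Read a text file that was installed inside the given package.

    `package` is a dotted import path such as 'my_pkg.resources'. Following
    the answer above, every folder on that path is assumed to contain an
    __init__.py so that it can be imported.
    """
    return importlib.resources.read_text(package, filename, encoding="utf-8")


# Possible use (names assumed from the question):
# text = read_packaged_text("my_pkg.resources", "file.dat")
```

The lookup goes through the import system rather than through paths built from __file__, so it does not depend on where pip placed the package.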
A while after I posted this question, I came across another solution that was not mentioned here and seemed quite elegant, so I will put it here in case someone finds it useful.
A MANIFEST.in file in the top level of the directory structure, next to setup.py, can do the same easily, with