setuptools: package data folder location

Posted 2024-10-09 06:54:08

I use setuptools to distribute my Python package. Now I need to distribute additional data files.

From what I've gathered from the setuptools documentation, I need to have my data files inside the package directory. However, I would rather have my data files inside a subdirectory of the root directory.

What I would like to avoid:

/ #root
|- src/
|  |- mypackage/
|  |  |- data/
|  |  |  |- resource1
|  |  |  |- [...]
|  |  |- __init__.py
|  |  |- [...]
|- setup.py

What I would like to have instead:

/ #root
|- data/
|  |- resource1
|  |- [...]
|- src/
|  |- mypackage/
|  |  |- __init__.py
|  |  |- [...]
|- setup.py

I just don't feel comfortable with having so many subdirectories if it's not essential. I can't find a reason why I /have/ to put the files inside the package directory. It is also cumbersome to work with so many nested subdirectories, IMHO. Or is there a good reason that would justify this restriction?

往日 2024-10-16 06:54:08

Option 1: Install as package data

The main advantage of placing data files inside the root of your Python package
is that it lets you avoid worrying about where the files will live on a user's
system, which may be Windows, Mac, Linux, some mobile platform, or inside an Egg. You can
always find the data directory relative to your Python package root, no matter where or how it is installed.

For example, if I have a project layout like so:

project/
    foo/
        __init__.py
        data/
            resource1/
                foo.txt

You can add a function to __init__.py to locate an absolute path to a data
file:

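import os

# Absolute path of the directory that contains this __init__.py
_ROOT = os.path.abspath(os.path.dirname(__file__))

def get_data(path):
    """Return the absolute path of a file under the package's data/ directory."""
    return os.path.join(_ROOT, 'data', path)

print(get_data('resource1/foo.txt'))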

Outputs:

/Users/pat/project/foo/data/resource1/foo.txt

After the project is installed as an Egg the path to data will change, but the code doesn't need to change:

/Users/pat/virtenv/foo/lib/python2.6/site-packages/foo-0.0.0-py2.6.egg/foo/data/resource1/foo.txt
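
On current Python versions, the same idea can be expressed with the standard library's importlib.resources, which locates the file through the import system instead of through __file__. That matters when the package is installed inside a zipped archive, where a path built with os.path may not point at a real file on disk. The following is a minimal sketch, assuming Python 3.9 or newer; the package name foo_demo and both helper functions are made up for illustration, and the demo builds a throwaway package in a temporary directory so that it runs on its own:

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def read_package_text(package, *parts):
    """Read a text file stored inside an importable package."""
    resource = importlib.resources.files(package)
    for part in parts:
        resource = resource / part  # descend one path component at a time
    return resource.read_text(encoding="utf-8")


def build_demo_package(root):
    """Create a hypothetical package 'foo_demo' with one data file under root."""
    pkg = Path(root) / "foo_demo"
    (pkg / "data" / "resource1").mkdir(parents=True)
    (pkg / "__init__.py").write_text("", encoding="utf-8")
    (pkg / "data" / "resource1" / "foo.txt").write_text("hello\n", encoding="utf-8")
    sys.path.insert(0, str(root))  # make the demo package importable
    importlib.invalidate_caches()


if __name__ == "__main__":
    build_demo_package(tempfile.mkdtemp())
    print(read_package_text("foo_demo", "data", "resource1", "foo.txt"))
```

In a real project, read_package_text would be called with the actual package name (foo in the layout above), and the data files would still have to be declared as package data so that they get installed.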

Option 2: Install to fixed location

The alternative would be to place your data outside the Python package and then
either:

Have the location of data passed in via a configuration file or command line arguments, or
Embed the location into your Python code.

This is far less desirable if you plan to distribute your project. If you really want to do this, you can install your data wherever you like on the target system by passing data_files a list of (destination, files) tuples, one tuple per group of files:

from setuptools import setup

setup(
    # ... other setup() arguments ...
    # Each tuple is (destination directory, [source files]).
    data_files=[
        ('/var/data1', ['data/foo.txt']),
        ('/var/data2', ['data/bar.txt']),
    ],
)
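
If the data does live outside the package, the program has to be told where it is at run time. The sketch below shows one possible way to do the "passed in" variant: a --data-dir command line option, then an environment variable, then a built-in default. The option name, the FOO_DATA_DIR variable, and the default path are all assumptions made for this example; the environment variable fallback in particular is an addition that the answer does not mention.

```python
import argparse
import os
from pathlib import Path

# Hypothetical names, chosen only for this sketch.
ENV_VAR = "FOO_DATA_DIR"
DEFAULT_DATA_DIR = Path("/var/data1")


def resolve_data_dir(argv=None, environ=None):
    """Pick the data directory: command line first, then environment, then default."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-dir", type=Path, default=None)
    # parse_known_args ignores options that belong to the rest of the program.
    args, _unknown = parser.parse_known_args(argv)
    if args.data_dir is not None:
        return args.data_dir
    if ENV_VAR in environ:
        return Path(environ[ENV_VAR])
    return DEFAULT_DATA_DIR


if __name__ == "__main__":
    print(resolve_data_dir() / "foo.txt")
```

Every caller then has to go through resolve_data_dir, which is part of why this option is more work than locating the data relative to the package.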

Updated: Example of a shell function to recursively grep Python files:

atlas% function grep_py { find . -name '*.py' -exec grep -Hn $* {} \; }
atlas% grep_py ": \["
./setup.py:9:    package_data={'foo': ['data/resource1/foo.txt']}


樱＆纷飞 2024-10-16 06:54:08

I think I found a good compromise which will allow you to maintain the following structure:

/ #root
|- data/
|  |- resource1
|  |- [...]
|- src/
|  |- mypackage/
|  |  |- __init__.py
|  |  |- [...]
|- setup.py

You should install the data as package_data to avoid the problems described in samplebias's answer, but in order to maintain the file structure you should add this to your setup.py:

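import os
from setuptools import setup

# Temporarily link the top-level data/ directory into the package so that
# setuptools can pick it up as package data, then remove the link again.
try:
    os.symlink('../../data', 'src/mypackage/data')
    setup(
        # ... other setup() arguments ...
        package_data={'mypackage': ['data/*']},
        # ... other setup() arguments ...
    )
finally:
    os.unlink('src/mypackage/data')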

This way we create the appropriate structure "just in time" and keep our source tree organized.
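
The create-then-clean-up pattern can also be wrapped in a small context manager. This is only a sketch of one way to write it; temporary_symlink is a hypothetical helper, not part of setuptools. Unlike the try/finally form, it removes the link only if creating it succeeded, so a failed os.symlink call does not lead to an attempt to unlink something the script did not create.

```python
import contextlib
import os


@contextlib.contextmanager
def temporary_symlink(target, link):
    """Create link -> target for the duration of the with-block, then remove it."""
    os.symlink(target, link)
    try:
        yield link
    finally:
        os.unlink(link)


# In setup.py this might be used as (setup arguments elided):
#
#     with temporary_symlink('../../data', 'src/mypackage/data'):
#         setup(...)
```

Creating symlinks may require extra privileges on some platforms, so this approach is best suited to build environments where os.symlink is known to work.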

To access such data files within your code, you 'simply' use:

from pkg_resources import Requirement, resource_filename

data = resource_filename(Requirement.parse("main_package"), 'mypackage/data')

I still don't like having to specify 'mypackage' in the code, as the data does not necessarily have anything to do with this module, but I guess it's a good compromise.
