What's the Pythonic way to store a data block in a Python script?

Posted 2024-11-28 01:42:37

Perl allows me to use the __DATA__ token in a script to mark the start of a data block. I can read the data using the DATA filehandle. What's the Pythonic way to store a data block in a script?

Comments (4)

若言繁花未落 2024-12-05 01:42:37

It depends on your data, but dict literals and multi-line strings are both really good ways.

state_abbr = {
    'MA': 'Massachusetts',
    'MI': 'Michigan',
    'MS': 'Mississippi',
    'MN': 'Minnesota',
    'MO': 'Missouri',
    }

gettysburg = """
Four score and seven years ago,
our fathers brought forth on this continent
a new nation, 
conceived in liberty
and dedicated to the proposition
that all men are created equal.
"""
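As a rough sketch of how the multi-line string approach can hold record-style data, the example below embeds one "abbreviation name" pair per line and parses the block into a dict. The names `RAW_STATES` and `parse_states` are made up for this illustration and are not part of the answer above.

```python
import textwrap

# Hypothetical embedded data block: one "abbreviation name" record per line.
# The backslash after the opening quotes avoids an empty first line, and
# textwrap.dedent() removes the indentation used to keep the source tidy.
RAW_STATES = textwrap.dedent("""\
    MA Massachusetts
    MI Michigan
    MS Mississippi
    """)


def parse_states(raw):
    """Turn the embedded block into a dict, skipping blank lines."""
    result = {}
    for line in raw.splitlines():
        if not line.strip():
            continue
        abbr, name = line.split(maxsplit=1)
        result[abbr] = name
    return result


state_abbr = parse_states(RAW_STATES)
print(state_abbr["MI"])
```

For data that is already key/value shaped, the dict literal shown above is simpler; parsing a string like this is mainly useful when the block is pasted in from elsewhere.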

只为守护你 2024-12-05 01:42:37

Use io.StringIO (the StringIO module in Python 2) to create an in-source file-like object:

from io import StringIO

textdata = """\
Now is the winter of our discontent,
Made glorious summer by this sun of York.
"""

# in place of __DATA__ = open('richard3.txt')
__DATA__ = StringIO(textdata)
for d in __DATA__:
    print(d)  # each line keeps its trailing newline, so the output is double-spaced

__DATA__.seek(0)  # rewind, just as with a real file
print(__DATA__.readline())

Prints:

Now is the winter of our discontent,

Made glorious summer by this sun of York.

Now is the winter of our discontent,

(I just called this __DATA__ to align with your original question. In practice, this would not be good Python naming style - something like datafile would be more appropriate.)
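Because the object behaves like an open file, it can be handed to anything that expects one. As a minimal sketch, assuming the embedded block is CSV text (the data and the name `CSV_TEXT` are invented for this example), the standard `csv` module can read it directly:

```python
import csv
import io

# Hypothetical embedded CSV data; the first row is the header.
CSV_TEXT = """\
abbr,name
MA,Massachusetts
MI,Michigan
"""

# Wrap the string in a file-like object and let csv.DictReader consume it.
datafile = io.StringIO(CSV_TEXT)
rows = list(csv.DictReader(datafile))

for row in rows:
    print(row["abbr"], row["name"])
```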

酷遇一生 2024-12-05 01:42:37

IMO it depends heavily on the type of data: if you have only text and can be sure that no ''' or """ can possibly appear inside it, you can store the text in a triple-quoted string as shown above. But what if you want to store text that is known to contain, or might contain, ''' or """? Then it is advisable to

  • either store the data in some encoded form, or
  • put it in a separate file

Example: The text is

There are many '''s and """s in Python libraries.

In this case, it is hard to do with triple quotes alone. You can escape one of the quotes:

__DATA__ = """There are many '''s and \"""s in Python libraries."""
print(__DATA__)

But then you have to pay attention whenever you edit or replace the text.
In this case, it might be more useful to encode the text first:

$ python3 -c 'import sys, base64; print(base64.b64encode(sys.stdin.buffer.read()).decode("ascii"))'
There are many '''s and """s in Python libraries.<press Ctrl-D twice>

then you get

VGhlcmUgYXJlIG1hbnkgJycncyBhbmQgIiIicyBpbiBQeXRob24gbGlicmFyaWVzLg==

as output. Take this and put it into your script, for example:

import base64

__DATA__ = base64.b64decode('VGhlcmUgYXJlIG1hbnkgJycncyBhbmQgIiIicyBpbiBQeXRob24gbGlicmFyaWVzLg==').decode('utf-8')
print(__DATA__)

and see the result.
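For the other option listed above, keeping the data in a separate file, a minimal sketch might look like the following. The helper `load_data` and the file name `data.txt` are assumptions made for this example; the temporary directory only stands in for the script's own directory so the snippet can run on its own.

```python
import tempfile
from pathlib import Path


def load_data(directory, name="data.txt"):
    """Read a text data file stored in the given directory."""
    return (Path(directory) / name).read_text(encoding="utf-8")


# Demo only: create the data file in a temporary directory, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    text = "There are many '''s and \"\"\"s in Python libraries."
    (Path(tmp) / "data.txt").write_text(text, encoding="utf-8")
    loaded = load_data(tmp)

print(loaded)
```

In a real script you would typically pass `Path(__file__).parent` as the directory, so the data file is found next to the script regardless of the current working directory. No quoting or escaping is needed, since the file contents never pass through Python string-literal syntax.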

鹿! 2024-12-05 01:42:37

I'm not familiar with Perl's __DATA__ variable, but Google tells me it's often used for testing. If you are also looking into testing your code, you may want to consider doctest (http://docs.python.org/library/doctest.html). For comparison, a direct counterpart of __DATA__ would be:

import io

__DATA__ = io.StringIO("""lines
of data
from a file
""")

If you wanted DATA to be a file object, that is now what you have, and you can use it like most other file objects from here on. For example:

if __name__=="__main__":
    # test myfunc with test data:
    lines = __DATA__.readlines()
    myfunc(lines)

But if the only use of DATA is for testing you are probably better off creating a doctest or writing a test case in PyUnit / Nose.

For example:

import io

def myfunc(lines):
    r"""Do something to each line

    Here's an example:

    >>> data = io.StringIO("line 1\nline 2\n")
    >>> myfunc(data)
    ['1', '2']
    """
    # line[-2] is the last character before the trailing newline
    return [line[-2] for line in lines]

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Running those tests looks like this (the wording of the summary lines varies slightly between Python versions):

$ python ~/doctest_example.py -v
Trying:
    data = io.StringIO("line 1\nline 2\n")
Expecting nothing
ok
Trying:
    myfunc(data)
Expecting:
    ['1', '2']
ok
1 items had no tests:
    __main__
1 items passed all tests:
   2 tests in __main__.myfunc
2 tests in 2 items.
2 passed and 0 failed.
Test passed.

Doctest does a lot of different things, including finding Python tests in plain text files and running them. Personally, I'm not a big fan and prefer more structured testing approaches (import unittest), but it is unequivocally a Pythonic way to test one's code.
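For comparison, the same check written in the more structured unittest style mentioned above might look roughly like this. The class name `MyFuncTest` is made up for the example, and the explicit loader and runner calls are used here only so the result can be inspected; in a standalone test file, `unittest.main()` under an `if __name__ == "__main__":` guard is the usual choice.

```python
import io
import unittest


def myfunc(lines):
    """Return the last character before the newline of each line."""
    return [line[-2] for line in lines]


class MyFuncTest(unittest.TestCase):
    def test_last_character_of_each_line(self):
        # The test data lives in the test itself, wrapped as a file-like object.
        data = io.StringIO("line 1\nline 2\n")
        self.assertEqual(myfunc(data), ["1", "2"])


# Load and run the test case explicitly, keeping the result object around.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MyFuncTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```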
