Writing fixed-width, space-delimited CSV output in Python

Posted 2024-10-31 16:36:21

I would like to write a fixed width, space delimited and minimally quoted CSV file using Python's csv writer.
An example of the output:

item1           item2  
"next item1"    "next item2"
anotheritem1    anotheritem2

If I use

writer.writerow( ("{0:15s}".format(item1), "{0:15s}".format(item2)) )
...

then, with the space delimiter, the formatting breaks: either quotes or escapes (depending on the csv.QUOTE_* constant) are added because of the trailing spaces introduced by the padding:

"item1          " "item2          "
"next item1     " "next item2     "
"anotheritem1   " "anotheritem2   "
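A minimal reproduction of this behaviour, assuming Python 3 and using io.StringIO in place of a real file (the helper name write_padded_rows is only for illustration). Every padded field contains spaces, i.e. the delimiter, so QUOTE_MINIMAL has to quote it:

```python
import csv
import io

def write_padded_rows(rows, width=15):
    """Pad every field with str.format, then hand the row to csv.writer."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=' ', quoting=csv.QUOTE_MINIMAL,
                        lineterminator='\n')
    for row in rows:
        # "{0:{1}s}" left-aligns the item in a field of `width` characters
        writer.writerow(["{0:{1}s}".format(item, width) for item in row])
    return buf.getvalue()

print(write_padded_rows([("item1", "item2"), ("next item1", "next item2")]))
```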

Of course, I could format everything myself:

f.write("{0:15s}{1:15s}\n".format(item1, item2))

but then there is not much point in using the csv writer. Also, I would have to handle by hand the cases where a space is embedded in an item and quoting/escaping is actually needed. In other words, it seems I would need a (non-existent) "QUOTE_ABSOLUTELYMINIMAL" csv constant that would act like "QUOTE_MINIMAL" but would also ignore trailing spaces.

Is there a way to achieve the "QUOTE_ABSOLUTELYMINIMAL" behaviour or another way to get a fixed width, space delimited CSV output using Python's CSV module?

I want the fixed-width layout for readability: the file will still be processed as CSV for both reading and writing, but the aligned columns make it easier to read. Reading is not a problem, as the csv skipinitialspace option takes care of ignoring the extra spaces. To my surprise, writing seems to be a problem...
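For reference, a sketch of the reading side, assuming Python 3: the desired aligned output parses back correctly with skipinitialspace=True. The trailing spaces after the last column of the example are left out here on purpose, because a trailing space delimiter would be read as an extra empty field.

```python
import csv
import io

# The desired aligned layout, without trailing spaces after the last column
aligned = (
    'item1           item2\n'
    '"next item1"    "next item2"\n'
    'anotheritem1    anotheritem2\n'
)

# skipinitialspace=True skips the padding that follows each delimiter
rows = list(csv.reader(io.StringIO(aligned), delimiter=' ',
                       skipinitialspace=True))
print(rows)
# [['item1', 'item2'], ['next item1', 'next item2'], ['anotheritem1', 'anotheritem2']]
```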

EDIT: I conclude it is impossible to achieve with the current csv module. It is not a built-in option, and I cannot see any reasonable way to achieve it manually, as there seems to be no way to make Python's csv writer emit extra delimiters without quoting or escaping them. Thus, I will probably have to write my own csv writer.
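Such a writer does not necessarily have to reimplement the quoting rules: one option is to let csv.writer quote each unpadded value on its own and only pad the resulting token afterwards. This is a minimal sketch, assuming Python 3, one space between columns and a fixed column width of 15; quote_field and write_fixed_width are hypothetical helpers, not part of the csv module.

```python
import csv
import io
import sys

def quote_field(value):
    """Return value quoted exactly as csv.QUOTE_MINIMAL would quote it with a
    space delimiter (only if it contains a space, a quote or a line break)."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=' ', quoting=csv.QUOTE_MINIMAL,
               lineterminator='\n').writerow([value])
    return buf.getvalue()[:-1]  # drop the '\n' line terminator

def write_fixed_width(f, rows, width=15):
    """Hypothetical 'QUOTE_ABSOLUTELYMINIMAL' writer: quote first, pad after."""
    for row in rows:
        tokens = [quote_field(str(v)) for v in row]
        # Pad each token to `width`, separate columns with one space and drop
        # the padding after the last column (it would read back as an extra
        # empty field).
        line = ' '.join(token.ljust(width) for token in tokens).rstrip(' ')
        f.write(line + '\n')

write_fixed_width(sys.stdout, [("item1", "item2"),
                               ("next item1", "next item2"),
                               ("anotheritem1", "anotheritem2")])
```

With width=15 plus the separating space, each column starts every 16 characters, which is the layout of the example at the top. Values longer than width are not truncated, so they push the rest of that row to the right; the file still reads back correctly with skipinitialspace=True, only the alignment suffers.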

天邊彩虹 2024-11-07 16:36:21

The basic problem you are running into is that csv and fixed-format are basically opposing views of data storage. Making them work together is not a common practice. Also, if you only have quotes on the items with spaces in them, it will throw off the alignment on those rows:

testing     "rather hmm "
strange     "ways to    "
"store some " "csv data   "
testing     testing

Reading that data back in gives wrong results as well:

'testing' 'rather hmm '
'strange' 'ways to    '
'store some ' 'csv data   '
'testing' 'testing' ''

Notice the extra field at the end of the last row. Given these problems, I would go with your example of

"item1          " "item2          "
"next item1     " "next item2     "
"anotheritem1   " "anotheritem2   "

which I find very readable, is easy to generate with the existing csv library, and gets correctly parsed when read back in. Here's the code I used to generate it:

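import csv

class SpaceCsv(csv.Dialect):
    "csv format for exporting tables"
    delimiter = ' '              # a single space; None is not a valid delimiter
    doublequote = True
    escapechar = None
    lineterminator = '\n'
    quotechar = '"'
    skipinitialspace = True      # lets the reader skip runs of padding spaces
    quoting = csv.QUOTE_MINIMAL  # padded values contain spaces, so they get quoted
csv.register_dialect('space', SpaceCsv)

# every value is already padded to the same length (11 characters)
data = (
        ('testing    ', 'rather hmm '),
        ('strange    ', 'ways to    '),
        ('store some ', 'csv data   '),
        ('testing    ', 'testing    '),
)

# newline='' lets the csv module control the line endings (Python 3)
with open(r'c:\tmp\fixed.csv', 'w', newline='') as temp:
    writer = csv.writer(temp, dialect='space')
    for row in data:
        writer.writerow(row)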

You will, of course, need to have all your data padded to the same length, either before getting to the function that does all this, or in the function itself. Oh, and if you have numeric data you'll have to make padding allowances for that as well.
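Doing the padding inside the function could look like the following minimal sketch (Python 3; write_padded_csv is a hypothetical name). Non-string values, numbers included, are converted with str() before padding, and each column is padded to its longest value plus one space, so that every value ends in a space, gets quoted, and the columns stay aligned.

```python
import csv
import io

def write_padded_csv(f, rows):
    """Pad every value so all values in a column share one width, then let
    csv.writer (space delimiter, QUOTE_MINIMAL) quote them."""
    # Numbers and other non-strings are converted with str() first
    rows = [[str(v) for v in row] for row in rows]
    # Width = longest value + 1, so every padded value ends in at least one
    # space and is therefore quoted. Assumes all rows have the same length.
    widths = [max(len(row[i]) for row in rows) + 1 for i in range(len(rows[0]))]
    writer = csv.writer(f, delimiter=' ', quoting=csv.QUOTE_MINIMAL,
                        lineterminator='\n')
    for row in rows:
        writer.writerow([v.ljust(w) for v, w in zip(row, widths)])

buf = io.StringIO()
write_padded_csv(buf, [("testing", "rather hmm"), ("strange", "ways to"),
                       ("store some", "csv data"), ("testing", "testing")])
print(buf.getvalue())
```

Values that themselves contain a double quote get it doubled by the writer, which shifts that row by the extra characters.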

此岸叶落 2024-11-07 16:36:21

What does this do for you? I think you really were only missing the csv.QUOTE_NONE constant.

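import csv
# skipinitialspace skips padding runs (otherwise each extra space yields an empty field)
csv.register_dialect('spacedelimitedfixedwidth', delimiter=' ',
                     quoting=csv.QUOTE_NONE, skipinitialspace=True)
with open('crappymainframe.out', newline='') as f:  # Python 3: text mode, not 'rb'
    reader = csv.reader(f, 'spacedelimitedfixedwidth')
    for row in reader:
        print(row)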

It's a modification of the unixpwd dialect example at the bottom of the csv module docs.
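The dialect above is used for reading. If the same settings (space delimiter, QUOTE_NONE) are reused for writing padded fields, the writer has to escape every padding space, and without an escapechar it raises csv.Error instead. A small sketch, assuming Python 3; write_with_quote_none is a hypothetical helper:

```python
import csv
import io

def write_with_quote_none(rows, escapechar=None):
    """Write rows with a space delimiter and QUOTE_NONE; return the text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=' ', quoting=csv.QUOTE_NONE,
                        escapechar=escapechar, lineterminator='\n')
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()

padded = [["{0:15s}".format("item1"), "{0:15s}".format("item2")]]

# No escapechar: the padding spaces match the delimiter and cannot be written
try:
    write_with_quote_none(padded)
except csv.Error as exc:
    print("csv.Error:", exc)

# With an escapechar, every padding space is written as a backslash escape
print(write_with_quote_none(padded, escapechar='\\'))
```

So on the writing side QUOTE_NONE alone trades the quotes for backslash escapes, which is the behaviour described in the question.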
