Writing/parsing a text file with fixed-width lines

Posted on 2024-07-18 17:28:35, 410 words, 12 views, 0 comments

I'm a newbie to Python and I'm looking at using it to write some hairy EDI stuff that our supplier requires.

Basically they need an 80-character fixed width text file, with certain "chunks" of the field with data and others left blank. I have the documentation so I know what the length of each "chunk" is. The response that I get back is easier to parse since it will already have data and I can use Python's "slices" to extract what I need, but I can't assign to a slice - I tried that already because it sounded like a good solution, and it didn't work since Python strings are immutable :)

Like I said I'm really a newbie to Python but I'm excited about learning it :) How would I go about doing this? Ideally I'd want to be able to say that range 10-20 is equal to "Foo" and have it be the string "Foo" with 7 additional whitespace characters (assuming said field has a length of 10) and have that be a part of the larger 80-character field, but I'm not sure how to do what I'm thinking.

Comments (8)

内心激荡 2024-07-25 17:28:35

You don't need to assign to slices; just build the string using % formatting.

An example with a fixed format for 3 data items:

>>> fmt="%4s%10s%10s"
>>> fmt % (1,"ONE",2)
'   1       ONE         2'
>>> 

Same thing, field width supplied with the data:

>>> fmt2 = "%*s%*s%*s"
>>> fmt2 % (4,1, 10,"ONE", 10,2)
'   1       ONE         2'
>>> 

Separating data and field widths, and using zip() and str.join() tricks:

>>> widths=(4,10,10)
>>> items=(1,"ONE",2)
>>> "".join("%*s" % i for i in zip(widths, items))
'   1       ONE         2'
>>> 
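
For the case in the question (a left-justified "Foo" in a 10-character field inside an 80-character line), the same % approach seems to work with a minus flag. A minimal sketch, assuming a made-up layout of three fields; the widths and values below are illustrative and not taken from any real EDI spec:

```python
# Hypothetical layout: widths and values are illustrative only.
widths = (10, 10, 20)
items = ("Foo", 42, "Bar")

# "%-*s" left-justifies (pads on the right); "%*s" would right-justify.
body = "".join("%-*s" % pair for pair in zip(widths, items))

# Pad the whole record out to the fixed line length of 80 characters.
record = body.ljust(80)

print(repr(record[:10]))   # 'Foo       '
print(len(record))         # 80
```

Note that % formatting pads but does not truncate by default, so a value longer than its width would push the following fields to the right.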

(り薆情海 2024-07-25 17:28:35

Hopefully I understand what you're looking for: some way to conveniently identify each part of the line by a simple variable, but output it padded to the correct width?

The snippet below may give you what you want:

class FixWidthFieldLine(object):

    fields = (('foo', 10),
              ('bar', 30),
              ('ooga', 30),
              ('booga', 10))

    def __init__(self):
        self.foo = ''
        self.bar = ''
        self.ooga = ''
        self.booga = ''

    def __str__(self):
        return ''.join([getattr(self, field_name).ljust(width) 
                        for field_name, width in self.fields])

f = FixWidthFieldLine()
f.foo = 'hi'
f.bar = 'joe'
f.ooga = 'howya'
f.booga = 'doin?'

print(f)

This yields:

hi        joe                           howya                         doin?     

It works by storing a class-level variable, fields, which records the order in which each field should appear in the output, together with the number of columns that field should have. The instance variables with matching names are set to an empty string in __init__.

The __str__ method outputs these values as a string. It uses a list comprehension over the class-level fields attribute, looking up the instance value for each field by name, and then left-justifying its output according to the column width. The resulting list of fields is then joined together by an empty string.

Note this doesn't parse input, though you could easily override the constructor to take a string and parse the columns according to the field and field widths in fields. It also doesn't check for instance values that are longer than their allotted width.
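
The two gaps mentioned above (no parsing, no length check) could be filled roughly as follows. This is only a sketch: the class name FixedWidthRecord, the parse classmethod, and the choice to raise ValueError on overflow are my own assumptions, not part of the snippet above.

```python
class FixedWidthRecord:
    # (field name, width) pairs in output order; names and widths are illustrative.
    fields = (("foo", 10), ("bar", 30), ("ooga", 30), ("booga", 10))

    def __init__(self, **values):
        # Any field not supplied defaults to an empty string.
        for name, _ in self.fields:
            setattr(self, name, values.get(name, ""))

    @classmethod
    def parse(cls, line):
        """Slice a fixed-width line into fields, stripping the right padding."""
        values, pos = {}, 0
        for name, width in cls.fields:
            values[name] = line[pos:pos + width].rstrip()
            pos += width
        return cls(**values)

    def __str__(self):
        parts = []
        for name, width in self.fields:
            value = getattr(self, name)
            # Refuse to emit a record whose field would overflow its column.
            if len(value) > width:
                raise ValueError(f"{name!r} is longer than {width} characters")
            parts.append(value.ljust(width))
        return "".join(parts)


line = str(FixedWidthRecord(foo="hi", bar="joe", ooga="howya", booga="doin?"))
parsed = FixedWidthRecord.parse(line)
print(len(line), parsed.bar)  # 80 joe
```

Stripping on parse means trailing spaces that were part of the data are lost; whether that matters depends on the format you are reading.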

望笑 2024-07-25 17:28:35

You can use the string justification methods (ljust, rjust, and center) to left-justify, right-justify, or center a string in a field of a given width.

'hi'.ljust(10) -> 'hi        '
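
A short sketch of all three methods side by side, plus the optional fill character; the width of 10 is just an example value:

```python
width = 10

left = "hi".ljust(width)         # pad on the right
right = "hi".rjust(width)        # pad on the left
middle = "hi".center(width)      # pad on both sides
zeros = "42".rjust(width, "0")   # optional fill character

for field in (left, right, middle, zeros):
    print(repr(field))
# 'hi        '
# '        hi'
# '    hi    '
# '0000000042'
```

These methods never truncate: a value longer than the width is returned unchanged, so overlong data still needs a separate check or an explicit slice.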

鹿港小镇 2024-07-25 17:28:35

I know this thread is quite old, but we use a library called django-copybook. It has nothing to do with Django (anymore). We use it to go between fixed-width COBOL files and Python. You create a class to define your fixed-width record layout and can easily move between typed Python objects and fixed-width files:

USAGE:
class Person(Record):
    first_name = fields.StringField(length=20)
    last_name = fields.StringField(length=30)
    siblings = fields.IntegerField(length=2)
    birth_date = fields.DateField(length=10, format="%Y-%m-%d")

>>> fixedwidth_record = 'Joe                 Smith                         031982-09-11'
>>> person = Person.from_record(fixedwidth_record)
>>> person.first_name
'Joe'
>>> person.last_name
'Smith'
>>> person.siblings
3
>>> person.birth_date
datetime.date(1982, 9, 11)

It can also handle situations similar to COBOL's OCCURS functionality, such as when a particular section is repeated X times.
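
If you would rather not add a dependency, the same Person layout can be approximated with plain slicing. This is only a sketch using the standard library; LAYOUT and parse_record are hypothetical names and are not part of django-copybook:

```python
from datetime import datetime

# (name, width, converter) triples mirroring the Person layout above.
LAYOUT = (
    ("first_name", 20, str.strip),
    ("last_name", 30, str.strip),
    ("siblings", 2, int),
    ("birth_date", 10, lambda s: datetime.strptime(s, "%Y-%m-%d").date()),
)


def parse_record(line, layout=LAYOUT):
    """Slice `line` by the widths in `layout` and convert each field."""
    result, pos = {}, 0
    for name, width, convert in layout:
        result[name] = convert(line[pos:pos + width])
        pos += width
    return result


# Build the sample record with ljust so the column widths are exact.
record = "Joe".ljust(20) + "Smith".ljust(30) + "03" + "1982-09-11"
person = parse_record(record)
print(person["siblings"], person["birth_date"])  # 3 1982-09-11
```

This covers only flat records; repeated sections in the style of OCCURS would need an extra loop over the repeated sub-layout.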

绅士风度i 2024-07-25 17:28:35

I used Jarret Hardie's example and modified it slightly. This allows you to select the type of text alignment (left, right, or centered).

class FixedWidthFieldLine(object):
    def __init__(self, fields, justify = 'L'):
        """ Returns line from list containing tuples of field values and lengths. Accepts
            justification parameter.
            FixedWidthFieldLine(fields[, justify])

            fields = [(value, fieldLength)[, ...]]
        """
        self.fields = fields

        if (justify in ('L','C','R')):
            self.justify = justify
        else:
            self.justify = 'L'

    def __str__(self):
        if(self.justify == 'L'):
            return ''.join([field[0].ljust(field[1]) for field in self.fields])
        elif(self.justify == 'R'):
            return ''.join([field[0].rjust(field[1]) for field in self.fields])
        elif(self.justify == 'C'):
            return ''.join([field[0].center(field[1]) for field in self.fields])

fieldTest = [('Alex', 10),
         ('Programmer', 20),
         ('Salem, OR', 15)]

f = FixedWidthFieldLine(fieldTest)
print(f)
f = FixedWidthFieldLine(fieldTest,'R')
print(f)

Returns:

Alex      Programmer          Salem, OR      
      Alex          Programmer      Salem, OR
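
The class above applies one alignment to the whole line. If each field needs its own alignment, one possible variation is to store the alignment with the field and use the built-in format() with the "<", ">", and "^" alignment codes. A sketch with illustrative values and widths:

```python
# Each field carries its own alignment: "<" left, ">" right, "^" centered.
# The values and widths are illustrative.
fields = [
    ("Alex", 10, "<"),
    ("Programmer", 20, "^"),
    ("42", 6, ">"),
]

# format(value, "<10") pads `value` to 10 columns, left-aligned, and so on.
line = "".join(format(value, f"{align}{width}") for value, width, align in fields)
print(repr(line))
```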

鹤舞 2024-07-25 17:28:35

It's a little difficult to parse your question, but I'm gathering that you are receiving a file or file-like-object, reading it, and replacing some of the values with some business logic results. Is this correct?

The simplest way to overcome string immutability is to write a new string:

# Won't work:
test_string[3:6] = "foo"

# Will work:
test_string = test_string[:3] + "foo" + test_string[6:]

Having said that, it sounds like it's important to you that you do something with this string, but I'm not sure exactly what that is. Are you writing it back to an output file, trying to edit a file in place, or something else? I bring this up because the act of creating a new string (which happens to have the same variable name as the old string) should emphasize the necessity of performing an explicit write operation after the transformation.
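
To make the point about the explicit write concrete, here is a small sketch of the read, transform, write cycle. It uses io.StringIO objects as stand-ins for real files, and the sample lines and slice positions are made up:

```python
import io

# Stand-ins for real files; with real files you would use open(path) instead.
source = io.StringIO("AAA---BBB\nCCC---DDD\n")
target = io.StringIO()

for line in source:
    record = line.rstrip("\n")
    # Build a new string; the original string object is never modified.
    record = record[:3] + "foo" + record[6:]
    # Nothing reaches the output until the new string is written explicitly.
    target.write(record + "\n")

print(target.getvalue())
```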

柠檬色的秋千 2024-07-25 17:28:35

You can convert the string to a list and do the slice manipulation.

>>> text = list("some text")
>>> text[0:4] = list("fine")
>>> text
['f', 'i', 'n', 'e', ' ', 't', 'e', 'x', 't']
>>> text[0:4] = list("all")
>>> text
['a', 'l', 'l', ' ', 't', 'e', 'x', 't']
>>> "".join(text)
'all text'
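
Notice that assigning "all" to a four-character slice shortened the list, which would break a fixed-width record. A sketch of one way around that, padding or truncating the value to the slice width first; set_field is a hypothetical helper name, and silently truncating is a design choice you may prefer to replace with an error:

```python
def set_field(chars, start, end, value):
    """Overwrite chars[start:end] in place, padding or truncating `value` to fit."""
    width = end - start
    chars[start:end] = value.ljust(width)[:width]


line = list(" " * 80)            # a blank 80-character record
set_field(line, 10, 20, "Foo")
set_field(line, 0, 4, "HEADER")  # too long: truncated to "HEAD"

record = "".join(line)
print(len(record), repr(record[10:20]))  # 80 'Foo       '
```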

怪我闹别瞎闹 2024-07-25 17:28:35

It is easy to write a function to "modify" a string.

def change(string, start, end, what):
    length = end - start
    if len(what)<length: what = what + " "*(length-len(what))
    return string[0:start]+what[0:length]+string[end:]

Usage:

test_string = 'This is test string'

print(test_string[5:7])
# is
test_string = change(test_string, 5, 7, 'IS')
# This IS test string
test_string = change(test_string, 8, 12, 'X')
# This IS X    string
test_string = change(test_string, 8, 12, 'XXXXXXXXXXXX')
# This IS XXXX string