How do I loop through blocks of lines in a file?
I have a text file that looks like this, with blocks of lines separated by blank lines:
ID: 1
Name: X
FamilyN: Y
Age: 20

ID: 2
Name: H
FamilyN: F
Age: 23

ID: 3
Name: S
FamilyN: Y
Age: 13

ID: 4
Name: M
FamilyN: Z
Age: 25
How can I loop through the blocks and process the data in each block? Eventually, I want to gather the name, family name, and age values into three columns, like so:
Y X 20
F H 23
Y S 13
Z M 25
Here's another way, using itertools.groupby.
The function groupby iterates through the lines of the file and calls isa_group_separator(line) for each line. isa_group_separator returns either True or False (called the key), and itertools.groupby then groups all the consecutive lines that yielded the same True or False result.
This is a very convenient way to collect lines into groups.
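The code that went with this answer is not preserved here. A minimal sketch of the approach just described, assuming the data format from the question: isa_group_separator is the name the answer uses, while SAMPLE and format_records are names made up for this illustration.

```python
import io
import itertools

# Two blocks from the question, separated by a blank line.
SAMPLE = """\
ID: 1
Name: X
FamilyN: Y
Age: 20

ID: 2
Name: H
FamilyN: F
Age: 23
"""

def isa_group_separator(line):
    # A separator is a line that is empty apart from whitespace.
    return line.strip() == ""

def format_records(lines):
    # groupby yields (key, group) pairs: key is True for a run of
    # separator lines and False for a run of data lines.
    rows = []
    for is_separator, group in itertools.groupby(lines, isa_group_separator):
        if is_separator:
            continue
        data = {}
        for item in group:
            field, value = item.split(":", 1)
            data[field.strip()] = value.strip()
        rows.append("{FamilyN} {Name} {Age}".format(**data))
    return rows

# io.StringIO stands in for an open file; a real file object
# can be passed the same way, since both iterate line by line.
for row in format_records(io.StringIO(SAMPLE)):
    print(row)
```

Because groupby only groups consecutive lines, nothing beyond the current block has to be held in memory.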
Use a generator.
Result will then be
which can be trivially changed into whatever string representation you want.
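The generator itself is not shown on this page, so here is one possible sketch of the idea rather than the answerer's own code. The names blocks and to_row are assumptions, and each data line is assumed to look like "Key: value".

```python
def blocks(lines):
    """Yield each run of non-blank lines as a list of stripped strings."""
    block = []
    for line in lines:
        line = line.strip()
        if line:
            block.append(line)
        elif block:
            # A blank line ends the block collected so far.
            yield block
            block = []
    if block:
        # The last block may not be followed by a blank line.
        yield block

def to_row(block):
    # Build {"ID": "1", "Name": "X", ...} from the "Key: value" lines.
    fields = dict(line.split(": ", 1) for line in block)
    return (fields["FamilyN"], fields["Name"], fields["Age"])

sample = ["ID: 1\n", "Name: X\n", "FamilyN: Y\n", "Age: 20\n", "\n",
          "ID: 2\n", "Name: H\n", "FamilyN: F\n", "Age: 23\n"]
rows = [to_row(b) for b in blocks(sample)]
print(rows)
```

An open file can be passed to blocks() in place of the sample list, since a file object also yields one line at a time.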
If your file is too large to read into memory all at once, you can still use a solution based on regular expressions by using a memory-mapped file, with the mmap module.
The mmap trick provides a "pretend string" that lets regular expressions work on the file without having to read it all into one large string. The finditer() method of the regular expression object yields matches one at a time, without creating a list of all matches at once (which findall() does).
I do think this solution is overkill for this use case, however (still, it's a nice trick to know...).
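A sketch of the mmap idea in Python 3, under a few assumptions: the pattern RECORD and the function iter_rows are invented for this example, fields are assumed to appear in the order Name, FamilyN, Age, and the pattern is a bytes pattern because an mmap object exposes bytes rather than str.

```python
import mmap
import os
import re
import tempfile

# Bytes pattern: one match per record, capturing name, family name and age.
RECORD = re.compile(
    rb"Name:[ \t]*(\S+)\s+FamilyN:[ \t]*(\S+)\s+Age:[ \t]*(\S+)"
)

def iter_rows(path):
    """Yield (family, name, age) for each record in the file at `path`."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; mmap raises ValueError on an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # finditer walks the mapping lazily, one match at a time.
            for match in RECORD.finditer(mm):
                name, family, age = (g.decode() for g in match.groups())
                yield family, name, age

# Demo on a throwaway file holding two records from the question.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("ID: 1\nName: X\nFamilyN: Y\nAge: 20\n\n"
              "ID: 2\nName: H\nFamilyN: F\nAge: 23\n")
rows = list(iter_rows(tmp.name))
os.remove(tmp.name)
print(rows)
```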
If the file is not huge, you can read the whole file into a string, content, and then split content into blocks.
Now you can write a function to parse a block of text. I would use split('\n') to get the lines of a block and split(':') to get the key and value, possibly with str.strip() or some help from regular expressions.
The simplest version does not check whether a block has the required data.
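The snippets from this answer are missing on this page. A rough sketch of the same steps, with assumptions stated: parse_block is a made-up name, the string literal stands in for the result of reading the file, and blocks are assumed to be separated by exactly one blank line.

```python
def parse_block(block):
    """Turn one block of 'Key: value' lines into a dict (no validation)."""
    record = {}
    for line in block.split("\n"):
        key, value = line.split(":", 1)
        record[key.strip()] = value.strip()
    return record

# With a real file this would be: content = open(path).read()
# (path is hypothetical; the literal below stands in for the file text).
content = ("ID: 1\nName: X\nFamilyN: Y\nAge: 20\n\n"
           "ID: 2\nName: H\nFamilyN: F\nAge: 23\n")

# strip() drops the trailing newline so the last block has no empty line.
blocks = content.strip().split("\n\n")
for block in blocks:
    r = parse_block(block)
    print(r["FamilyN"], r["Name"], r["Age"])
```

As the answer says, nothing here checks the data: a block with a missing key raises KeyError, and a line without a colon raises ValueError.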
This answer isn't necessarily better than what's already been posted, but as an illustration of how I approach problems like this it might be useful, especially if you're not used to working with Python's interactive interpreter.
I've started out knowing two things about this problem. First, I'm going to use itertools.groupby to group the input into lists of data lines, one list for each individual data record. Second, I want to represent those records as dictionaries so that I can easily format the output.
One other thing that this shows is how using generators makes breaking a problem like this down into small parts easy.
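The interactive session from this answer is not preserved here. The two steps it names (group with itertools.groupby, then build dictionaries) could be sketched as a small chain of generators. The function names record_lines and records are assumptions, not the answerer's.

```python
import itertools

def record_lines(lines):
    """Yield one list of data lines per record, skipping blank runs."""
    stripped = (line.strip() for line in lines)
    # bool("") is False, so blank lines and data lines form alternating groups.
    for has_data, group in itertools.groupby(stripped, key=bool):
        if has_data:
            yield list(group)

def records(lines):
    """Yield each record as a dict such as {'ID': '3', 'Name': 'S', ...}."""
    for group in record_lines(lines):
        pairs = (line.split(":", 1) for line in group)
        yield {key.strip(): value.strip() for key, value in pairs}

sample = """ID: 3
Name: S
FamilyN: Y
Age: 13

ID: 4
Name: M
FamilyN: Z
Age: 25
""".splitlines()

for rec in records(sample):
    print("{FamilyN} {Name} {Age}".format(**rec))
```

Each generator does one small job, and each stage can be tried out on its own in the interpreter before the next one is added.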
Use a dict, namedtuple, or custom class to store each attribute as you come across it, then append the object to a list when you reach a blank line or EOF.
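One way this could look with a namedtuple, as a sketch only: Person, FIELDS and read_people are hypothetical names, and every block is assumed to contain all four keys.

```python
from collections import namedtuple

# Hypothetical record type; the field names are chosen for this example.
Person = namedtuple("Person", ["id", "name", "family", "age"])

# Map the keys used in the file to Person field names.
FIELDS = {"ID": "id", "Name": "name", "FamilyN": "family", "Age": "age"}

def read_people(lines):
    people = []
    current = {}
    for line in lines:
        line = line.strip()
        if not line:
            if current:
                # A blank line closes the record collected so far.
                people.append(Person(**current))
                current = {}
            continue
        key, value = line.split(":", 1)
        current[FIELDS[key.strip()]] = value.strip()
    if current:
        # End of input closes the last record.
        people.append(Person(**current))
    return people

sample = ["ID: 1", "Name: X", "FamilyN: Y", "Age: 20", "",
          "ID: 2", "Name: H", "FamilyN: F", "Age: 23"]
for p in read_people(sample):
    print(p.family, p.name, p.age)
```

A block with a missing key makes Person(**current) raise TypeError, which may or may not be the behaviour you want for incomplete records.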
Simple solution:
Along with the half-dozen other solutions I already see here, I'm a bit surprised that no one has been so simple-minded (that is, generator-, regex-, map-, and read-free) as to propose, for example,
Re-format to taste.
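The proposed code did not survive on this page. A plain loop in the spirit described (no generator, regex, map, or read()) might look like the following sketch. The name three_columns is made up, and the code assumes Age is the last field of every block.

```python
def three_columns(lines):
    rows = []
    name = family = None
    for line in lines:
        # partition never raises: a blank line simply gives an empty key.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Name":
            name = value
        elif key == "FamilyN":
            family = value
        elif key == "Age":
            # Age closes a block in the sample data, so emit the row here.
            rows.append("%s %s %s" % (family, name, value))
    return rows

sample = ["ID: 1\n", "Name: X\n", "FamilyN: Y\n", "Age: 20\n", "\n",
          "ID: 2\n", "Name: H\n", "FamilyN: F\n", "Age: 23\n"]
print("\n".join(three_columns(sample)))
```

Since the row is emitted on the Age line, this version does not depend on the blank separator lines at all.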