将文件读入由段落分隔的数组中 Python

发布于 2024-12-18 03:02:26 字数 137 浏览 0 评论 0原文

我有一个文本文件，我想将该文本文件读入3个不同的数组，array1、array2和array3。第一段放入 array1，第二段放入 array2，依此类推。然后将第 4 段放入 array1 element2 中，依此类推，段落之间用空行分隔。有什么想法吗？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

时光是把杀猪刀 2024-12-25 03:02:26

这是我要尝试的基本代码：

f = open('data.txt', 'r')

data = f.read()
array1 = []
array2 = []
array3 = []
splat = data.split("\n\n")
for number, paragraph in enumerate(splat, 1):
    if number % 3 == 1:
        array1 += [paragraph]
    elif number % 3 == 2:
        array2 += [paragraph]
    elif number % 3 == 0:
        array3 += [paragraph]

这应该足以让您开始。如果文件中的段落被两个新行分割，那么“\n\n”应该可以解决分割它们的问题。

This is the basic code I would try:

f = open('data.txt', 'r')

data = f.read()
array1 = []
array2 = []
array3 = []
splat = data.split("\n\n")
for number, paragraph in enumerate(splat, 1):
    if number % 3 == 1:
        array1 += [paragraph]
    elif number % 3 == 2:
        array2 += [paragraph]
    elif number % 3 == 0:
        array3 += [paragraph]

This should be enough to get you started. If the paragraphs in the file are split by two new lines then "\n\n" should do the trick for splitting them.

回复收藏 0 原文

宫墨修音 2024-12-25 03:02:26

import itertools as it


def paragraphs(fileobj, separator='\n'):
    """Iterate a fileobject by paragraph"""
    ## Makes no assumptions about the encoding used in the file
    lines = []
    for line in fileobj:
        if line == separator and lines:
            yield ''.join(lines)
            lines = []
        else:
            lines.append(line)
    yield ''.join(lines)

paragraph_lists = [[], [], []]
with open('/Users/robdev/Desktop/test.txt') as f:
    paras = paragraphs(f)
    for para, group in it.izip(paras, it.cycle(paragraph_lists)):
        group.append(para)

print paragraph_lists

import itertools as it


def paragraphs(fileobj, separator='\n'):
    """Iterate a fileobject by paragraph"""
    ## Makes no assumptions about the encoding used in the file
    lines = []
    for line in fileobj:
        if line == separator and lines:
            yield ''.join(lines)
            lines = []
        else:
            lines.append(line)
    yield ''.join(lines)

paragraph_lists = [[], [], []]
with open('/Users/robdev/Desktop/test.txt') as f:
    paras = paragraphs(f)
    for para, group in it.izip(paras, it.cycle(paragraph_lists)):
        group.append(para)

print paragraph_lists

回复收藏 0 原文

画▽骨i 2024-12-25 03:02:26

我知道这个问题很久以前就被问过，但只是提出我的意见，以便在某个时候对其他人有用。
我知道了根据段落分隔符将输入文件拆分为段落的更简单方法（可以是 \n 或空格或其他任何内容），并且您的问题的代码片段如下所示：

with open("input.txt", "r") as input:
    input_ = input.read().split("\n\n")   #\n\n denotes there is a blank line in between paragraphs.

执行此命令后，如果您尝试打印 input_[0] 它将显示第一段，input_[1] 将显示第二段，依此类推。因此，它将输入文件中存在的所有段落放入一个列表中，每个列表元素都包含输入文件中的一个段落。

I know this question was asked long before but just putting my inputs so that it will be useful to somebody else at some point of time.
I got to know much easier way to split the input file into paragraphs based on the Paragraph Separator(it can be a \n or a blank space or anything else) and the code snippet for your question is given below :

with open("input.txt", "r") as input:
    input_ = input.read().split("\n\n")   #\n\n denotes there is a blank line in between paragraphs.

And after executing this command, if you try to print input_[0] it will show the first paragraph, input_[1] will show the second paragraph and so on. So it is putting all the paragraphs present in the input file into an List with each List element contains a paragraph from the input file.

回复收藏 0 原文

祁梦 2024-12-25 03:02:26

此代码将搜索两点之间的线：

rr = [] #Array for saving lines    
for f in file_list:
    with open(f, 'rt') as fl:
        lines = fl.read()
        lines = lines[lines.find('String1'):lines.find('String2')] 
        rr.append(lines)

This code will search for lines between two points:

rr = [] #Array for saving lines    
for f in file_list:
    with open(f, 'rt') as fl:
        lines = fl.read()
        lines = lines[lines.find('String1'):lines.find('String2')] 
        rr.append(lines)

回复收藏 0 原文

自找没趣 2024-12-25 03:02:26

因为我想炫耀一下：

with open('data.txt') as f:
    f = list(f)
    a, b, c = (list(__import__('itertools').islice(f, i, None, 3)) for i in range(3))

Because I feel like showing off:

with open('data.txt') as f:
    f = list(f)
    a, b, c = (list(__import__('itertools').islice(f, i, None, 3)) for i in range(3))

回复收藏 0 原文

毁虫ゝ 2024-12-25 03:02:26

使用切片也可以。

par_separator = "\n\n"
paragraphs = "1\n\n2\n\n3\n\n4\n\n5\n\n6".split(par_separator)
a,b,c = paragraphs[0:len(paragraphs):3], paragraphs[1:len(paragraphs):3],\
        paragraphs[2:len(paragraphs):3]

切片内：[开始索引，结束索引，步骤]

Using slices would also work.

par_separator = "\n\n"
paragraphs = "1\n\n2\n\n3\n\n4\n\n5\n\n6".split(par_separator)
a,b,c = paragraphs[0:len(paragraphs):3], paragraphs[1:len(paragraphs):3],\
        paragraphs[2:len(paragraphs):3]

Within slice: [start index, end index,step]

回复收藏 0 原文

天涯离梦残月幽梦 2024-12-25 03:02:26

绕过切片的更优雅的方法：

def grouper(n, iterable, fillvalue=None):
    args = [iter(iterable)] * n
    return itertools.izip_longest(fillvalue=fillvalue, *args)

for p in grouper(5,[sent.strip() for sent in text.split('\n') if sent !='']):
    print p

只需确保在最终文本中处理 None

More elegant way to bypass slices:

def grouper(n, iterable, fillvalue=None):
    args = [iter(iterable)] * n
    return itertools.izip_longest(fillvalue=fillvalue, *args)

for p in grouper(5,[sent.strip() for sent in text.split('\n') if sent !='']):
    print p

Just make sure you deal with None in final text

回复收藏 0 原文

~没有更多了~