Reading a file into arrays separated by paragraphs in Python

Posted on 2024-12-18 03:02:26

I have a text file that I want to read into three different arrays: array1, array2, and array3. The first paragraph goes into array1, the second into array2, and the third into array3. The fourth paragraph then becomes the second element of array1, and so on. Paragraphs are separated by a blank line. Any ideas?

Comments (7)

时光是把杀猪刀 2024-12-25 03:02:26

This is the basic code I would try:

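array1 = []
array2 = []
array3 = []

# Read the whole file; the with block closes it afterwards.
with open('data.txt', 'r') as f:
    data = f.read()

# Paragraphs are separated by a blank line, i.e. two consecutive newlines.
splat = data.split("\n\n")
for number, paragraph in enumerate(splat, 1):
    # number starts at 1, so number % 3 cycles through 1, 2, 0.
    if number % 3 == 1:
        array1.append(paragraph)
    elif number % 3 == 2:
        array2.append(paragraph)
    else:
        array3.append(paragraph)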

This should be enough to get you started. If the paragraphs in the file are separated by a blank line (two consecutive newline characters), splitting on "\n\n" should do the trick.
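
If the file might contain Windows line endings, or more than one blank line between paragraphs, a plain split on "\n\n" can leave empty or stray entries. Below is a minimal sketch of a more tolerant split, assuming a regular expression is acceptable here. The helper name split_paragraphs and the sample text are made up for illustration and are not part of the answer above.

```python
import re


def split_paragraphs(text):
    """Split text on runs of blank lines (hypothetical helper)."""
    # Normalise Windows and old Mac line endings to "\n" first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A paragraph break is a newline followed by one or more lines
    # that are empty or contain only spaces/tabs.
    parts = re.split(r"\n(?:[ \t]*\n)+", text.strip())
    return [p for p in parts if p]


# Invented sample: CRLF endings, several blank lines, a whitespace-only line.
sample = "one\r\n\r\ntwo\n\n\n\nthree\n \nfour\n"
paras = split_paragraphs(sample)
array1, array2, array3 = paras[0::3], paras[1::3], paras[2::3]
```

Lines inside a paragraph are left untouched, so a multi-line paragraph stays a single list element.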

宫墨修音 2024-12-25 03:02:26
import itertools as it


def paragraphs(fileobj, separator='\n'):
    """Iterate over a file object one paragraph at a time."""
    # Makes no assumptions about the encoding used in the file.
    lines = []
    for line in fileobj:
        if line == separator:
            # A blank line ends the current paragraph; extra blank lines are skipped.
            if lines:
                yield ''.join(lines)
                lines = []
        else:
            lines.append(line)
    if lines:
        # Emit the last paragraph when the file does not end with a blank line.
        yield ''.join(lines)

paragraph_lists = [[], [], []]
with open('/Users/robdev/Desktop/test.txt') as f:
    paras = paragraphs(f)
    # Python 3: the built-in zip is lazy, so itertools.izip is not needed.
    for para, group in zip(paras, it.cycle(paragraph_lists)):
        group.append(para)

print(paragraph_lists)
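
The step that pairs each paragraph with a cycling target list can be tried without a file on disk. A small sketch, assuming Python 3 and using invented in-memory paragraph strings (the names paras and groups are illustrative only):

```python
import itertools

# Invented stand-ins for paragraphs read from a file.
paras = ["p1", "p2", "p3", "p4", "p5"]
groups = [[], [], []]

# cycle() repeats the three lists forever; zip() stops when paras runs out.
for para, group in zip(paras, itertools.cycle(groups)):
    group.append(para)
```

Because zip stops at the shorter input, the endless cycle is safe here, and the lists need not end up the same length.
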
画▽骨i 2024-12-25 03:02:26

I know this question was asked a long time ago, but I am adding my input in case it is useful to someone else.
I found a much easier way to split the input file into paragraphs based on the paragraph separator (which can be a \n, a space, or anything else). A snippet for your question:

with open("input.txt", "r") as f:
    input_ = f.read().split("\n\n")   # "\n\n" means there is a blank line between paragraphs

After this runs, input_[0] holds the first paragraph, input_[1] the second, and so on. In other words, every paragraph in the input file ends up in one list, with each list element containing one paragraph.
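
One detail worth checking is the end of the file: if it finishes with a newline, the last list element keeps it. A short sketch of what the list looks like and one way to tidy it, where io.StringIO stands in for the real file and the sample text is invented:

```python
import io

# io.StringIO stands in for open("input.txt", "r"); the text is made up.
with io.StringIO("first para\nline two\n\nsecond para\n\nthird para\n") as handle:
    input_ = handle.read().split("\n\n")

# The last element keeps the file's final newline, so strip each paragraph
# and drop any that are empty after stripping.
cleaned = [p.strip() for p in input_ if p.strip()]
```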

祁梦 2024-12-25 03:02:26

This code extracts the text between two marker strings:

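rr = []  # List of extracted text, one entry per file
for f in file_list:  # file_list is assumed to be a list of file paths
    with open(f, 'rt') as fl:
        text = fl.read()
        # Keep the text from the start of 'String1' up to (not including) 'String2'.
        # Note: find() returns -1 if a marker is missing, which gives a wrong slice.
        rr.append(text[text.find('String1'):text.find('String2')])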
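
Since str.find returns -1 when a marker is absent, slicing with its result directly can silently return the wrong text. A hedged sketch of a guard, where the helper name between is hypothetical and, like the code above, the start marker is kept in the result:

```python
def between(text, start_marker, end_marker):
    """Return text from start_marker up to end_marker, or None if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    # Look for the end marker only after the start marker.
    end = text.find(end_marker, start + len(start_marker))
    if end == -1:
        return None
    return text[start:end]


found = between("aa String1 middle String2 zz", "String1", "String2")
missing = between("no markers here", "String1", "String2")
```
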
自找没趣 2024-12-25 03:02:26

Because I feel like showing off:

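with open('data.txt') as f:
    # Split on blank lines so each item is a paragraph rather than a single line.
    paras = f.read().split('\n\n')
    a, b, c = (list(__import__('itertools').islice(paras, i, None, 3)) for i in range(3))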
毁虫ゝ 2024-12-25 03:02:26

Using slices would also work.

par_separator = "\n\n"
paragraphs = "1\n\n2\n\n3\n\n4\n\n5\n\n6".split(par_separator)
# Start at offset 0, 1 or 2 and take every third paragraph.
a, b, c = paragraphs[0::3], paragraphs[1::3], paragraphs[2::3]

Slice syntax: [start index : end index : step]. An omitted end index means "to the end of the list".
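
The same slicing idea extends to any number of target lists. A minimal sketch, assuming the paragraphs are already in a list; the helper name deal_out is invented for illustration:

```python
def deal_out(items, n):
    """Distribute items round-robin into n lists (hypothetical helper)."""
    # items[i::n] starts at offset i and takes every n-th element.
    return [items[i::n] for i in range(n)]


paragraphs = "1\n\n2\n\n3\n\n4\n\n5\n\n6\n\n7".split("\n\n")
a, b, c = deal_out(paragraphs, 3)
```

When the paragraph count is not a multiple of n, the earlier lists simply end up one element longer.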

天涯离梦残月幽梦 2024-12-25 03:02:26

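A more elegant way that avoids slices:

import itertools

def grouper(n, iterable, fillvalue=None):
    # n references to the same iterator, so zip_longest pulls n consecutive items per group.
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

# text is assumed to hold the file's contents as a single string.
for p in grouper(5, [sent.strip() for sent in text.split('\n') if sent != '']):
    print(p)

Just make sure you handle the None fill values in the last group.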
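
One way to handle the padding is to filter out None after grouping. The sketch below assumes Python 3 (zip_longest rather than izip_longest) and uses invented sample items. It also shows that transposing the groups with zip gives the round-robin layout the question asks for.

```python
from itertools import zip_longest


def grouper(n, iterable, fillvalue=None):
    # n references to the same iterator yield n consecutive items per group.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


items = ["a", "b", "c", "d", "e", "f", "g"]  # invented sample data

# Consecutive chunks of three, with the None padding removed.
chunks = [[x for x in group if x is not None] for group in grouper(3, items)]

# Transposing the padded groups turns chunks into round-robin columns.
columns = [[x for x in col if x is not None] for col in zip(*grouper(3, items))]
```

This filter would also drop genuine None items, so it only suits data such as strings where None cannot occur.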
