Using Python to split a long text file into multiple files based on a hyphen delimiter?

Posted on 2025-01-14 07:02:41

I am working on splitting a single long text file into multiple files. Each section that needs to go into its own file is separated by a line of hyphens that looks something like this:

     This is section of some sample text
        that says something.
        
        2---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        
        This says something else
        
        3---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    Maybe this says something eles
    
    4---------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------

I have started an attempt in Python without much success. I considered using the split function, but most examples I find for it revolve around len rather than regex-style patterns. My code below only generates one large file.

with open ('someName.txt','r') as fo:

    start=1
    cntr=0
    for x in fo.read().split("\n"):
        if x=='---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------':
            start = 1
            cntr += 1
            continue
        with open (str(cntr)+'.txt','a+') as opf:
            if not start:
                x = '\n'+x
            opf.write(x)
            start = 0


Comments (1)

梦在深巷 2025-01-21 07:02:41

You might get better results by switching the condition from == to in. That way, a line that has leading characters before the hyphens (such as the 2 or 3 in your sample) still passes the condition. For example, below I changed x == '-----...' to '-----...' in x. The change is at the very end of the long string of hyphens.

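with open('someName.txt', 'r') as fo:
    start = 1  # 1 while the next write begins a new output file
    cntr = 0   # index of the current output file
    for x in fo.read().split("\n"):
        # Substring test: still matches when the separator line has leading characters
        if ('-----------------------------------------------------'
            '-----------------------------------------------------'
            '-----------------------------------------------------'
            '------------------------------------------------') in x:
            start = 1
            cntr += 1
            continue
        # Append the line to the file for the current section
        with open(str(cntr) + '.txt', 'a+') as opf:
            if not start:
                x = '\n' + x
            opf.write(x)
            start = 0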
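The substring test above only matches lines that contain the full hyphen run, so shorter separator lines like the last two in the sample would be treated as ordinary text. As a rough sketch of a more tolerant line-by-line check, assuming a separator is a line made only of hyphens with an optional leading number: the names `is_separator`, `split_lines`, `write_sections` and the threshold `MIN_HYPHENS` are hypothetical and not part of the original post.

```python
from pathlib import Path

MIN_HYPHENS = 10  # assumed threshold; tune it to the real data


def is_separator(line, min_hyphens=MIN_HYPHENS):
    """Return True for a line that is only hyphens, optionally prefixed by digits."""
    stripped = line.strip().lstrip("0123456789")
    return len(stripped) >= min_hyphens and set(stripped) == {"-"}


def split_lines(lines):
    """Group lines into sections, starting a new section at each separator."""
    sections = [[]]
    for line in lines:
        if is_separator(line):
            sections.append([])
        else:
            sections[-1].append(line)
    # Drop sections that contain nothing but blank lines.
    return [
        "\n".join(section).strip("\n")
        for section in sections
        if any(item.strip() for item in section)
    ]


def write_sections(sections, out_dir="."):
    """Write each section to 0.txt, 1.txt, ...; mode 'w' means reruns overwrite."""
    for index, body in enumerate(sections):
        Path(out_dir, f"{index}.txt").write_text(body + "\n")
```

A possible call would be `write_sections(split_lines(Path('someName.txt').read_text().splitlines()))`. Writing with `write_text` instead of appending with 'a+' avoids piling new output onto files left over from an earlier run.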

An alternative solution would be to use regular expressions. For example...

import re

with open('someName.txt', 'rt') as fo:
    counter = 0
    pattern = re.compile(r'--+')  # regex: any run of two or more hyphens
    # re.split cuts the text at every match of the pattern
    for group in re.split(pattern, fo.read()):
        # each piece between separators goes to its own numbered file
        with open(str(counter) + '.txt', 'a+') as opf:
            opf.write(group)
        counter += 1
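One caveat with `r'--+'` is that it also splits on any double hyphen inside ordinary text, and it leaves the leading number of a separator line at the end of the previous piece. A possible refinement, sketched under the assumption that a separator is a whole line of optional indentation, an optional number, and at least ten hyphens: the names `SEPARATOR` and `split_text` and the threshold of ten are illustrative choices, not something from the original answer.

```python
import re

# Whole-line separator: optional indentation, optional digits, then 10+ hyphens.
# re.MULTILINE makes ^ and $ match at the start and end of every line.
SEPARATOR = re.compile(r"^[ \t]*\d*-{10,}[ \t]*$", re.MULTILINE)


def split_text(text):
    """Split text at separator lines and drop pieces that are only whitespace."""
    parts = SEPARATOR.split(text)
    return [part.strip("\n") for part in parts if part.strip()]
```

The pattern has no capturing groups, so `re.split` returns only the text between separators. The resulting list can then be written out with `enumerate`, one numbered file per piece.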