Using Python to split a long text file into multiple files based on hyphen separator lines?
Working to separate a single long text file into multiple files. Each section that needs to be placed into its own file is separated by hyphen lines that look something like:
This is section of some sample text
that says something.
2---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
This says something else
3---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Maybe this says something else
4---------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
I have started the attempt in Python without much success. I considered using the split function, but most examples I'm finding for split revolve around length rather than regex-style patterns. The code below only generates one large file.
with open ('someName.txt','r') as fo:
    start=1
    cntr=0
    for x in fo.read().split("\n"):
        if x=='---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------':
            start = 1
            cntr += 1
            continue
        with open (str(cntr)+'.txt','a+') as opf:
            if not start:
                x = '\n'+x
            opf.write(x)
            start = 0
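The question notes that most split examples use fixed separators. That matches how str.split behaves: it only accepts a literal separator string. By contrast, re.split from Python's standard re module splits on a regular expression.

Below is a minimal sketch that breaks the whole text into a list of sections in one call. It assumes a separator is a full line made of an optional section number followed by at least 10 hyphens. The threshold of 10 and the names SEPARATOR and split_sections are illustrative assumptions, not part of the question.

```python
import re

# Assumption: a separator is a whole line consisting of optional digits
# (the section number) followed by at least 10 hyphens, e.g. "2-----...".
# re.MULTILINE makes ^ and $ match at every line boundary, not just at the
# start and end of the whole string.
SEPARATOR = re.compile(r"^\d*-{10,}$", re.MULTILINE)


def split_sections(text):
    """Split text on separator lines and return the non-empty sections."""
    parts = SEPARATOR.split(text)
    # Drop the newlines left around each separator and skip blank pieces,
    # such as the gap between two separator lines in a row.
    return [part.strip("\n") for part in parts if part.strip()]


sep = "-" * 40
sample = "\n".join([
    "This is section of some sample text",
    "that says something.",
    "2" + sep,
    "This says something else",
    "3" + sep,
    "Maybe this says something else",
])
print(split_sections(sample))
```

Each element of the returned list could then be written to its own file.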
You might get better results by switching the comparison in the conditional from == to the in operator. That way, if the line you are testing has any leading characters (such as the section numbers 2, 3 and 4 in front of the hyphens in the sample), it will still pass the condition. For example, change x=='-----...' to '-----' in x; the change is at the very end of the long string of hyphens. An alternative solution would be to use regular expressions. For example...
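Here is a minimal sketch of the regular-expression alternative applied end to end to the question's task. It is not the answerer's original code. It assumes a separator line is an optional section number followed by 10 or more hyphens and nothing else.

It keeps the question's file names (0.txt, 1.txt, ...), but it differs from the question's code in two ways:
- It writes each section once with mode "w" instead of appending line by line with "a+".
- It skips empty sections.

The names SEPARATOR, is_separator and split_file are hypothetical.

```python
import re
from pathlib import Path

# Hypothetical pattern: optional digits (the section number) followed by
# 10 or more hyphens, e.g. "2-------...".
SEPARATOR = re.compile(r"\d*-{10,}")


def is_separator(line):
    # fullmatch requires the whole stripped line to fit the pattern, so text
    # lines that merely contain some hyphens are not treated as separators.
    return SEPARATOR.fullmatch(line.strip()) is not None


def split_file(src, out_dir="."):
    """Write each non-empty section of src to 0.txt, 1.txt, ... in out_dir.

    Returns the list of paths written.
    """
    sections = [[]]
    with open(src, "r") as fo:
        for line in fo.read().split("\n"):
            if is_separator(line):
                sections.append([])  # a separator starts a new section
            else:
                sections[-1].append(line)

    written = []
    for lines in sections:
        if not any(ln.strip() for ln in lines):
            continue  # skip empty sections, e.g. between adjacent separators
        path = Path(out_dir) / f"{len(written)}.txt"
        # write_text opens the file in "w" mode, so rerunning the script
        # overwrites the files instead of appending to them like "a+" does.
        path.write_text("\n".join(lines))
        written.append(path)
    return written
```

For text like the question's sample, split_file('someName.txt') should produce 0.txt, 1.txt and 2.txt, one per section.

You could replace the body of is_separator with the simpler '-----' in line test from the first suggestion. The trade-off is that the in test would also fire on an ordinary line that happens to contain five hyphens.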