Removing line breaks at a specific position in a text file
I have a large text file which has line breaks at column 80 due to console width. Many of the lines in the file are not 80 characters long and are not affected by the line break. In pseudocode, this is what I want:
- Iterate through lines in file
- If line matches this regex pattern: ^(.{80})\n(.+)
- Replace this line with a new string consisting of match.group(1) and match.group(2). Just remove the linebreak from this line.
- If line doesn't match the regex, skip!
Maybe I don't need regex to do this?
Comments (4)
Here's some code which should do the trick.
Note that, compared to your pseudocode, this will merge any number of consecutive folded lines.
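The code from this answer did not survive on this page, so what follows is only a minimal sketch of the idea it describes, not the original poster's code. The function name `unfold` and the `width` parameter are assumptions made for illustration. It treats every line that is exactly `width` characters long as folded and keeps appending the following lines until one has a different length.

```python
def unfold(lines, width=80):
    """Join lines that were hard-wrapped at exactly `width` characters.

    `unfold` and `width` are illustrative names, not from the original answer.
    """
    result = []
    pending = ""  # accumulated pieces of a line that is still being joined
    for raw in lines:
        piece = raw.rstrip("\n")
        if pending and not piece:
            # A full-width line followed by a blank line was not folded,
            # so emit it and keep the blank line as a line of its own.
            result.append(pending)
            pending = ""
        pending += piece
        if len(piece) != width:
            # Any line that is not exactly `width` long ends the logical line.
            result.append(pending)
            pending = ""
    if pending:
        # The input ended on a full-width line.
        result.append(pending)
    return result
```

A line that really is exactly `width` characters long cannot be told apart from a folded one, so it is joined with its successor as well. The regex in the question has the same limitation.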
Consider this.
I find that an explicit generator function makes it much easier to test and debug the essential logic of the script without having to create mock filesystems or do lots of fancy setup and teardown for testing.
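The generator this answer refers to is missing from the page. One possible shape for it, offered as a sketch under stated assumptions, is shown here. The names `unfolded` and `rewrite_file` and the `width` parameter are hypothetical. The generator accepts any iterable of strings, so it can be exercised with a plain list instead of a real file.

```python
def unfolded(lines, width=80):
    """Yield logical lines, joining pieces that were wrapped at `width`.

    Hypothetical helper; accepts any iterable of strings (a file or a list).
    """
    pieces = []
    for raw in lines:
        piece = raw.rstrip("\n")
        if pieces and not piece:
            # Full-width line followed by a blank line: nothing to join.
            yield "".join(pieces)
            pieces = []
        pieces.append(piece)
        if len(piece) != width:
            yield "".join(pieces)
            pieces = []
    if pieces:
        yield "".join(pieces)


def rewrite_file(src_path, dst_path, width=80):
    """Thin I/O wrapper (hypothetical); the logic lives in `unfolded`."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in unfolded(src, width):
            dst.write(line + "\n")
```

Because the file handling is confined to `rewrite_file`, a check such as `list(unfolded(["abcd\n", "ef\n"], width=4))` needs no setup or teardown.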
Here is an example of how to use regular expressions to achieve this. But regular expressions aren't the best solution everywhere, and in this case I think not using regular expressions is more efficient. Anyway, here is the solution:
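The regex solution itself is not preserved here. A sketch of what such a solution could look like, assuming the whole file fits in memory as one string, follows. The function name `unfold_text` and the `width` parameter are illustrative assumptions. The pattern removes the newline after a line of exactly `width` characters when the next line is non-empty.

```python
import re


def unfold_text(text, width=80):
    """Remove the newline after each line that is exactly `width` characters.

    Illustrative sketch; assumes the whole file is held in `text`.
    """
    # ^ with re.MULTILINE anchors at every line start; the lookahead (?=.)
    # requires the next line to be non-empty without consuming any of it,
    # so a run of consecutive folded lines is joined in a single pass.
    pattern = re.compile(r"^(.{%d})\n(?=.)" % width, re.MULTILINE)
    return pattern.sub(r"\1", text)
```

Using a lookahead instead of the `(.+)` group from the question leaves the continuation line unconsumed, so it can itself be matched as a folded line.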
You can also use your regular expression when you call re.sub with a callable:
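The snippet for this answer is also missing, so the following is a sketch of the approach rather than the poster's code. It uses the pattern from the question with `re.MULTILINE` and passes a function to `re.sub`. The names `join_groups` and `unfold_with_callable` and the `width` parameter are assumptions.

```python
import re


def join_groups(match):
    """Replacement callable: the two captured groups without the newline."""
    return match.group(1) + match.group(2)


def unfold_with_callable(text, width=80):
    """Apply the question's pattern to the whole text (illustrative sketch)."""
    # Same pattern as in the question, with the width made configurable.
    pattern = re.compile(r"^(.{%d})\n(.+)" % width, re.MULTILINE)
    return pattern.sub(join_groups, text)
```

Since `(.+)` consumes the continuation line, a continuation that is itself exactly `width` characters long is not examined again in the same pass. A line folded more than once therefore needs the substitution to be repeated until the text stops changing.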