Is there a script I can run to remove all hard (carriage) returns in a .txt file?
I have a .txt file (Mac OS X Snow Leopard) that has a lot of text. At the end of each paragraph, there is a hard return that moves the next paragraph onto another line. This is causing some issues with getting the content into my db, so I am wondering if there is any way I can remove the hard returns. Is there some sort of script I can run? I am really hoping I don't have to go through and manually take the hard returns out.
To recap, here is what it looks like now:
This is some text. Text is what this is.
And then this is the next paragraph that is on a different line.
And this is what I would like to get to:
This is some text. Text is what this is. And then this is the next paragraph that is on a different line.
For all several thousand lines in my .txt file.
Thanks!
EDIT:
The text I am dealing with in my txt file is actually HTML:
<a href="/link/link/1"> <span class="text">1 </span> THis is where my text is</a><br/>
And when I run the cat command in terminal like mentioned below, only the first is there. Everything else is missing...
Comments (4)
In a terminal:
There's probably a more efficient way to do this, since the "tr -d '\r'" is the active ingredient, but that's the idea.
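The command this answer referred to did not survive on this page, so the following is only a minimal sketch built around the `tr -d '\r'` it mentions. The file names `input.txt` and `output.txt` are hypothetical, and the sample assumes the file uses CR-only line endings, which may not match your file.

```shell
# Work in a scratch directory so no real files are overwritten.
cd "$(mktemp -d)"

# Hypothetical sample with CR-only line endings (an assumption about the file).
printf 'This is some text.\rAnd then the next paragraph.\r' > input.txt

# Delete every carriage return and write the result to a new file.
# Redirecting with < avoids the extra cat process.
tr -d '\r' < input.txt > output.txt

cat output.txt
```

Note that deleting the returns joins the paragraphs with nothing between them, so a space may need to be substituted instead if the sentences should stay separated.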
I normally just use an editor with good regular expression support. TextWrangler is great.
An end of line in TextWrangler is \r, so to remove it, just search for \r and replace it with a space. TBH, I always wondered how it handles CRLF-encoded files, but somehow it works.
I believe you can do this with AppleScript. Unfortunately I'm not familiar with it, but the following should help you accomplish this (it's for a different problem, but it will lead you in the direction you need to go): http://macscripter.net/viewtopic.php?id=18762
Alternatively, if you don't want to do this with AppleScript and have Excel installed (or access to a machine with it), then the following should help: http://www.mrexcel.com/forum/showthread.php?t=474054
In a Linux terminal
cat file.txt | tr -d "\r\n" > newfile.txt
will do. Modify the \r\n part to remove the desired characters.
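Deleting the line breaks outright glues the last word of one paragraph to the first word of the next, whereas the question asks for a space between them. A possible variation, offered as a sketch rather than as this answer's method, translates each CR and LF to a space and squeezes the resulting runs. The sample content and CRLF endings are assumptions; `file.txt` and `newfile.txt` follow the names used above.

```shell
# Work in a scratch directory so no real files are overwritten.
cd "$(mktemp -d)"

# Hypothetical sample with CRLF line endings (an assumption about the file).
printf 'This is some text. Text is what this is.\r\nAnd then this is the next paragraph.\r\n' > file.txt

# Map CR and LF each to a space, then squeeze runs of spaces (-s) so that
# a CRLF pair becomes one space. Existing double spaces are squeezed too.
tr -s '\r\n' '  ' < file.txt > newfile.txt

cat newfile.txt
```

Because every line break becomes a space, the output ends with a single trailing space instead of a newline. The same command handles CR-only and LF-only files, since each of those characters is translated independently.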