Append the next line to the current line in a file when the current line ends with ^M
I have a file that is the output of a tool (queried from a DB). One of the fields has ^M characters at the end of some lines, which corrupts my output. How do I append the next line to the current line when the current line ends with ^M?
My machine has sed, awk, perl, ruby and python installed, and I am using bash.
I tried the following using sed:
sed -e :a -e '/^M$/N; s/^M\n//; ta' sourcefile > destfile
But that did not work.
Please advise.
Thanks,
Karthick S.
Comments (4)
^M means ctrl+M, and it's one character, not two. When you did your replacement, did you type it as two characters or one?
One character: ctrl+v then ctrl+m (correct)
Two characters: ^ then M (incorrect, but probably looks the same)
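The difference between the two spellings can be checked without typing ctrl+v ctrl+m at all. The following is a minimal sketch, assuming bash (for the $'\r' ANSI-C quoting) and a made-up sample file; the variable names are hypothetical. It also shows why the two-character form fails quietly: ^ is a regex anchor, so /^M$/ looks for lines that consist of just the letter M.

```shell
# Sample data (hypothetical): line 1 ends with CR LF, the others with LF only.
printf 'abc\r\ndef\nghi\n' > sourcefile

# $'\r' is bash ANSI-C quoting for one real carriage return character.
cr=$'\r'

# Typed as two characters, ^ anchors the match to the start of the line,
# so this counts lines that are exactly "M" (none here).
two_char_hits=$(grep -c '^M$' sourcefile || true)

# With the real character, the line ending in CR is found.
one_char_hits=$(grep -c "${cr}\$" sourcefile || true)
echo "two-character pattern: $two_char_hits, real CR: $one_char_hits"

# The sed command from the question, with the real CR substituted in:
# join a line ending in CR with the line that follows it.
sed -e :a -e "/${cr}\$/N; s/${cr}\\n//; ta" sourcefile > destfile
cat destfile
```

With this sample, destfile holds the lines abcdef and ghi.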
The ^M character you're seeing is probably a carriage return. You should match it using \r in regular expressions. The data was probably inserted into the database by a system that uses CRLF as its line ending (most likely Windows) instead of just LF (as most *nix systems do). I guess the carriage returns are already followed by a linefeed, so you probably want to remove them rather than replace them with a newline.
You might have the dos2unix command available on your system, which can convert those line endings for you.
You probably want to make sure the line endings are consistent first using dos2unix. After that you can remove the newlines like this:
cat infile | dos2unix | tr -d '\n' > outfile
If you want a space where the line breaks used to be, you can use:
cat infile | dos2unix | tr '\n' ' ' > outfile
As a side note, using sed to remove newlines is hard because sed processes the file one line at a time.
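To see what each stage of that pipeline does, here is a small sketch on made-up data. It assumes dos2unix may not be installed and uses tr -d '\r' as a stand-in; that is an approximation, since tr deletes every carriage return in the file, not only those at line ends. The file and variable names are hypothetical.

```shell
# Sample data (hypothetical): line 1 ends with CR LF, the others with LF only.
printf 'abc\r\ndef\nghi\n' > infile

# Stand-in for dos2unix: delete every carriage return, keep the linefeeds.
tr -d '\r' < infile > unix_endings
lines_kept=$(wc -l < unix_endings)

# Deleting every linefeed as well collapses the whole file
# into a single unterminated line.
tr -d '\r' < infile | tr -d '\n' > outfile
joined=$(cat outfile)

echo "lines after CR removal: $lines_kept"
echo "after also removing LF: $joined"
```

Note that the second step joins all lines, not only the ones that ended with a carriage return, so it fits only if the whole file is meant to become one line.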
I hope I understood your requirement correctly. See the test below:
A file called test:
Note that I typed the ^M in vim with ctrl-v, ctrl-m.
Now see the output with the following awk line. I hope that is what you needed:
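One way such an awk join could look is sketched below. This is an illustration under stated assumptions, not necessarily the line this answer used: the contents of the test file are made up, and the sketch relies on awk accepting \r inside a regex literal, which the common implementations do.

```shell
# Sample file named test, as in the answer; its contents here are hypothetical.
printf 'abc\r\ndef\nghi\r\njkl\n' > test

# If the line ends with CR, drop the CR and print without a newline,
# so the next line is appended to it; otherwise print the line normally.
awk '{ if (sub(/\r$/, "")) printf "%s", $0; else print }' test > joined_test

cat joined_test
```

With this sample, joined_test holds the lines abcdef and ghijkl.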
To understand what characters were present at the end of each line of my file, I used @potong's comment:
My file ended with \r$ on each line, so I did:
which got rid of the \r (or ^M) by rewriting the file in place. Here is the resulting output of sed again:
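A minimal sketch of this inspect-then-fix workflow follows; it is not necessarily the exact commands used above. It assumes GNU sed (for -i without a backup suffix) and bash (for $'\r'), and the file name myfile and its contents are hypothetical. sed's l command prints each line in an unambiguous form, showing a carriage return as \r and the end of the line as $.

```shell
# Sample data (hypothetical): every line ends with CR LF.
printf 'abc\r\ndef\r\n' > myfile

# Show the line endings before the fix.
before=$(sed -n l myfile)

# Strip a trailing CR from each line, rewriting the file in place.
# -i without a suffix is GNU sed syntax (an assumption about the environment).
sed -i $'s/\r$//' myfile

# Show the line endings after the fix.
after=$(sed -n l myfile)
printf 'before:\n%s\nafter:\n%s\n' "$before" "$after"
```

Before the fix each line is listed with a trailing \r$; afterwards only the $ remains.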