How to remove unwanted symbols from a text file
I have some text files with unwanted symbols such as
?~, ?~@?, -?~, ?~H~Z, ?~@~S, ?~@~T, : ?~@~], ?, etc
The actual text:
~@~\SEPA for cards is the next logical step in European retail payments integration~@~], says Yves Mersch, Member of the Executive Board of the ECB.
~@~\Giuseppe Penone's tree conveys a sense of stability and growth and is rooted in the humanist values of Europe in the most beautiful way~@~], said Benoît C~Suré, Member of the Executive Board of the ECB and chair of the art jury which selected the artwork.
~@~\The euro banknotes and coins in everyone~@~Ys wallets are the same in the whole euro area.
~@~\The introduction of the new ~B50 will make our currency even safer~@~], Yves Mersch, ECB Executive Board member, said.
~@~\The successful completion of SEPA further accelerates Europe~@~Ys financial integration~@~], said Yves Mersch, ECB Executive Board member.
The hex dump
00000000 e2 80 9c 53 45 50 41 20 66 6f 72 20 63 61 72 64 |...SEPA for card|
00000010 73 20 69 73 20 74 68 65 20 6e 65 78 74 20 6c 6f |s is the next lo|
00000020 67 69 63 61 6c 20 73 74 65 70 20 69 6e 20 45 75 |gical step in Eu|
00000030 72 6f 70 65 61 6e 20 72 65 74 61 69 6c 20 70 61 |ropean retail pa|
00000040 79 6d 65 6e 74 73 20 69 6e 74 65 67 72 61 74 69 |yments integrati|
00000050 6f 6e e2 80 9d 2c 20 73 61 79 73 20 59 76 65 73 |on..., says Yves|
00000060 20 4d 65 72 73 63 68 2c 20 4d 65 6d 62 65 72 20 | Mersch, Member |
00000070 6f 66 20 74 68 65 20 45 78 65 63 75 74 69 76 65 |of the Executive|
00000080 20 42 6f 61 72 64 20 6f 66 20 74 68 65 20 45 43 | Board of the EC|
00000090 42 2e 0a e2 80 9c 54 68 65 20 73 75 63 63 65 73 |B.....The succes|
000000a0 73 66 75 6c 20 63 6f 6d 70 6c 65 74 69 6f 6e 20 |sful completion |
000000b0 6f 66 20 53 45 50 41 20 66 75 72 74 68 65 72 20 |of SEPA further |
000000c0 61 63 63 65 6c 65 72 61 74 65 73 20 45 75 72 6f |accelerates Euro|
000000d0 70 65 e2 80 99 73 20 66 69 6e 61 6e 63 69 61 6c |pe...s financial|
000000e0 20 69 6e 74 65 67 72 61 74 69 6f 6e e2 80 9d 2c | integration...,|
000000f0 20 73 61 69 64 20 59 76 65 73 20 4d 65 72 73 63 | said Yves Mersc|
00000100 68 2c 20 45 43 42 20 45 78 65 63 75 74 69 76 65 |h, ECB Executive|
00000110 20 42 6f 61 72 64 20 6d 65 6d 62 65 72 2e 0a | Board member..|
0000011f
There are a lot of them. They show up whenever I use vim. I have tried using sed to remove them:
sed -i 's#?~@?##g' file.txt
But it did not work.
What are those symbols? How do I remove them with either bash or python?
Comments (1)
Use iconv.
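Since the question also asks about Python, here is a minimal sketch of the same idea (converting typographic punctuation in UTF-8 text to plain ASCII). It is not the iconv command itself. The mapping table and the names `PUNCTUATION_TO_ASCII`, `replace_typographic_punctuation` and `clean_file` are assumptions made for illustration, and the input is assumed to be UTF-8.

```python
from pathlib import Path

# Hypothetical mapping: typographic characters of the kind seen in the
# question, each paired with a plain ASCII stand-in. Extend as needed.
PUNCTUATION_TO_ASCII = {
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
    "\u20ac": "EUR", # euro sign
}
_TABLE = str.maketrans(PUNCTUATION_TO_ASCII)


def replace_typographic_punctuation(text):
    """Replace the mapped characters; everything else is left unchanged."""
    return text.translate(_TABLE)


def clean_file(src, dst):
    """Read src as UTF-8 (an assumption), write the cleaned text to dst."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(replace_typographic_punctuation(text), encoding="utf-8")


if __name__ == "__main__":
    line = ("\u201cThe successful completion of SEPA further accelerates "
            "Europe\u2019s financial integration\u201d, said Yves Mersch.")
    print(replace_typographic_punctuation(line))
```

Characters outside the table, such as the accented letters in names, are kept as they are, so the text is tidied rather than stripped to pure ASCII.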
UTF-8 as the input format was a guess, but as @Fravadona and @Jesujoba ALABI pointed out in the comments, it is correct.
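One way to check the UTF-8 guess against the hex dump in the question is to decode a few of the dumped bytes and look up the names of the non-ASCII characters. This is only a sketch: the helper name `describe_non_ascii` is made up for illustration, and the sample is a shortened excerpt of the dump, not the whole file.

```python
import unicodedata


def describe_non_ascii(data, encoding="utf-8"):
    """Decode data and list each distinct non-ASCII character with its name.

    Raises UnicodeDecodeError if data is not valid in the given encoding.
    """
    text = data.decode(encoding)
    seen = {}
    for ch in text:
        if ord(ch) > 127 and ch not in seen:
            seen[ch] = unicodedata.name(ch, "UNKNOWN")
    return [("U+%04X" % ord(ch), name) for ch, name in seen.items()]


if __name__ == "__main__":
    # Byte sequences taken from the hex dump: e2 80 9c, e2 80 9d, e2 80 99.
    sample = (bytes.fromhex("e2809c") + b"SEPA for cards"
              + bytes.fromhex("e2809d") + b", Europe"
              + bytes.fromhex("e28099") + b"s")
    for code_point, name in describe_non_ascii(sample):
        print(code_point, name)
    # U+201C LEFT DOUBLE QUOTATION MARK
    # U+201D RIGHT DOUBLE QUOTATION MARK
    # U+2019 RIGHT SINGLE QUOTATION MARK
```

The three-byte sequences decode cleanly to curly quotes and an apostrophe, which suggests the symbols seen in vim are these characters shown byte by byte rather than stray garbage.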
Note 4/5/2022 13:13:
The test input file test.txt I made from the hexdump was malformed (endianness?), but the -c option discards any unknown input so it worked. Removed the -c option.
Output: