How to remove unwanted symbols from a text file

Posted 2025-01-18 21:33:05

I have some text files with unwanted symbols such as

?~, ?~@?, -?~, ?~H~Z, ?~@~S, ?~@~T, : ?~@~], ?, etc

The actual text:

~@~\SEPA for cards is the next logical step in European retail payments integration~@~], says Yves Mersch, Member of the Executive Board of the ECB.
~@~\Giuseppe Penone's tree conveys a sense of stability and growth and is rooted in the humanist values of Europe in the most beautiful way~@~], said Benoît C~Suré
, Member of the Executive Board of the ECB and chair of the art jury which selected the artwork.
~@~\The euro banknotes and coins in everyone~@~Ys wallets are the same in the whole euro area.
~@~\The introduction of the new ~B50 will make our currency even safer~@~], Yves Mersch, ECB Executive Board member, said.
~@~\The successful completion of SEPA further accelerates Europe~@~Ys financial integration~@~], said Yves Mersch, ECB Executive Board member.

The hex dump

00000000  e2 80 9c 53 45 50 41 20  66 6f 72 20 63 61 72 64  |...SEPA for card|
00000010  73 20 69 73 20 74 68 65  20 6e 65 78 74 20 6c 6f  |s is the next lo|
00000020  67 69 63 61 6c 20 73 74  65 70 20 69 6e 20 45 75  |gical step in Eu|
00000030  72 6f 70 65 61 6e 20 72  65 74 61 69 6c 20 70 61  |ropean retail pa|
00000040  79 6d 65 6e 74 73 20 69  6e 74 65 67 72 61 74 69  |yments integrati|
00000050  6f 6e e2 80 9d 2c 20 73  61 79 73 20 59 76 65 73  |on..., says Yves|
00000060  20 4d 65 72 73 63 68 2c  20 4d 65 6d 62 65 72 20  | Mersch, Member |
00000070  6f 66 20 74 68 65 20 45  78 65 63 75 74 69 76 65  |of the Executive|
00000080  20 42 6f 61 72 64 20 6f  66 20 74 68 65 20 45 43  | Board of the EC|
00000090  42 2e 0a e2 80 9c 54 68  65 20 73 75 63 63 65 73  |B.....The succes|
000000a0  73 66 75 6c 20 63 6f 6d  70 6c 65 74 69 6f 6e 20  |sful completion |
000000b0  6f 66 20 53 45 50 41 20  66 75 72 74 68 65 72 20  |of SEPA further |
000000c0  61 63 63 65 6c 65 72 61  74 65 73 20 45 75 72 6f  |accelerates Euro|
000000d0  70 65 e2 80 99 73 20 66  69 6e 61 6e 63 69 61 6c  |pe...s financial|
000000e0  20 69 6e 74 65 67 72 61  74 69 6f 6e e2 80 9d 2c  | integration...,|
000000f0  20 73 61 69 64 20 59 76  65 73 20 4d 65 72 73 63  | said Yves Mersc|
00000100  68 2c 20 45 43 42 20 45  78 65 63 75 74 69 76 65  |h, ECB Executive|
00000110  20 42 6f 61 72 64 20 6d  65 6d 62 65 72 2e 0a     | Board member..|
0000011f

There are a lot of them, and they show up whenever I open the files in vim. I tried to remove them with sed:

sed -i 's#?~@?##g' file.txt

But it did not work.
What are those symbols? How do I remove them either with bash or python?

Comments (1)

雨后彩虹 2025-01-25 21:33:05

Use iconv.

iconv -f UTF-8 -t ASCII test.txt

UTF-8 as the input format was a guess, but as @Fravadona and @Jesujoba ALABI pointed out in the comments, it is correct.
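
As a quick way to check that guess, here is a minimal Python sketch (not part of the original answer) that decodes three byte sequences copied from the hex dump in the question as UTF-8 and prints their Unicode names. The names `samples` and `describe` are illustrative only.

```python
import unicodedata

# Byte sequences copied from the hex dump in the question.
samples = [
    bytes.fromhex("e2809c"),
    bytes.fromhex("e2809d"),
    bytes.fromhex("e28099"),
]


def describe(raw: bytes) -> str:
    """Decode `raw` as UTF-8 and return its code point and Unicode name."""
    ch = raw.decode("utf-8")  # raises UnicodeDecodeError if not valid UTF-8
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for raw in samples:
    print(raw.hex(" "), "->", describe(raw))
```

Each sequence decodes to a single typographic quotation mark, so the "symbols" are ordinary UTF-8 text that the editor is showing byte by byte rather than as characters.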

Note 4/5/2022 13:13:
The test input file test.txt I made from the hexdump was malformed (endianness?), but the -c option discards any unknown input so it worked. Removed the -c option.

Output:

SEPA for cards is the next logical step in European retail payments integration, says Yves Mersch, Member of the Executive Board of the ECB.
The successful completion of SEPA further accelerates Europes financial integration, said Yves Mersch, ECB Executive Board member.
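
Since the question also asks about Python, here is a rough sketch of a similar cleanup, assuming the file really is UTF-8. It is not taken from the answer above: the function name `to_ascii` and the mapping table `ASCII_PUNCTUATION` are hypothetical, and the table only covers a few common typographic characters. Unlike plain deletion, it can replace curly quotes and dashes with ASCII look-alikes before dropping whatever non-ASCII characters remain.

```python
# Hypothetical mapping: typographic punctuation -> closest ASCII character.
ASCII_PUNCTUATION = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
})


def to_ascii(text: str, replace_punctuation: bool = True) -> str:
    """Return `text` with every non-ASCII character replaced or dropped."""
    if replace_punctuation:
        text = text.translate(ASCII_PUNCTUATION)
    # Anything still outside ASCII (for example accented letters) is dropped.
    return text.encode("ascii", errors="ignore").decode("ascii")


# Bytes as they appear in the hex dump: "Europe", e2 80 99, "s".
raw = b"Europe\xe2\x80\x99s financial integration"
text = raw.decode("utf-8")

print(to_ascii(text))                             # apostrophe kept as '
print(to_ascii(text, replace_punctuation=False))  # apostrophe dropped
```

To apply this to a file, read it with `open(path, encoding="utf-8")` and pass the contents to `to_ascii`. Note that dropping non-ASCII characters also removes accented letters such as the ones in "Benoît", so the replacement table may need extending if those should be preserved in some form.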