Java special character replacement
I have a text:
"Csuklási roham gyötörheti a svédeket, annyit emlegetik mostanság ismét a svéd modellt Magyarországon."
In that original text there are no line breaks at all.
When I email this text (with gmail), I get it encoded as the following:
Content-Type: text/plain; charset=ISO-8859-2
Content-Transfer-Encoding: quoted-printable
Csukl=E1si roham gy=F6t=F6rheti a sv=E9deket, annyit emlegetik mostans=E1g =
ism=E9t a
sv=E9d modellt Magyarorsz=E1gon.
In HTML:
Content-Type: text/html; charset=ISO-8859-2
Content-Transfer-Encoding: quoted-printable
<span class=3D"Apple-style-span" style=3D"font-family: Helvetica, Verdana, =
sans-serif; font-size: 15px; ">Csukl=E1si roham gy=F6t=F6rheti a sv=E9deket=
, annyit emlegetik mostans=E1g ism=E9t a sv=E9d modellt Magyarorsz=E1gon.
....
When I try to parse the email body as text/plain, I cannot get rid of the = sign in "mostans=E1g =
ism=E9t" between the two words. Note that the same character is missing from the HTML encoded message. I don't have any idea what that special character might be, but I need to eliminate it to get back the original text.
I tried to replace '\n', but it's not that one. If I hit 'Enter' in the text, I can correctly replace the resulting line break with whatever character I want. I also tried '\r' and '\t'.
So the question is, what am I missing? Where does that special character come from? Is it because of the charset and/or the transfer encoding? If so, what do I have to do to solve the problem and get the original text back?
Any help would be welcome.
Cheers,
Balázs
Comments (2)
You need to use MimeUtility. Here is an example.
The file 'mime' contains encoded text:
UPDATE:
Using the Guava library:
The transfer encoding "quoted-printable" forbids encoded lines from exceeding 76 characters. If the text to be encoded contains longer lines, a "soft line break" has to be inserted, which is indicated by a single '=' as the last character of an encoded line. That '=' means the line break following it was inserted only to satisfy the 76-character limit, so both the '=' and that line break should be removed when decoding the transfer encoding.
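To make the soft line break rule concrete, here is a minimal sketch of a quoted-printable decoder that uses only the Java standard library. The class name `QuotedPrintableSketch` and its `decode` method are hypothetical names chosen for illustration, not part of any library. It removes "=" followed by a line break, turns "=XX" into the byte with that hex value, and then interprets the bytes in the charset from the Content-Type header. It does not handle every corner of the format (for example, trailing whitespace between the '=' and the line break), so an existing MIME decoder is probably the safer choice for real mail.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

class QuotedPrintableSketch {

    /**
     * Decodes a quoted-printable body (hypothetical helper, simplified).
     * Soft line breaks ("=" + CRLF or LF) are dropped, "=XX" becomes one byte,
     * and the resulting bytes are interpreted in the given charset.
     */
    static String decode(String encoded, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n = encoded.length();
        int i = 0;
        while (i < n) {
            char c = encoded.charAt(i);
            if (c != '=') {
                out.write(c);          // literal character (ASCII in valid input)
                i++;
                continue;
            }
            if (i + 1 >= n) {          // lone '=' at the very end: drop it
                i++;
                continue;
            }
            char next = encoded.charAt(i + 1);
            if (next == '\r' && i + 2 < n && encoded.charAt(i + 2) == '\n') {
                i += 3;                // soft line break with CRLF
                continue;
            }
            if (next == '\n' || next == '\r') {
                i += 2;                // soft line break with a bare LF or CR
                continue;
            }
            if (i + 2 < n) {
                int hi = Character.digit(next, 16);
                int lo = Character.digit(encoded.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write(hi * 16 + lo);   // "=E1" -> byte 0xE1
                    i += 3;
                    continue;
                }
            }
            out.write('=');            // malformed escape: keep the '=' as-is
            i++;
        }
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) {
        // The text/plain body from the question, with CRLF line endings assumed.
        String body = "Csukl=E1si roham gy=F6t=F6rheti a sv=E9deket, annyit emlegetik mostans=E1g =\r\n"
                + "ism=E9t a\r\n"
                + "sv=E9d modellt Magyarorsz=E1gon.";
        System.out.println(decode(body, Charset.forName("ISO-8859-2")));
    }
}
```

Note that only the break after "mostans=E1g =" is a soft one. The break after "ism=E9t a" has no preceding '=', so it is a real line break in the transmitted body and survives decoding; it has to be dealt with separately if the text should end up on a single line.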