Java special character replacement

Posted on 2024-10-02 06:18:59

I have a text:

"Csuklási roham gyötörheti a svédeket, annyit emlegetik mostanság ismét a svéd modellt Magyarországon."

In that original text there are no line breaks at all.

When I email this text (with Gmail), it arrives encoded as follows:

Content-Type: text/plain; charset=ISO-8859-2
Content-Transfer-Encoding: quoted-printable

Csukl=E1si roham gy=F6t=F6rheti a sv=E9deket, annyit emlegetik mostans=E1g =
ism=E9t a
sv=E9d modellt Magyarorsz=E1gon.

In HTML:

Content-Type: text/html; charset=ISO-8859-2
Content-Transfer-Encoding: quoted-printable


<span class=3D"Apple-style-span" style=3D"font-family: Helvetica, Verdana, =
sans-serif; font-size: 15px; ">Csukl=E1si roham gy=F6t=F6rheti a sv=E9deket=
, annyit emlegetik mostans=E1g ism=E9t a sv=E9d modellt Magyarorsz=E1gon.

....

When I try to parse the email body as text/plain, I cannot get rid of the = sign in "mostans=E1g =
ism=E9t" between the two words. Note that the same character is missing from the HTML encoded message. I don't have any idea what that special character might be, but I need to eliminate it to get back the original text.

I tried replacing '\n', but that is not it: if I hit 'Enter' in the text, I can correctly replace that line break with whatever character I want. I also tried '\r' and '\t'.

So the question is, what am I missing? Where does that special character come from? Is it because of the charset and/or the transfer encoding? If so, what do I have to do to solve the problem and get the original text back?

Any help would be welcome.

Cheers,
Balázs

无悔心 2024-10-09 06:18:59

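You need to use MimeUtility. Here is an example.

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.io.Writer;

import javax.mail.MessagingException;
import javax.mail.internet.MimeUtility;

public class Mime {
    public static void main(String[] args) throws MessagingException,
            IOException {
        InputStream stringStream = new FileInputStream("mime");
        // Wrap the raw stream in a quoted-printable decoder: "=XX" escapes
        // become bytes and soft line breaks ("=" at end of line) are dropped.
        InputStream output = MimeUtility.decode(stringStream,
                "quoted-printable");
        System.out.println(convertStreamToString(output));
    }

    public static String convertStreamToString(InputStream is)
            throws IOException {
        /*
         * To convert the InputStream to a String we use the
         * Reader.read(char[] buffer) method. We iterate until the Reader
         * returns -1, which means there's no more data to read. We use the
         * StringWriter class to produce the string.
         */
        if (is != null) {
            Writer writer = new StringWriter();

            char[] buffer = new char[1024];
            try {
                // Use the charset declared in the message header
                // (charset=ISO-8859-2), not ISO-8859-1.
                Reader reader = new BufferedReader(new InputStreamReader(is,
                        "ISO-8859-2"));
                int n;
                while ((n = reader.read(buffer)) != -1) {
                    writer.write(buffer, 0, n);
                }
            } finally {
                is.close();
            }
            return writer.toString();
        } else {
            return "";
        }
    }
}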

The file 'mime' contains encoded text:

Csukl=E1si roham gy=F6t=F6rheti a sv=E9deket, annyit emlegetik mostans=E1g =
ism=E9t a
sv=E9d modellt Magyarorsz=E1gon.
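
To show where the stray "=" comes from: in the quoted-printable transfer encoding, "=XX" stands for one byte written as two hex digits, and an "=" as the last character of an encoded line is a soft line break that the encoder inserts to keep lines short. It is not part of the text, which is why replacing '\n' alone never removes it. The following is a minimal sketch of that decoding step without JavaMail, not a full implementation of the standard. The class name QuotedPrintableSketch and the method decode are hypothetical names chosen for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QuotedPrintableSketch {

    /** Decodes a quoted-printable body, then interprets the bytes in the given charset. */
    static String decode(String encoded, Charset charset) {
        // Quoted-printable data is plain ASCII, so every char maps to one byte.
        byte[] in = encoded.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        int i = 0;
        while (i < in.length) {
            byte b = in[i];
            if (b != '=') {
                out.write(b);                       // literal character
                i++;
            } else if (i + 1 < in.length && in[i + 1] == '\n') {
                i += 2;                             // soft line break "=\n": drop it
            } else if (i + 2 < in.length && in[i + 1] == '\r' && in[i + 2] == '\n') {
                i += 3;                             // soft line break "=\r\n": drop it
            } else if (i + 2 < in.length && isHex(in[i + 1]) && isHex(in[i + 2])) {
                int hi = Character.digit(in[i + 1], 16);
                int lo = Character.digit(in[i + 2], 16);
                out.write((hi << 4) | lo);          // "=E1" becomes the byte 0xE1
                i += 3;
            } else {
                out.write(b);                       // malformed escape: keep the "=" as-is
                i++;
            }
        }
        return new String(out.toByteArray(), charset);
    }

    private static boolean isHex(byte b) {
        return Character.digit(b, 16) >= 0;
    }

    public static void main(String[] args) {
        String body = "Csukl=E1si roham gy=F6t=F6rheti a sv=E9deket, annyit emlegetik mostans=E1g =\n"
                + "ism=E9t a\n"
                + "sv=E9d modellt Magyarorsz=E1gon.";
        // charset=ISO-8859-2 is taken from the Content-Type header of the message.
        System.out.println(decode(body, Charset.forName("ISO-8859-2")));
    }
}
```

With the sample body, the soft break after "mostans=E1g " disappears, so "mostanság" and "ismét" end up on the same line. The plain line break between "ism=E9t a" and "sv=E9d" is not preceded by "=", so it is a real line break in the message and survives decoding. If the original text had none, that break would need to be handled separately.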

UPDATE:

Using the Guava library:

    InputSupplier<InputStream> supplier = new InputSupplier<InputStream>() {
        @Override
        public InputStream getInput() throws IOException {
            InputStream inStream = new FileInputStream("mime");
            try {
                // Decodes "=XX" escapes and drops soft line breaks.
                return MimeUtility.decode(inStream, "quoted-printable");
            } catch (MessagingException e) {
                // Fail loudly instead of handing back a null stream.
                inStream.close();
                throw new IOException(e);
            }
        }
    };
    // The message header declares charset=ISO-8859-2.
    InputSupplier<InputStreamReader> result = CharStreams
            .newReaderSupplier(supplier, Charset.forName("ISO-8859-2"));
    String ans = CharStreams.toString(result);
    System.out.println(ans);
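
For readers without Guava on the classpath, the stream-to-string step can also be sketched with the standard library alone. This assumes Java 9 or later for InputStream.readAllBytes. The class DecodedStreamToString and the method readAll are hypothetical names, and the byte array is only a stand-in for the stream that MimeUtility.decode would return. The two bytes were picked to show why the charset from the message header matters: the sample sentence happens to decode identically under ISO-8859-1 and ISO-8859-2, but other Hungarian letters do not.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodedStreamToString {

    /** Reads the whole stream, closes it, and decodes the bytes with the given charset. */
    static String readAll(InputStream in, Charset charset) {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for already-decoded bytes, as if the body had contained "=F5=FB".
        byte[] decoded = {(byte) 0xF5, (byte) 0xFB};

        Charset latin2 = Charset.forName("ISO-8859-2");
        String asLatin2 = readAll(new ByteArrayInputStream(decoded), latin2);
        String asLatin1 = readAll(new ByteArrayInputStream(decoded), StandardCharsets.ISO_8859_1);

        // The same two bytes map to different letters depending on the charset.
        System.out.println("ISO-8859-2: " + asLatin2);
        System.out.println("ISO-8859-1: " + asLatin1);
    }
}
```

Under ISO-8859-2 the two bytes are the Hungarian letters U+0151 and U+0171 (o and u with double acute). Under ISO-8859-1 they are U+00F5 and U+00FB instead, so reading with the wrong charset silently corrupts such text.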
