Java special character replacement

Posted on 2024-10-02 06:18:59

I have a text: "Csuklási roham gyötörheti a svédeket, annyit emlegetik mostanság ismét a svéd modellt Magyarországon."

In that original text there are no line breaks at all.

When I email this text (with Gmail), it gets encoded as follows:

```
Content-Type: text/plain; charset=ISO-8859-2
Content-Transfer-Encoding: quoted-printable

Csukl=E1si roham gy=F6t=F6rheti a sv=E9deket, annyit emlegetik mostans=E1g =
ism=E9t a
sv=E9d modellt Magyarorsz=E1gon.
```

In HTML:

```
Content-Type: text/html; charset=ISO-8859-2
Content-Transfer-Encoding: quoted-printable

<span class=3D"Apple-style-span" style=3D"font-family: Helvetica, Verdana, =
sans-serif; font-size: 15px; ">Csukl=E1si roham gy=F6t=F6rheti a sv=E9deket=
, annyit emlegetik mostans=E1g ism=E9t a sv=E9d modellt Magyarorsz=E1gon.
```

....

When I try to parse the email body as text/plain, I cannot get rid of the = sign in "mostans=E1g =
ism=E9t" between the two words. Note that the same character is missing from the HTML encoded message. I don't have any idea what that special character might be, but I need to eliminate it to get back the original text.

I tried replacing '\n', but that is not it: if I press Enter in the text myself, I can replace that line break with whatever character I want. I also tried '\r' and '\t'.

So the question is: what am I missing? Where does that special character come from? Is it because of the charset and/or the transfer encoding? If so, what do I have to do to solve the problem and get the original text back?

Any help would be welcome.

Cheers,
Balázs


Comments (2)

无悔心 2024-10-09 06:18:59

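You need to use `MimeUtility` (`javax.mail.internet.MimeUtility` from JavaMail), which decodes the quoted-printable transfer encoding for you. Here is an example: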

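```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.io.Writer;

import javax.mail.MessagingException;
import javax.mail.internet.MimeUtility;

public class Mime {
    public static void main(String[] args) throws MessagingException,
            IOException {
        InputStream stringStream = new FileInputStream("mime");
        // Undo the transfer encoding: "=XX" escapes become bytes and soft
        // line breaks ('=' at the end of a line) are removed.
        InputStream output = MimeUtility.decode(stringStream,
                "quoted-printable");
        System.out.println(convertStreamToString(output));
    }

    public static String convertStreamToString(InputStream is)
            throws IOException {
        /*
         * To convert the InputStream to a String we use the Reader.read(char[]
         * buffer) method. We iterate until the Reader returns -1, which means
         * there's no more data to read. We use the StringWriter class to
         * produce the string.
         */
        if (is != null) {
            Writer writer = new StringWriter();

            char[] buffer = new char[1024];
            try {
                // Decode the bytes with the charset declared in the part's
                // Content-Type header. ISO-8859-1 happens to work for this
                // sample, but would turn Hungarian ő/ű into õ/û.
                Reader reader = new BufferedReader(new InputStreamReader(is,
                        "ISO-8859-2"));
                int n;
                while ((n = reader.read(buffer)) != -1) {
                    writer.write(buffer, 0, n);
                }
            } finally {
                is.close();
            }
            return writer.toString();
        } else {
            return "";
        }
    }
}
```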

The file `mime` contains the encoded text:

```
Csukl=E1si roham gy=F6t=F6rheti a sv=E9deket, annyit emlegetik mostans=E1g =
ism=E9t a
sv=E9d modellt Magyarorsz=E1gon.
```

UPDATE:

Using the Guava library:

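```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import javax.mail.MessagingException;
import javax.mail.internet.MimeUtility;
import com.google.common.io.CharStreams;
import com.google.common.io.InputSupplier;

// Requires an older Guava release: InputSupplier and
// CharStreams.newReaderSupplier were removed in Guava 18.
public class MimeGuava {
    public static void main(String[] args) throws IOException {
        InputSupplier<InputStream> supplier = new InputSupplier<InputStream>() {
            @Override
            public InputStream getInput() throws IOException {
                InputStream inStream = new FileInputStream("mime");
                try {
                    return MimeUtility.decode(inStream, "quoted-printable");
                } catch (MessagingException e) {
                    // Propagate the failure instead of returning null,
                    // which would only fail later with a NullPointerException.
                    inStream.close();
                    throw new IOException(e);
                }
            }
        };
        // Use the charset from the Content-Type header (ISO-8859-2).
        InputSupplier<InputStreamReader> result = CharStreams
                .newReaderSupplier(supplier, Charset.forName("ISO-8859-2"));
        String ans = CharStreams.toString(result);
        System.out.println(ans);
    }
}
```
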
海风掠过北极光 2024-10-09 06:18:59

The transfer encoding "quoted-printable" forbids encoded lines longer than 76 characters (RFC 2045, section 6.7). If the text to be encoded contains longer lines, a "soft line break" has to be inserted, indicated by a single '=' as the last character of an encoded line. It means that the line break following that '=' was inserted only to satisfy the 76-character limit, so when decoding the transfer encoding, both the '=' and the line break after it must be removed.
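To make the soft line break concrete, here is a minimal, standard-library-only Java sketch of a quoted-printable decoder. It is not JavaMail's implementation, and the class name `QuotedPrintableDecoder` is hypothetical. It drops soft line breaks ('=' followed by CRLF or a bare LF), turns `=XX` escapes back into bytes, and only then interprets those bytes as ISO-8859-2, the charset from the question's `Content-Type` header. For brevity it assumes well-formed input and does not strip transport-added trailing whitespace, which RFC 2045 also asks decoders to do; the sample body puts the rest of the question's text on a single line.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical helper class: a sketch, not a complete RFC 2045 decoder.
public class QuotedPrintableDecoder {

    // Decodes a quoted-printable body into raw bytes:
    //  - "=" + CRLF or "=" + LF is a soft line break and is dropped entirely;
    //  - "=XX" (two hex digits) becomes the single byte 0xXX;
    //  - any other character is copied unchanged.
    public static byte[] decode(String encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            if (c != '=') {
                out.write(c); // encoded QP text is plain ASCII
                i++;
            } else if (encoded.startsWith("\r\n", i + 1)) {
                i += 3; // soft line break with CRLF
            } else if (encoded.startsWith("\n", i + 1)) {
                i += 2; // soft line break with a bare LF
            } else if (i + 2 < encoded.length()
                    && Character.digit(encoded.charAt(i + 1), 16) >= 0
                    && Character.digit(encoded.charAt(i + 2), 16) >= 0) {
                int hi = Character.digit(encoded.charAt(i + 1), 16);
                int lo = Character.digit(encoded.charAt(i + 2), 16);
                out.write((hi << 4) | lo); // hex escape such as =E1
                i += 3;
            } else {
                out.write(c); // malformed '=': keep it as-is
                i++;
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Body from the question: the first line ends in a soft line break.
        String body = "Csukl=E1si roham gy=F6t=F6rheti a sv=E9deket, annyit emlegetik mostans=E1g =\r\n"
                + "ism=E9t a sv=E9d modellt Magyarorsz=E1gon.";
        // Only now turn bytes into characters, using the charset from the
        // Content-Type header (ISO-8859-2).
        String text = new String(decode(body), Charset.forName("ISO-8859-2"));
        System.out.println(text);
    }
}
```

The key design point is the order of operations: the transfer encoding is undone on the ASCII text first, and the charset is applied only to the resulting bytes. With the soft line break removed, the decoded string is the original sentence, with a plain space between "mostanság" and "ismét".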
