StringBufferInputStream problem in Java

Posted on 2024-10-08 20:54:27

I want to read an input string and return it as a UTF-8 encoded string. So I found an example on the Oracle/Sun website that used FileInputStream. I didn't want to read a file but a string, so I changed it to StringBufferInputStream and used the code below. The method parameter jtext is some Japanese text. Actually, this method works great. The question is about the deprecated code: I had to add @SuppressWarnings because StringBufferInputStream is deprecated. Is there a better way to get an input stream from a string? Or is it OK just to leave it as is? I've spent so long trying to fix this problem that I don't want to change anything now that I seem to have cracked it.

    @SuppressWarnings("deprecation")
    private String readInput(String jtext) {

        StringBuffer buffer = new StringBuffer();
        try {
            StringBufferInputStream sbis = new StringBufferInputStream(jtext);
            InputStreamReader isr = new InputStreamReader(sbis, "UTF8");
            Reader in = new BufferedReader(isr);
            int ch;
            while ((ch = in.read()) > -1) {
                buffer.append((char) ch);
            }

            in.close();
            return buffer.toString();
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }

I think I found a solution - of sorts:

    private String readInput(String jtext) {

        String n;
        try {
            n = new String(jtext.getBytes("8859_1"));
            return n;
        } catch (UnsupportedEncodingException e) {

            return null;
        }
    }

Before, I was desperately using getBytes("UTF8"). But by chance I used Latin-1, "8859_1", and it worked. Why it worked, I can't fathom. This is what I did, step by step:

OpenOffice CSV (UTF-8) ------> SQLite (UTF-8, apparently) ------> Java, encoded as Latin-1, somehow readable.

Comments (3)

小巷里的女流氓 2024-10-15 20:54:27

The reason StringBufferInputStream is deprecated is that it is fundamentally broken ... for anything other than Strings consisting entirely of Latin-1 characters. According to the javadoc, only the low eight bits of each character are used, i.e. it "encodes" characters by simply chopping off the top 8 bits! You don't want to use it if your application needs to handle Unicode, etc., correctly.
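To see the truncation concretely, here is a minimal sketch (TruncationDemo and readAll are illustrative names) that feeds single characters through StringBufferInputStream. The character 日 is U+65E5, so only its low byte, 0xE5, comes out of the stream, and nothing downstream can rebuild the original character from that.

```java
import java.io.StringBufferInputStream;

public class TruncationDemo {

    // Collects every value StringBufferInputStream produces for s.
    // The stream yields exactly one value per char: (char & 0xFF).
    @SuppressWarnings("deprecation")
    static int[] readAll(String s) {
        StringBufferInputStream in = new StringBufferInputStream(s);
        int[] out = new int[s.length()];
        for (int i = 0; i < out.length; i++) {
            out[i] = in.read();
        }
        return out;
    }

    public static void main(String[] args) {
        // U+65E5 (日): the high byte 0x65 is silently discarded.
        System.out.printf("%x%n", readAll("\u65E5")[0]); // prints e5
        // U+0041 ('A') fits in 8 bits, so it survives unchanged.
        System.out.printf("%x%n", readAll("A")[0]);      // prints 41
    }
}
```

This is why the question's original code appeared to work only by accident: the bytes it fed into the UTF-8 InputStreamReader were not UTF-8 at all.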

If you want to create an InputStream from a String, then the correct way to do it is to use String.getBytes(...) to turn the String into a byte array and then wrap that in a ByteArrayInputStream. (Make sure that you choose an appropriate encoding!)
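A minimal sketch of that approach, assuming UTF-8 is the "appropriate encoding" (toStream and readBack are illustrative names). The key point is that the same explicit charset is used when the String becomes bytes and when the bytes become text again.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class StringToStream {

    // Encode the String with an explicit charset and wrap the resulting bytes.
    static InputStream toStream(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    // Decode the stream back to text; the charset must match the one above.
    static String readBack(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int ch;
            while ((ch = r.read()) != -1) {
                sb.append((char) ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String jtext = "\u65E5\u672C\u8A9E"; // 日本語
        System.out.println(readBack(toStream(jtext)).equals(jtext)); // true
    }
}
```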

But your sample application immediately takes the InputStream, converts it to a Reader, and then adds a BufferedReader. If that is your real aim, then a simpler and more efficient approach is just this:

Reader in = new StringReader(text);

This avoids the unnecessary encoding and decoding of the String, and also the "buffer" layer which serves no useful purpose in this case.
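Plugged into the question's char-by-char loop, StringReader looks roughly like this. It is a minimal sketch: copy is an illustrative name, and StringBuilder stands in for the question's StringBuffer, since a local buffer needs no synchronization.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class StringReaderDemo {

    // Reads the text char by char straight from the String: no bytes,
    // no charset and no BufferedReader layer are involved.
    static String copy(String jtext) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new StringReader(jtext)) {
            int ch;
            while ((ch = in.read()) != -1) {
                sb.append((char) ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(copy("\u65E5\u672C\u8A9E").length()); // 3
    }
}
```

Note that the result is simply a copy of jtext; reading a String through a Reader is only worthwhile when some other API insists on a Reader.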

(A buffered stream is much more efficient than an unbuffered stream if you are doing small I/O operations on a file, network or console stream. But for a stream that is served from an in-memory data structure the benefits are much smaller, and possibly even negative.)

FOLLOWUP

I now realize what you are actually trying to do: work around a character encoding/decoding issue.

My advice would be to try to figure out definitively the actual encoding of the character data that is being delivered by the database, then make sure that the JDBC drivers are configured to use the same encoding. Trying to undo the mis-translation by encoding with one encoding and decoding with another is dodgy, and can give you only a partial correction of the problems.
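A sketch of the kind of mis-translation being described, assuming (this is a guess, not confirmed in the thread) that UTF-8 bytes from the database were decoded as ISO-8859-1 somewhere along the way. Latin-1 maps each of the 256 byte values to exactly one char and back, so re-encoding the garbled text as Latin-1 recovers the original UTF-8 bytes. That would also explain the question's "8859_1" trick, provided the platform default charset used by its bare new String(bytes) call is UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class MisdecodeDemo {

    // Simulates the suspected bug: UTF-8 bytes decoded as ISO-8859-1.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // The workaround: Latin-1 is lossless for bytes 0x00-0xFF, so encoding
    // the garbled text as Latin-1 gives back the original UTF-8 bytes.
    static String undo(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String jtext = "\u65E5\u672C\u8A9E"; // 日本語
        String garbled = misdecode(jtext);
        System.out.println(garbled.length());            // 9, one char per byte
        System.out.println(undo(garbled).equals(jtext)); // true
    }
}
```

The round trip only works because Latin-1 covers every byte value. With a charset that leaves some byte values undefined (windows-1252 has a handful of them), those bytes would not come back, which is exactly the kind of partial correction the answer warns about.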

You also need to consider the possibility that the characters got mangled on the way into the database. If this is the case, then you may be unable to de-mangle them.
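A minimal sketch of why such damage can be permanent, assuming (hypothetically) that the text was encoded at write time with a charset that cannot represent Japanese, here ISO-8859-1. String.getBytes(Charset) replaces every unmappable character with the charset's replacement byte, '?' for ISO-8859-1, so the original characters are gone before they ever reach the database.

```java
import java.nio.charset.StandardCharsets;

public class LossDemo {

    // Japanese text forced into ISO-8859-1: each unmappable char becomes '?'.
    static byte[] storeAsLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] stored = storeAsLatin1("\u65E5\u672C\u8A9E"); // 日本語
        String back = new String(stored, StandardCharsets.ISO_8859_1);
        System.out.println(back); // prints ???
        // No decoding trick can turn "???" back into 日本語.
    }
}
```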

呆橘 2024-10-15 20:54:27

Is this what you are trying to do? Here is a previous answer to a similar question. I am not sure why you want to convert a String to exactly the same String.

A Java String holds a sequence of chars, each of which is a UTF-16 code unit (for characters in the Basic Multilingual Plane, that is simply the character's Unicode code point). So it is possible to construct the same String from two different byte sequences, say one encoded with UTF-8 and the other with UTF-16.
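For non-ASCII text such as Japanese, a short sketch makes this visible: the UTF-8 and UTF-16 byte sequences differ in both length and content, yet each decodes, with its own charset, to an equal String. UTF-16BE is used here only to avoid the byte-order mark that the plain UTF-16 encoder adds; sameText is an illustrative helper.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SameStringDemo {

    // True if both byte arrays decode (each with its own charset) to equal Strings.
    static boolean sameText(byte[] x, Charset cx, byte[] y, Charset cy) {
        return new String(x, cx).equals(new String(y, cy));
    }

    public static void main(String[] args) {
        String jtext = "\u65E5\u672C\u8A9E"; // 日本語

        byte[] utf8 = jtext.getBytes(StandardCharsets.UTF_8);     // 9 bytes
        byte[] utf16 = jtext.getBytes(StandardCharsets.UTF_16BE); // 6 bytes, no BOM

        System.out.println(Arrays.equals(utf8, utf16)); // false: different bytes
        System.out.println(sameText(utf8, StandardCharsets.UTF_8,
                utf16, StandardCharsets.UTF_16BE));     // true: same String
    }
}
```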

If you want to write it to a file, you can always convert it with String.getBytes("UTF-8") (or whichever encoding the file should use):

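private static String readInput(String jtext) {
    try {
        // Encode and decode with the same charset, so the result equals jtext.
        // A bare getBytes() would use the platform default charset instead.
        byte[] bytes = jtext.getBytes("UTF-8");
        String string = new String(bytes, "UTF-8");
        return string;
    } catch (UnsupportedEncodingException ex) {
        // do something
        return null;
    }
}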
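For the file case specifically, a minimal sketch (writeUtf8 and readUtf8 are illustrative names) shows that the encoding only matters at the moment the text becomes bytes: the file holds the 9 UTF-8 bytes of 日本語, and reading it back with the same charset yields an equal String.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteUtf8Demo {

    // The charset is chosen explicitly when the String is turned into bytes.
    static void writeUtf8(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    // Reading back must use the same charset.
    static String readUtf8(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("jtext", ".txt");
        writeUtf8(tmp, "\u65E5\u672C\u8A9E");
        System.out.println(Files.size(tmp));                             // 9
        System.out.println(readUtf8(tmp).equals("\u65E5\u672C\u8A9E")); // true
        Files.delete(tmp);
    }
}
```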

Update

Here is my assumption.

According to your comment, your SQLite DB stores text values using one encoding, say UTF-16. For some reason, your SQLite API cannot determine which encoding was used to turn the Unicode values into a sequence of bytes.

So when you use the getString method from your SQLite API, it reads a set of bytes from your DB and converts them into a Java String using the wrong encoding. If this is the case, you should use the getBytes method and reconstruct the String yourself, i.e. new String(bytes, "encoding used in your DB"). If your DB is stored in UTF-16, then new String(bytes, "UTF-16") should be readable.

Update

I wasn't talking about the getBytes method on the String class. I was talking about the getBytes method on your SQL result object, e.g. result.getBytes(String columnLabel).

ResultSet result = .... // from SQL query
String readableString = readInput(result.getBytes("my_table_column"));

You will need to change the signature of your readInput method to:

private static String readInput(byte[] bytes) {
    try {
        // change encoding to your DB encoding.
        // this can be UTF-8, UTF-16, 8859_1, etc.
        String string = new String(bytes, "UTF-8");
        return string;
    } catch (UnsupportedEncodingException ex) {
        // Unknown encoding name: fall back to the platform default charset
        // so you at least get (possibly garbled) text back.
        return new String(bytes);
    }
}

Whichever encoding you set here that makes your String readable is almost certainly the encoding of your column in the DB. There is nothing unexplainable about it, and you then know exactly what your column's encoding is.
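Trying encodings by hand can be narrowed down with a strict decoder. The sketch below (GuessEncoding and strictCandidates are illustrative names, and the "column" bytes are simulated rather than read from a real database) keeps only the candidate charsets that decode the bytes without any malformed or unmappable input; a person still has to look at the survivors to see which one is actually readable.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class GuessEncoding {

    // Returns the candidates that decode the bytes without any error.
    static List<Charset> strictCandidates(byte[] bytes, List<Charset> candidates) {
        List<Charset> ok = new ArrayList<>();
        for (Charset cs : candidates) {
            CharsetDecoder dec = cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                dec.decode(ByteBuffer.wrap(bytes));
                ok.add(cs);
            } catch (CharacterCodingException e) {
                // not a valid byte sequence in this charset: rule it out
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        // Simulated column bytes: 日本語 stored as UTF-8.
        byte[] column = "\u65E5\u672C\u8A9E".getBytes(StandardCharsets.UTF_8);
        List<Charset> candidates = List.of(StandardCharsets.UTF_8,
                StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1);
        for (Charset cs : strictCandidates(column, candidates)) {
            System.out.println(cs + ": " + new String(column, cs));
        }
    }
}
```

ISO-8859-1 accepts every possible byte sequence, so it always survives this filter; decoding without errors can only rule a candidate out, never prove it right.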

But it would be better to configure your JDBC driver to use the correct encoding, so that you don't need this readInput method to convert at all.

If no encoding makes your string readable, you will need to consider the possibility that the characters got mangled when they were written to the DB, as @Stephen C said. If that is the case, using this workaround may cause you to lose some of the characters during conversion. You will also need to solve the encoding problem on the write side.

毁梦 2024-10-15 20:54:27

The StringReader class is the preferred replacement for the deprecated StringBufferInputStream class.

However, you state that what you actually want to do is take an existing String and return it encoded as UTF-8. I expect you can do that much more simply, with something like:

s8 = new String(jtext.getBytes("UTF8"));