Is a character in Java 1 byte or 2 bytes?

Posted 2024-11-06 23:11:59

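I thought characters in Java are 16 bits, as the Java documentation says. Isn't that also the case for strings? I have some code that stores an object in a file: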

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public static void storeNormalObj(File outFile, Object obj) {
    // try-with-resources closes oos (and the wrapped fos) even if writeObject throws,
    // and avoids calling close() on a stream that was never opened.
    try (FileOutputStream fos = new FileOutputStream(outFile);
         ObjectOutputStream oos = new ObjectOutputStream(fos)) {
        oos.writeObject(obj);
        oos.flush();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

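Basically, I tried to store the string "abcd" in a file named "output". When I opened output in an editor and deleted the non-string part, what was left was just the string "abcd", which is 4 bytes in total. Does anyone know why? Does Java automatically save space by using ASCII instead of Unicode for strings that ASCII can represent? Thanks.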



星 2024-11-13 23:11:59

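(I assume the "non-string part" refers to the bytes that ObjectOutputStream emits when it is created. You may not want to use ObjectOutputStream at all, but I don't know your requirements.)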
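To see what that "non-string part" looks like, here is a minimal sketch that serializes "abcd" into memory and dumps the bytes as hex. The class and method names (SerializedStringDump, serialize, toHex) are made up for this illustration; only ObjectOutputStream and ByteArrayOutputStream come from the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

class SerializedStringDump {
    /** Serializes obj into memory and returns the raw stream bytes. */
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    /** Formats bytes as space-separated two-digit hex values. */
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = serialize("abcd");
        // Stream header first, then a string tag and a 2-byte length, then the characters.
        System.out.println(toHex(data));
        System.out.println(data.length + " bytes in total");
    }
}
```

On a standard JDK this should print `ac ed 00 05 74 00 04 61 62 63 64`: a 4-byte stream header, a 1-byte string tag, a 2-byte length, and then the four bytes of "abcd" (61 62 63 64).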

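FYI, Unicode and UTF-8 are not the same thing. Unicode is a standard that specifies, among other things, which characters are available. UTF-8 is a character encoding that specifies how those characters are physically encoded as 1s and 0s. UTF-8 uses 1 byte for ASCII characters (code points <= 127) and up to 4 bytes for other Unicode characters.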

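UTF-8 is a strict superset of ASCII. So even if you specify UTF-8 encoding for a file and write "abcd" to it, the file will contain just those four bytes: they have the same physical encoding in ASCII as in UTF-8.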
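A quick way to check these sizes yourself is to encode a few strings and count the bytes. This is only a sketch; the class name Utf8Lengths and the helper utf8Length are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Lengths {
    /** Number of bytes needed to encode s as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(Character.SIZE);              // bits in a Java char: 16
        System.out.println(utf8Length("abcd"));          // ASCII only: 4
        System.out.println(utf8Length("\u00e9"));        // e with acute accent: 2
        System.out.println(utf8Length("\u20ac"));        // euro sign: 3
        System.out.println(utf8Length("\uD83D\uDE00"));  // emoji outside the BMP: 4
        // ASCII text is byte-for-byte identical in US-ASCII and UTF-8.
        System.out.println(Arrays.equals(
                "abcd".getBytes(StandardCharsets.US_ASCII),
                "abcd".getBytes(StandardCharsets.UTF_8)));  // true
    }
}
```

So a char is 16 bits in memory, but the number of bytes on disk depends entirely on the encoding used to write it.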

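Your method uses ObjectOutputStream, which actually has a significantly different encoding from either ASCII or UTF-8! If you read the Javadoc carefully, you'll see that if obj is a string that has already occurred in the stream, subsequent calls to writeObject emit a reference to the earlier string instead, which can result in far fewer bytes being written when strings repeat.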
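The back-reference behaviour is easy to observe by writing the same string object more than once and comparing stream sizes. A minimal sketch, with a made-up class name (BackReferenceDemo) and helper (serializedSize):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

class BackReferenceDemo {
    /** Size in bytes of a stream to which s has been written the given number of times. */
    static int serializedSize(String s, int times) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            for (int i = 0; i < times; i++) {
                oos.writeObject(s); // the same String instance every time
            }
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        String s = "abcd";
        int once = serializedSize(s, 1);
        int twice = serializedSize(s, 2);
        System.out.println("one write:  " + once + " bytes");
        System.out.println("two writes: " + twice + " bytes");
        // The repeat costs only a reference tag plus a handle, not the full string again.
        System.out.println("cost of the repeat: " + (twice - once) + " bytes");
    }
}
```

The second write should cost a fixed handful of bytes no matter how long the string is, so the saving grows with the length of the repeated string.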

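If you're serious about understanding this, you really should spend a good amount of time reading about Unicode and character encoding systems. Wikipedia has an excellent article on Unicode as a start.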


月棠 2024-11-13 23:11:59

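Yes, a char is only a 16-bit Unicode (UTF-16) value within the Java runtime. If you want to write text using a 16-bit encoding, use a FileWriter with an explicit 16-bit charset such as UTF-16; a FileWriter created without a charset uses the default charset rather than a 16-bit one.

    // imports: java.io.FileWriter, java.nio.charset.StandardCharsets
    // FileWriter(String, Charset) requires Java 11 or later.
    // inputStream is assumed to be an already-open java.io.Reader.
    try (FileWriter outputStream =
            new FileWriter("myfilename.dat", StandardCharsets.UTF_16)) {
        int c;
        while ((c = inputStream.read()) != -1) {
            outputStream.write(c); // encoded as 2 bytes per char
        }
    }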
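How many bytes a Writer produces depends on the charset it encodes with, not on the 16-bit size of char. Here is a small sketch that uses an in-memory stream instead of a file so the byte counts are easy to inspect; the class name WriterEncodingDemo and the helper encode are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WriterEncodingDemo {
    /** Encodes text through a Writer using the given charset and returns the bytes. */
    static byte[] encode(String text, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, charset)) {
            w.write(text);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // 1 byte per ASCII char
        System.out.println(encode("abcd", StandardCharsets.UTF_8).length);
        // 2 bytes per char, no byte-order mark
        System.out.println(encode("abcd", StandardCharsets.UTF_16BE).length);
        // 2 bytes per char plus a 2-byte byte-order mark
        System.out.println(encode("abcd", StandardCharsets.UTF_16).length);
    }
}
```

This should print 4, 8 and 10, so a true "16 bits per character" file only appears when a UTF-16 charset is requested explicitly.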



寄风 2024-11-13 23:11:59

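If you look at how ObjectOutputStream serializes a String, you'll see that it writes the characters in the same format as DataOutput.writeUTF. If you read up on that, you'll find that strings are written as "modified UTF-8". The details are lengthy, but if you only use 7-bit ASCII characters then yes, each character takes one byte (the one exception is the null character U+0000, which modified UTF-8 encodes in two bytes). If you want the details, see the very long Javadoc for DataOutput.writeUTF().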
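If you want to poke at modified UTF-8 directly, DataOutputStream.writeUTF is the simplest entry point. A minimal sketch, where ModifiedUtf8Demo and writeUtf are names invented for this example:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class ModifiedUtf8Demo {
    /** Bytes produced by DataOutputStream.writeUTF, including its 2-byte length prefix. */
    static byte[] writeUtf(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String nul = "\0"; // the null character, U+0000
        // 2-byte length prefix plus one byte per ASCII character: 6
        System.out.println(writeUtf("abcd").length);
        // The null character takes two bytes in modified UTF-8: 2
        System.out.println(writeUtf(nul).length - 2);
        // The same character takes one byte in standard UTF-8: 1
        System.out.println(nul.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

For plain ASCII text such as "abcd" the payload is one byte per character, which matches the 4 bytes seen in the output file.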
