On Windows 7, which GB encoding does ANSI refer to?

Posted 2021-11-12 23:12:04 · 699 characters · 946 views · 6 comments

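I recently read some articles on character encoding and now have a basic understanding of the common character sets and encodings, but I still have a few questions. In Windows (Windows 7), Notepad saves files as ANSI by default. On a Chinese-language system, does ANSI mean GB2312, GBK, or GB18030?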

Also, does Java not support the ANSI-derived character sets (GB2312, GBK, GB18030)? That is how I read the documentation for the Charset class, which lists only these:

Charset       Description
US-ASCII      Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set
ISO-8859-1    ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1
UTF-8         Eight-bit UCS Transformation Format
UTF-16BE      Sixteen-bit UCS Transformation Format, big-endian byte order
UTF-16LE      Sixteen-bit UCS Transformation Format, little-endian byte order
UTF-16        Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark
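The charsets listed above are only the ones every Java platform is required to provide; a given runtime usually offers more. Whether GB2312, GBK, or GB18030 is available can be checked at run time with Charset.isSupported. A minimal sketch, in which the class name CharsetSupportCheck and the helper isAvailable are made up for illustration and the printed results depend on the installed JDK:

```java
import java.nio.charset.Charset;

class CharsetSupportCheck {
    // Hypothetical helper: true if this JVM can encode and decode the named charset.
    static boolean isAvailable(String name) {
        return Charset.isSupported(name);
    }

    public static void main(String[] args) {
        String[] names = {"GB2312", "GBK", "GB18030", "UTF-8"};
        for (String name : names) {
            System.out.println(name + " supported: " + isAvailable(name));
        }
        // The platform default is what Java uses when no charset is given explicitly.
        System.out.println("Default charset: " + Charset.defaultCharset());
    }
}
```

Passing the charset name explicitly when reading or writing avoids depending on whatever the default happens to be on a given machine.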

居里长安 2021-11-13 14:57:19

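I found some code online that detects a file's character encoding. It distinguishes four types, GBK, UTF-16LE, UTF-16BE, and UTF-8, which correspond to the encodings Notepad offers under Save As: ANSI, Unicode, Unicode big endian, and UTF-8. With it you can read and write files in Java without garbled text, for example by opening a file with new InputStreamReader(new FileInputStream(filepath), FileCharset.get_charset(filepath)).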

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class FileCharset {

	public static String get_charset(String filepath) {
		return get_charset(new File(filepath));
	}

	// Guesses a file's encoding: a BOM decides first; otherwise the bytes are
	// scanned for a UTF-8 sequence. The default is GBK (Notepad's "ANSI").
	public static String get_charset(File file) {
		String charset = "GBK";
		byte[] first3Bytes = new byte[3];
		// try-with-resources closes the stream on every path, including the early return
		try (BufferedInputStream bis = new BufferedInputStream(
				new FileInputStream(file))) {
			boolean checked = false;
			bis.mark(3); // the read limit must cover the 3 bytes read before reset()
			int read = bis.read(first3Bytes, 0, 3);
			if (read == -1)
				return charset; // empty file
			if (first3Bytes[0] == (byte) 0xFF && first3Bytes[1] == (byte) 0xFE) {
				charset = "UTF-16LE";
				checked = true;
			} else if (first3Bytes[0] == (byte) 0xFE
					&& first3Bytes[1] == (byte) 0xFF) {
				charset = "UTF-16BE";
				checked = true;
			} else if (first3Bytes[0] == (byte) 0xEF
					&& first3Bytes[1] == (byte) 0xBB
					&& first3Bytes[2] == (byte) 0xBF) {
				charset = "UTF-8";
				checked = true;
			}
			bis.reset();
			if (!checked) {
				// No BOM: scan until the first byte sequence that settles the guess.
				while ((read = bis.read()) != -1) {
					if (read >= 0xF0)
						break;
					if (0x80 <= read && read <= 0xBF) // a lone byte at or below 0xBF also counts as GBK
						break;
					if (0xC0 <= read && read <= 0xDF) {
						read = bis.read();
						// Two bytes, (0xC0 - 0xDF) then (0x80 - 0xBF), may also be
						// valid GB-encoded text, so keep scanning.
						if (0x80 <= read && read <= 0xBF)
							continue;
						else
							break;
					} else if (0xE0 <= read && read <= 0xEF) { // can still be wrong, but it is unlikely
						read = bis.read();
						if (0x80 <= read && read <= 0xBF) {
							read = bis.read();
							if (0x80 <= read && read <= 0xBF) {
								charset = "UTF-8";
								break;
							} else
								break;
						} else
							break;
					}
				}
			}
		} catch (IOException e) {
			e.printStackTrace();
		}

		return charset;
	}

	public static void main(String[] args) {
		System.out.println(FileCharset.get_charset(new File("d:/js.js")));
	}
}
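The listing above settles on UTF-8 as soon as it sees one plausible three-byte sequence, so a GBK file can occasionally be misjudged. One stricter alternative, which is a different technique and not part of the code above, is to decode the whole content with a CharsetDecoder configured to report errors and treat any failure as "not UTF-8". A minimal sketch, with the class name Utf8Check and method isValidUtf8 invented for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    // Hypothetical helper: true only if every byte sequence in data is well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        // The same two characters as GBK bytes, written out literally.
        byte[] gbk = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
        System.out.println("UTF-8 bytes valid: " + isValidUtf8(utf8));
        System.out.println("GBK bytes valid:   " + isValidUtf8(gbk));
    }
}
```

Pure ASCII content passes this check too, which is harmless because ASCII decodes identically under UTF-8 and GBK. The cost is reading the whole file into memory, so for large files a streaming variant would be needed.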

路还长，别太狂 2021-11-13 14:56:45

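"Does Java not support the ANSI-derived character sets (GB2312, GBK, GB18030)?" You really don't need to worry about this; it is a fairly basic question. With a little background you would know that Java works with Unicode, which covers the characters of all these character sets, so Java can handle text in any of them. Does that make sense?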
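To make the point about Unicode concrete: a Java String holds Unicode text, and a Charset converts it to and from bytes in a specific encoding. Whether the text survives that conversion depends on the target charset, not on Java. A minimal sketch, in which the class RoundTripDemo and the method survivesRoundTrip are hypothetical names and the GBK line runs only if the runtime provides that charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // Hypothetical helper: encodes text into the charset, decodes it back,
    // and reports whether the result equals the original.
    static boolean survivesRoundTrip(String text, Charset charset) {
        byte[] encoded = text.getBytes(charset);
        String decoded = new String(encoded, charset);
        return text.equals(decoded);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters, written as Unicode escapes
        System.out.println("UTF-8:    " + survivesRoundTrip(text, StandardCharsets.UTF_8));
        System.out.println("US-ASCII: " + survivesRoundTrip(text, StandardCharsets.US_ASCII));
        if (Charset.isSupported("GBK")) { // assumption: not guaranteed on every JDK
            System.out.println("GBK:      " + survivesRoundTrip(text, Charset.forName("GBK")));
        }
    }
}
```

Characters a charset cannot represent are replaced during encoding, which is why the US-ASCII round trip loses the Chinese text even though the String itself held it without trouble.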
