What is the ANSI format?
What is the ANSI encoding format? Is it a system default format? How does it differ from ASCII?
Comments (11)
ANSI encoding is a slightly generic term used to refer to the standard code page on a system, usually Windows. On Western/U.S. systems it is more properly called Windows-1252. (On other systems it can denote other Windows code pages.) It is essentially an extension of the ASCII character set: it includes all the ASCII characters plus an additional 128 character codes. The difference arises because "ANSI" encoding is 8-bit, whereas ASCII is 7-bit (nowadays ASCII is almost always stored in 8-bit bytes with the MSB set to 0). See the article for an explanation of why this encoding is usually referred to as ANSI.
The name "ANSI" is a misnomer, since it doesn't correspond to any actual ANSI standard, but the name has stuck. ANSI is not the same as UTF-8.
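The relationship described above can be checked with a few lines of Python. This is only a sketch, and it assumes that "ANSI" means Windows-1252 (Python's `cp1252` codec), which holds for Western systems but not in general. The helper name `is_ascii` is made up for illustration.

```python
# Assumption: "ANSI" is taken to mean Windows-1252 (cp1252) for this sketch.
ascii_bytes = bytes(range(128))

# Every 7-bit ASCII byte decodes to the same character under Windows-1252.
same_lower_half = ascii_bytes.decode("ascii") == ascii_bytes.decode("cp1252")

# A byte with the most significant bit set is outside ASCII
# but is a valid character in Windows-1252.
e_acute = b"\xe9".decode("cp1252")


def is_ascii(data: bytes) -> bool:
    """Hypothetical helper: True if every byte fits in 7 bits."""
    return all(b < 0x80 for b in data)


# UTF-8 stores the same character in two bytes, so it is a different encoding.
utf8_form = "\u00e9".encode("utf-8")
cp1252_form = "\u00e9".encode("cp1252")

print(same_lower_half, e_acute, is_ascii(b"\xe9"), utf8_form, cp1252_form)
```

The last two values show why a file saved as "ANSI" and reopened as UTF-8 (or the reverse) displays accented letters incorrectly.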
Technically, ANSI should be the same as US-ASCII. It refers to the ANSI X3.4 standard, which is simply the ANSI organisation's ratified version of ASCII. Use of the top-bit-set characters is not defined in ASCII/ANSI, as it is a 7-bit character set.
However, years of misuse of the term by the DOS and subsequently the Windows community have left its practical meaning as "the system code page of whatever machine is being used". The system code page is also sometimes known as "mbcs", since on East Asian systems it can be a multiple-byte-per-character encoding. Some code pages even use top-bit-clear bytes as trailing bytes in a multibyte sequence, so they are not even strictly compatible with plain ASCII... but even then, the result is still called "ANSI".
On US and Western European default settings, "ANSI" maps to Windows code page 1252. This is not the same as ISO-8859-1 (although it is quite similar). On other machines it could be anything else at all. This makes "ANSI" utterly useless as an external encoding identifier.
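Two of the claims above can be illustrated in Python. The sketch below is not a definitive test: it relies on Python's built-in `cp1252`, `latin-1` and `cp932` codecs as stand-ins for the Windows code pages, and the function name `differing_bytes` is invented for this example. Code page 932 (Japanese Shift-JIS) is chosen here as one example of a multibyte "ANSI" code page.

```python
from typing import List


def differing_bytes(enc_a: str, enc_b: str) -> List[int]:
    """Hypothetical helper: byte values that decode differently in two single-byte encodings."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        # errors="replace" turns undefined bytes into U+FFFD instead of raising.
        if raw.decode(enc_a, errors="replace") != raw.decode(enc_b, errors="replace"):
            diffs.append(value)
    return diffs


# Windows-1252 and ISO-8859-1 disagree only in the 0x80-0x9F block.
diffs = differing_bytes("cp1252", "latin-1")
euro = b"\x80".decode("cp1252")      # a printable character in Windows-1252
control = b"\x80".decode("latin-1")  # a C1 control character in ISO-8859-1

# In code page 932 the katakana letter U+30BD takes two bytes,
# and the trailing byte is 0x5C, the ASCII code for a backslash.
two_bytes = "\u30bd".encode("cp932")

print(hex(diffs[0]), hex(diffs[-1]), len(diffs), two_bytes)
```

The second part is why byte-oriented code that scans for `\` or other ASCII punctuation can misbehave on text in such a code page.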
Strictly speaking, there is no such thing as ANSI encoding. Colloquially the term ANSI is used for several different encodings:
Once upon a time Microsoft, like everyone else, used 7-bit character sets, and they invented their own when it suited them, though they kept ASCII as a core subset. Then they realised the world had moved on to 8-bit encodings and that there were international standards around, such as the ISO-8859 family. In those days, if you wanted to get hold of an international standard and you lived in the US, you bought it from the American National Standards Institute, ANSI, who republished international standards with their own branding and numbers (that's because the US government wants conformance to American standards, not international standards). So Microsoft's copy of ISO-8859 said "ANSI" on the cover. And because Microsoft weren't very used to standards in those days, they didn't realise that ANSI published lots of other standards as well. So they referred to the standards in the ISO-8859 family (and the variants that they invented, because they didn't really understand standards in those days) by the name on the cover, "ANSI", and it found its way into Microsoft user documentation and hence into the user community. That was about 30 years ago, but you still sometimes hear the name today.
ASCII defines only a 7-bit code page with 128 symbols. "ANSI" extends this to 8 bits, and there are several different code pages for the symbols 128 to 255.
The name ANSI is not correct, because it is actually the ISO/IEC 8859 standard that defines these code pages. See ISO/IEC 8859 for reference. Its parts are numbered ISO/IEC 8859-1 to ISO/IEC 8859-16 (part 12 was abandoned, so 15 code pages exist).
Windows-1252 is in turn based on ISO/IEC 8859-1, with modifications mainly in the C1 control range, 128 to 159. Wikipedia states that Windows-1252 is also referred to as ISO-8859-1, with a second hyphen between ISO and 8859. (Unbelievable! Who does something like that?!?)
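To make the "several code pages for 128 to 255" point concrete, the following sketch decodes one and the same byte under three ISO/IEC 8859 parts, and then compares a byte from the 128 to 159 range under ISO-8859-1 and Windows-1252. The choice of bytes `0xE4` and `0x93` is arbitrary, and the codec names are the ones Python ships with.

```python
raw = b"\xe4"

# The same byte value is a different letter in each ISO/IEC 8859 part.
decoded = {
    name: raw.decode(name)
    for name in ("iso-8859-1", "iso-8859-5", "iso-8859-7")
}

# In the 128-159 range, ISO-8859-1 yields a C1 control character,
# while Windows-1252 assigns a printable character.
c1_latin1 = b"\x93".decode("iso-8859-1")
c1_cp1252 = b"\x93".decode("cp1252")

for name, char in decoded.items():
    print(name, repr(char), hex(ord(char)))
print(hex(ord(c1_latin1)), hex(ord(c1_cp1252)))
```

Here `0xE4` comes out as a Latin, a Cyrillic and a Greek letter respectively, while `0x93` is a control code in one encoding and a curly quotation mark in the other.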
Basically "ANSI" refers to the legacy code page on Windows. See also an article by Raymond Chen on this topic:
The first 128 characters (codes 0 to 127) are identical to ASCII in most code pages; the upper characters vary, though.
However, ANSI does not automatically mean CP1252 or Latin 1.
All confusion notwithstanding, you should simply avoid such issues nowadays and use Unicode.
Just in case your PC is not a "Western" PC and you don't know which code page is used, you can have a look at this page: National Language Support (NLS) API Reference
[Microsoft removed this reference; you can get it from the web archive: National Language Support (NLS) API Reference]
Or you can query your registry:
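A different route that avoids the registry (and is not the lookup this answer originally showed) is to ask the operating system from a script. The sketch below is an assumption-laden example: on Windows it calls the Win32 function `GetACP` through `ctypes`, and elsewhere it falls back to Python's `locale.getpreferredencoding`, which reports the locale's preferred encoding rather than a Windows code page. The function name `ansi_code_page` is invented for this example.

```python
import locale
import sys


def ansi_code_page() -> str:
    """Hypothetical helper: name of the active "ANSI" code page, or the locale encoding off Windows."""
    if sys.platform == "win32":
        import ctypes

        # GetACP returns the numeric identifier of the active ANSI code page.
        return "cp{}".format(ctypes.windll.kernel32.GetACP())
    # Not a Windows code page: just the locale's preferred text encoding.
    return locale.getpreferredencoding(False)


print(ansi_code_page())
```

On a Western European Windows installation this would typically print `cp1252`; on other machines the value differs, which is the point made throughout this thread.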
When using single-byte characters, the ASCII format defines the first 128 characters (codes 0 to 127). The extended characters from 128 to 255 are defined by various ANSI code pages to allow limited support for other languages. In order to make sense of an ANSI-encoded string, you need to know which code page it uses.
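What happens when the code page is guessed wrongly can be shown with a short Python sketch. It assumes a text written under Windows-1250 and read back under Windows-1252; the sample word and the function name `reinterpret` are chosen purely for illustration.

```python
def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Hypothetical helper: encode text with one code page and decode the bytes with another."""
    return text.encode(written_as).decode(read_as)


original = "\u0141\u00f3d\u017a"  # the Polish city name Lodz, with its diacritics

# Same bytes, wrong code page: the upper-range characters come out garbled.
wrong = reinterpret(original, "cp1250", "cp1252")

# Same bytes, correct code page: the text survives unchanged.
right = reinterpret(original, "cp1250", "cp1250")

print(repr(original), repr(wrong), repr(right))
```

Only the characters above code 127 are affected; the plain ASCII letter in the middle of the word survives either way, which is why such errors often go unnoticed in mostly English text.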
I remember when "ANSI" text referred to the pseudo VT-100 escape codes usable in DOS through the ANSI.SYS driver to alter the flow of streaming text.... Probably not what you are referring to, but if it is, see http://en.wikipedia.org/wiki/ANSI_escape_code
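For readers who have not seen them, escape codes of this kind are ordinary characters embedded in the text stream. The sketch below builds two common sequences by hand; the helper names `colorize` and `cursor_to` are invented for this example, and whether a given terminal honours the sequences is an assumption.

```python
ESC = "\x1b"  # the escape character that introduces each sequence


def colorize(text: str, sgr_code: int) -> str:
    """Hypothetical helper: wrap text in a colour/attribute sequence and reset afterwards."""
    return "{0}[{1}m{2}{0}[0m".format(ESC, sgr_code, text)


def cursor_to(row: int, col: int) -> str:
    """Hypothetical helper: sequence that moves the cursor to a 1-based row and column."""
    return "{0}[{1};{2}H".format(ESC, row, col)


# Code 31 selects a red foreground on terminals that support these sequences.
print(colorize("warning", 31))
```

This sense of "ANSI" is unrelated to character encodings: the sequences themselves consist entirely of 7-bit ASCII characters.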
ANSI really is a very imprecise term for Windows-12xx, because on my computer Windows-1250 is shown as "ANSI" in Notepad++. I'd prefer a clear Windows-1250 (or CP1250) instead. So it can mean 1252 (Western Europe) or 1250 (Central and Eastern Europe) and God knows what else. When I see ANSI, I always check the encoding with some other tool.
ANSI (aka Windows-1252/WinLatin1) is a character encoding of the Latin alphabet, fairly similar to ISO-8859-1.
You may want to take a look at it on Wikipedia.