What is the correct technical term for "high ASCII" characters?
What is the technically correct way of referring to "high ASCII" or "extended ASCII" characters? I don't just mean the range 128-255, but any character beyond the 0-127 range.
Often they're called diacritics or accented letters, and sometimes they're casually referred to as "national" or non-English characters, but these names are either imprecise or cover only a subset of the possible characters.
What is the correct, precise term that programmers will immediately recognize? And what would be the best English term to use when speaking to a non-technical audience?
"Non-ASCII characters"
ASCII does not define any character codes above 127. Many different equipment and software suppliers developed their own character sets for the values 128-255. Some chose drawing symbols, some chose accented characters, and others chose other characters.
Unicode is an attempt to make a universal set of character codes that includes the characters used in most languages. This includes not only the traditional Western alphabets but also Cyrillic, Arabic, Greek, and a large set of characters from Chinese, Japanese, and Korean, as well as many other languages, both modern and ancient.
There are several encodings of Unicode. One of the most popular is UTF-8. A major reason for that popularity is that it is backwards compatible with ASCII: character codes 0 to 127 are the same in both ASCII and UTF-8.
That means it is better to say that ASCII is a subset of UTF-8. Character codes 128 and above are not ASCII. They can be part of a UTF-8 sequence (or another Unicode encoding), or they can belong to a custom character set from a hardware or software supplier.
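As a rough illustration of that backward compatibility, here is a minimal sketch assuming Python 3 and its built-in codecs. It compares the ASCII and UTF-8 encodings of the same text. The helper name `utf8_bytes` is made up for this example.

```python
def utf8_bytes(text: str) -> list:
    """Return the UTF-8 encoding of text as a list of byte values."""
    return list(text.encode("utf-8"))

# Pure ASCII text produces identical bytes in both encodings.
plain = "Hello"
same = plain.encode("ascii") == plain.encode("utf-8")

# A non-ASCII character becomes several bytes, each 128 or above.
e_acute = utf8_bytes("é")  # U+00E9
euro = utf8_bytes("€")     # U+20AC

print(same, e_acute, euro)
```

Every byte in the two multi-byte sequences has its high bit set, which is why a decoder that only understands ASCII cannot mistake them for ASCII characters.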
You could coin a term like “trans-ASCII,” “supra-ASCII,” “ultra-ASCII” etc. Actually, “meta-ASCII” would be even nicer since it alludes to the meta bit.
A bit sequence that doesn't represent an ASCII character is not necessarily a Unicode character.
Depending on the character encoding you're using, it could be either:
The one definition that would fit all of these situations is:
To be highly pedantic, even "a non-ASCII character" wouldn't precisely fit all of these situations, because sometimes a bit sequence outside this range may be simply an invalid bit sequence, and not a character at all.
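To make the pedantic point concrete, here is a small sketch, assuming Python 3 and its standard codec names. It decodes the same single byte under several encodings. The helper `try_decode` is a hypothetical name introduced only for this illustration.

```python
raw = b"\xe9"  # one byte with the high bit set

def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if data is invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

for enc in ("latin-1", "cp437", "utf-8", "ascii"):
    print(enc, repr(try_decode(raw, enc)))
```

The same byte is an accented letter in Latin-1, a Greek letter in code page 437, and an invalid (truncated) sequence in UTF-8, so whether it is a character at all depends on the encoding.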
"Extended ASCII" is the term I'd use, meaning "characters beyond the original 0-127".
Unicode is one possible set of Extended ASCII characters, and is quite, quite large.
UTF-8 is the way to represent Unicode characters that is backwards-compatible with the original ASCII.
These words are taken from an online resource (a cool website, though) because I found them useful and appropriate for writing an answer.
At first it included only capital letters and numbers, but in 1967 lowercase letters and some control characters were added, forming what is known as US-ASCII, i.e. the characters 0 through 127.
This set of only 128 characters was published as a standard in 1967, containing everything needed to write in English.
In 1981, IBM developed an 8-bit extension of the ASCII code, called "code page 437". In this version, some obsolete control characters were replaced with graphic characters. A further 128 characters were added, with new symbols, signs, graphics, and Latin letters, including the punctuation marks and characters needed to write texts in other languages, such as Spanish.
In this way, the extended characters ranging from 128 to 255 were added.
IBM included support for this code page in the hardware of its model 5150, known as the "IBM PC", considered the first personal computer.
The operating system of this model, MS-DOS, also used this extended ASCII code.
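For a quick look at what code page 437 put in the upper half, here is a sketch assuming Python 3, whose standard library ships a `cp437` codec. The variable names are illustrative only.

```python
# Decode every byte value from 128 to 255 using code page 437.
upper_half = bytes(range(128, 256)).decode("cp437")

# Box-drawing graphics and Spanish letters and punctuation both live there.
box = b"\xc9\xcd\xbb".decode("cp437")
spanish = b"\xa4\xa5\xa8\xad".decode("cp437")

print(len(upper_half))
print(box, spanish)
```

None of the 128 decoded characters is an ASCII character; each maps to a Unicode code point above 127.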
Non-ASCII Unicode characters.
If you say "High ASCII", you are by definition in the range 128-255 decimal. ASCII itself is defined as a one-byte (actually 7-bit) character representation; the use of the high bit to allow for non-English characters happened later and gave rise to the code pages that defined particular characters represented by particular values. Any multibyte value (> 255 decimal) is not ASCII.
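The three ranges described here can be sketched as a small classifier, assuming Python 3. Note the assumption: it works on Unicode code points, which match ISO-8859-1 (Latin-1) for 128-255, so a byte from some other code page would first have to be decoded. The function name and the labels are made up for this example and are not standard terms.

```python
def classify(ch: str) -> str:
    """Place a single character into one of three code point ranges."""
    code = ord(ch)
    if code <= 127:
        return "ASCII"
    if code <= 255:
        return "128-255"
    return "above 255"

for ch in ("A", "é", "日"):
    print(ch, ord(ch), classify(ch))
```

Only the first bucket is ASCII; the other two both fall under "non-ASCII characters".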