What is the difference between plaintext and binary data?
Many languages have functions that process only "plaintext", not binary. Does this mean that only characters within the ASCII range are allowed?
Binary is just a series of bytes. Isn't that similar to plaintext, which is just a series of bytes interpreted as characters? So, can plaintext store the same data formats / protocols as binary?
Comments (5)
Plain text is human-readable; a binary file is usually unreadable by a human, since it is composed of both printable and non-printable characters.
Try opening a JPEG file with a text editor (e.g. Notepad or vim) and you'll see what I mean.
A binary file is usually constructed in a way that optimizes speed, since the data does not have to be parsed from text.
A plain text file can be edited by hand; a binary file generally cannot.
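To make the printable vs. non-printable distinction concrete, here is a minimal Python sketch. The helper name `looks_like_text` and the choice of `string.printable` as the definition of "printable" are assumptions made for illustration; real tools use more elaborate heuristics.

```python
import string

# Bytes treated as "printable" in this sketch: ASCII letters, digits,
# punctuation, and whitespace.
PRINTABLE = set(string.printable.encode("ascii"))

def looks_like_text(data: bytes) -> bool:
    """Rough heuristic: True if every byte is a printable ASCII character."""
    return all(b in PRINTABLE for b in data)

text_sample = b"Hello, world!\n"
# The first bytes of a typical JPEG/JFIF file: start-of-image marker,
# APP0 marker, segment length, then the "JFIF" identifier.
jpeg_sample = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10]) + b"JFIF"

print(looks_like_text(text_sample))  # True
print(looks_like_text(jpeg_sample))  # False
```

The JPEG header contains a few readable letters, but bytes such as 0xFF and 0x00 are what a text editor shows as garbage.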
"Plaintext" can have several meanings.
The one most useful in this context is that it is merely a binary file organized as byte sequences that a particular computer system can translate into a finite set of what it considers "text" characters.
A second, somewhat related meaning adds a restriction: the system should display these "text characters" as symbols that a human can read as members of a recognizable alphabet. Often, the unwritten implication is that the translation mechanism is ASCII.
A third, even more restrictive meaning is that this system must be a "simple" text editor/viewer, usually implying ASCII encoding. But, really, there is VERY little difference between you, the human, reading text encoded in some funky format and displayed by a proprietary program, and reading an ASCII-encoded file in the vi text editor.
Within a programming context, your programming environment (comprising the OS, the system APIs, and your language's capabilities) defines both a set of "text" characters and a set of encodings it can read and convert to these "text" characters. Note that this does not necessarily imply ASCII, English, or 8 bits: as an example, Perl can natively read and use the full Unicode set of "characters".
To answer your specific question, you can definitely use "character" strings to transmit arbitrary byte sequences, with the caveat that string termination conventions still apply.
The problem is that the existing functions for "processing character data" probably won't offer any useful functionality for dealing with your binary data.
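The caveats about encodings and string termination can be sketched in Python. This is only an illustration under assumptions: the `payload` bytes are made up, Latin-1 is picked because it maps every byte value to exactly one character, and `ctypes.c_char_p` stands in for any C-style API that treats a NUL byte as the end of a string.

```python
import ctypes

# Arbitrary bytes: "Hi", a NUL byte, 0xFF, and "!".
payload = bytes([0x48, 0x69, 0x00, 0xFF, 0x21])

# Latin-1 assigns a character to every byte value 0-255, so arbitrary
# bytes survive a round trip through a character string.
as_text = payload.decode("latin-1")
round_trip = as_text.encode("latin-1")
print(round_trip == payload)  # True

# UTF-8 is stricter: 0xFF is never valid in UTF-8, so decoding fails.
try:
    payload.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)  # False

# A C-style string ends at the first NUL byte, so the rest is lost.
c_view = ctypes.c_char_p(payload).value
print(c_view)  # b'Hi'
```

Whether arbitrary bytes survive therefore depends on the encoding and on the termination convention of the functions involved.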
One thing it often means is that the language may feel free to interpret certain control characters, such as the values 10 (line feed) or 13 (carriage return), as logical line terminators. In other words, an output operation might automatically append these characters at the end, and an input operation might strip them from the input (and/or stop reading there).
In contrast, language I/O operations that advertise working on "binary" data will usually take a parameter for the length of the data to operate on, since there is no other way (short of reading until end of file) to know when the operation is done.
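Python's file modes show this behavior, so here is a small sketch of the difference. The file name `sample.txt` and the sample bytes are made up for the example; the newline handling relies on Python's default text-mode behavior (`newline=None`), which translates `\r\n` to `\n` on input.

```python
import os
import tempfile

raw = b"line one\r\nline two\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(raw)

    # Text mode: bytes 13 and 10 are interpreted as line terminators,
    # and "\r\n" is normalized to "\n" while reading.
    with open(path, "r", encoding="ascii") as f:
        text_lines = f.readlines()

    # Binary mode: nothing is interpreted; the caller states how many
    # bytes to read (or reads until end of file).
    with open(path, "rb") as f:
        first_chunk = f.read(10)
        rest = f.read()

print(text_lines)   # ['line one\n', 'line two\n']
print(first_chunk)  # b'line one\r\n'
print(rest)         # b'line two\n'
```

The text-mode read silently changed the data, while the binary read returned exactly the bytes that were written.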
Generally, it depends on the language/environment/functionality.
Binary data is always that: binary. It is transferred without modification.
"Plain text" mode may mean one or more of the following things:
Technically, nothing. Plain text is a form of binary data. However, a major difference is how values are stored. Think of how an integer might be stored. In binary data it would typically use a two's complement format, probably taking 32 bits of space. In text format the number would instead be stored as a series of Unicode digits. So the number 50 would be stored as 0x32 (padded to take up 32 bits) in binary, but as '5' '0' in plain text.
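The two storage forms for the number 50 can be compared directly. This sketch assumes a 32-bit little-endian layout for the binary form (the `"<i"` format in Python's `struct` module) and ASCII digits for the text form; other byte orders and encodings would give different bytes.

```python
import struct

value = 50

# Binary form: 32-bit two's complement integer, little-endian byte order.
binary_form = struct.pack("<i", value)
# Text form: one byte per decimal digit, '5' then '0'.
text_form = str(value).encode("ascii")

print(binary_form.hex())  # 32000000
print(text_form.hex())    # 3530

# Two's complement matters for negative numbers.
negative = struct.pack("<i", -50)
print(negative.hex())     # ceffffff
```

The binary form always takes four bytes here, while the text form grows with the number of digits and must be parsed back into an integer before arithmetic.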