Are null terminators part of the text encoding?
I'm trying to read a null terminated string from a byte array; the parameter to the function is the encoding.
string ReadString(Encoding encoding)
For example, "foo" in each of the following encodings is:
UTF-32: 66 00 00 00 6f 00 00 00 6f 00 00 00
UTF-8: 66 6f 6f
UTF-7: 66 6f 6f 2b 41 41 41 2d
If I copied all the bytes into an array (reading up to the null terminator) and passed that array into encoding.GetString(), it wouldn't work: if the string were UTF-32 encoded, my algorithm would hit a "null terminator" at the second byte.
So I sort of have a double question: Are null terminators part of the encoding? If not, how could I decode the string character by character and check the following byte for the null terminator?
Thanks in advance
(suggestions are also appreciated)
Edit:
If "foo" were null-terminated and UTF-32 encoded, which would it be?
1. 66 00 00 00 6f 00 00 00 6f 00 00 00 00
2. 66 00 00 00 6f 00 00 00 6f 00 00 00 00 00 00 00
Comments (3)
The null terminator is not "logically" part of the string; it's not considered payload. It's widely used in C/C++ to indicate where the string ends.
Having said that, you can have strings with embedded \0's, but then you have to be careful to ensure the string doesn't appear truncated. For example, std::string doesn't have a problem with embedded \0's. But if you call c_str() and don't account for the reported length(), your string will appear cut off.
Null terminators are not part of the encoding; they belong to the string representation used by some programming languages, such as C. In .NET, System.String stores its length as a 32-bit integer and does not rely on a null terminator to mark its end. Internally, System.String is always UTF-16, but you can use an Encoding to produce other representations.
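To make this concrete for the question's edit, here is a small sketch that encodes "foo" followed by U+0000 and dumps the bytes. The class and method names (TerminatorDemo, Dump) are made up for illustration. It suggests that a terminator, when present, is simply an encoded U+0000 and so occupies one full code unit: one byte in UTF-8, two in UTF-16, four in UTF-32 (option 2 in the question).

```csharp
using System;
using System.Text;

static class TerminatorDemo
{
    // Hypothetical helper: hex dump of "foo" followed by U+0000 in the given encoding.
    // Encoding.GetBytes does not emit a byte order mark.
    public static string Dump(Encoding encoding) =>
        BitConverter.ToString(encoding.GetBytes("foo\0"));

    static void Main()
    {
        Console.WriteLine(Dump(Encoding.UTF8));    // 66-6F-6F-00
        Console.WriteLine(Dump(Encoding.Unicode)); // 66-00-6F-00-6F-00-00-00 (UTF-16 LE)
        Console.WriteLine(Dump(Encoding.UTF32));   // 66-00-00-00-6F-00-00-00-6F-00-00-00-00-00-00-00
    }
}
```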
For the second part... Use the classes in System.Text such as UTF8Encoding and UTF32Encoding to read the string. You just have to select the right one based on your parameter...
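One possible way to do this is sketched below. It is not the only approach, and the signature is an assumption: the question's ReadString takes only the encoding, so the buffer and offset parameters here are added for illustration. The idea is to ask the encoding how wide an encoded U+0000 is, then scan one code unit at a time so that zero bytes inside a character (as in UTF-32) are not mistaken for the terminator. It assumes a fixed-width code unit (UTF-8, UTF-16, UTF-32) and does not handle UTF-7, where the question's own example writes the null as 2b 41 41 41 2d.

```csharp
using System;
using System.Text;

static class NullTerminatedReader
{
    // Hypothetical helper: decodes the null-terminated string starting at `offset`.
    // Assumes UTF-8, UTF-16, or UTF-32; not suitable for UTF-7.
    public static string ReadString(byte[] buffer, int offset, Encoding encoding)
    {
        // Width of an encoded U+0000: 1 for UTF-8, 2 for UTF-16, 4 for UTF-32.
        int unit = encoding.GetByteCount("\0");

        int end = offset;
        // Advance a whole code unit at a time until an all-zero unit is found
        // or fewer than `unit` bytes remain in the buffer.
        while (end + unit <= buffer.Length && !IsZero(buffer, end, unit))
            end += unit;

        // Decode only the payload; the terminator itself is not passed to the decoder.
        return encoding.GetString(buffer, offset, end - offset);
    }

    // True when `count` bytes starting at `index` are all zero.
    static bool IsZero(byte[] buffer, int index, int count)
    {
        for (int i = 0; i < count; i++)
            if (buffer[index + i] != 0) return false;
        return true;
    }
}
```

With the 16 bytes of option 2 from the question (66 00 00 00 6f 00 00 00 6f 00 00 00 00 00 00 00) and Encoding.UTF32, the scan stops at offset 12 and the call returns "foo". If no terminator is found, this sketch decodes up to the last whole code unit in the buffer; real code may prefer to throw instead.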
This seems to work well for me (sample from actual code that reads a Unicode, null-terminated string from a byte array):