Why use short* instead of char* for strings? What is the difference between char* and unsigned char*?
As the title says, I have two questions.
Edit: To clarify, they don't actually use char and short; they guarantee 8-bit and 16-bit types through specific typedefs, and the actual types are called UInt8 and UInt16.
1. Question
The iTunes SDK uses unsigned short* where a string is needed. What are the advantages of using it instead of char*/unsigned char*? How do I convert it to char*, and what changes when working with this type instead?
2. Question
So far I've only seen char* used when a string must be stored. When should I use unsigned char*, then, or doesn't it make any difference?
Answers (3)
Arrays of unsigned short can be used with wide character strings, for instance if you have UTF-16 encoded text, although I'd expect to see wchar_t in those cases. But they may have their reasons, like compatibility between MacOS and Windows. (If my sources are right, wchar_t is 32 bits on MacOS but 16 bits on Windows.) You convert between the two types of string by calling the appropriate library function; which function is appropriate depends on the situation. Doesn't the SDK come with one?
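The answer leaves the choice of conversion function open. As a rough sketch of what such a function does, assuming the SDK's UInt16 strings hold NUL-terminated UTF-16 (an assumption; nothing quoted here confirms the SDK's exact format), the C code below converts them to UTF-8 char strings. The UInt16 typedef and the function name utf16_to_utf8 are hypothetical; in practice the SDK's own routine, or a library such as iconv or ICU, is the safer choice.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the SDK's 16-bit type (assumed to hold UTF-16 code units). */
typedef uint16_t UInt16;

/* Converts a NUL-terminated UTF-16 string to UTF-8.
   Writes at most cap-1 bytes plus a terminating NUL into out.
   Returns the number of bytes written (excluding the NUL), or (size_t)-1
   on an unpaired surrogate or if out is too small. */
size_t utf16_to_utf8(const UInt16 *in, char *out, size_t cap)
{
    size_t n = 0;
    if (cap == 0) return (size_t)-1;
    while (*in) {
        uint32_t cp = *in++;
        if (cp >= 0xD800 && cp <= 0xDBFF) {           /* high surrogate: needs a low one next */
            if (*in < 0xDC00 || *in > 0xDFFF) return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(*in++ - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {    /* stray low surrogate */
            return (size_t)-1;
        }
        unsigned char buf[4];
        size_t len;
        if (cp < 0x80)         { buf[0] = (unsigned char)cp; len = 1; }
        else if (cp < 0x800)   { buf[0] = (unsigned char)(0xC0 | (cp >> 6));
                                 buf[1] = (unsigned char)(0x80 | (cp & 0x3F)); len = 2; }
        else if (cp < 0x10000) { buf[0] = (unsigned char)(0xE0 | (cp >> 12));
                                 buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
                                 buf[2] = (unsigned char)(0x80 | (cp & 0x3F)); len = 3; }
        else                   { buf[0] = (unsigned char)(0xF0 | (cp >> 18));
                                 buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
                                 buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
                                 buf[3] = (unsigned char)(0x80 | (cp & 0x3F)); len = 4; }
        if (n + len >= cap) return (size_t)-1;        /* keep room for the NUL */
        for (size_t i = 0; i < len; i++) out[n++] = (char)buf[i];
    }
    out[n] = '\0';
    return n;
}
```

Called as utf16_to_utf8(src, buf, sizeof buf), it returns (size_t)-1 when a surrogate is unpaired or the buffer is too small; the buffer contents should then be ignored.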
As for using char instead of unsigned char: all strings have historically been defined with char, so switching to unsigned char would introduce incompatibilities. (Switching to signed char would also cause incompatibilities, but somehow not as many...)
Edit: Now that the question has been edited, let me say that I didn't see the edits before I typed my answer. But yes, UInt16 is a better representation of a 16-bit entity than wchar_t, for the reason above.
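To make the compatibility point concrete: the standard library's string functions are declared in terms of plain char (strlen takes a const char *), so code that stores text in unsigned char buffers has to cast at every such call. The helper names below are hypothetical; the sketch also shows how to check whether plain char is signed on a given platform, which is implementation-defined.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* strlen is declared as size_t strlen(const char *s). Passing an
   unsigned char * directly is a pointer type mismatch that compilers
   diagnose (GCC reports it under -Wpointer-sign), hence the cast. */
size_t ustrlen(const unsigned char *s)
{
    return strlen((const char *)s);
}

/* Whether plain char is signed is implementation-defined:
   returns 1 where char behaves like signed char, 0 where it is unsigned. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

An unsigned char array can still be initialized from a string literal (unsigned char buf[] = "hello";), so the friction shows up mainly at API boundaries.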
1. Question - Answer
I would guess they use unsigned short* because they store Unicode text as UTF-16, which can represent characters both inside and outside the BMP (the latter as surrogate pairs). The answer to the rest of your question depends on the Unicode encoding of the source and the destination (UTF-8, UTF-16 or UTF-32).
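The BMP remark is the key detail: UTF-16 uses one 16-bit unit for code points up to U+FFFF and a surrogate pair (two units) for everything above. The sketch below, with a hypothetical UInt16 typedef and function name, encodes a single code point that way.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t UInt16;  /* hypothetical stand-in for the SDK's typedef */

/* Encodes one Unicode code point as UTF-16.
   Returns the number of 16-bit units written to out (1 or 2),
   or 0 if cp is not a valid Unicode scalar value. */
size_t utf16_encode(uint32_t cp, UInt16 out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                              /* out of range, or a surrogate itself */
    if (cp < 0x10000) {                        /* inside the BMP: one unit */
        out[0] = (UInt16)cp;
        return 1;
    }
    cp -= 0x10000;                             /* outside the BMP: 20 bits remain */
    out[0] = (UInt16)(0xD800 + (cp >> 10));    /* high surrogate carries the top 10 bits */
    out[1] = (UInt16)(0xDC00 + (cp & 0x3FF));  /* low surrogate carries the bottom 10 bits */
    return 2;
}
```

For example, U+1F600 comes out as the pair 0xD83D, 0xDE00, which is why the length of an unsigned short string is not always its character count.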
2. Question - Answer
Again, it depends on the encoding and on which strings you are talking about. If you plan to deal with characters outside the extended ASCII table (as with many languages other than English), you should not rely on signed or unsigned char holding one character per element.
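One nuance worth adding: a char array can still carry non-ASCII text if it uses a multibyte encoding such as UTF-8; what breaks is the assumption that one char is one character. A minimal sketch, assuming valid UTF-8 input (the function name is hypothetical):

```c
#include <stddef.h>

/* Counts code points in a NUL-terminated UTF-8 string by skipping
   continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8.
   Bytes are read through unsigned char so the bit test sees 0..255. */
size_t utf8_codepoints(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        if ((*p & 0xC0) != 0x80)
            count++;
    return count;
}
```

For "caf\xC3\xA9" ("café" in UTF-8), strlen reports 5 bytes while utf8_codepoints returns 4.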
Probably a harebrained attempt to use UTF-16 strings. C has a wide character type, wchar_t, and its characters can be 16 bits long. Though I'm not familiar enough with the SDK to say exactly why they took this route, it's probably to work around compiler issues. C99 provides much more suitable [u]int[least/fast]16_t types; see <stdint.h>.
Note that C makes very few guarantees about data types and their sizes. Signed or unsigned shorts aren't guaranteed to be exactly 16 bits (though they are guaranteed to be at least that wide), nor are chars restricted to 8 bits or wide chars to 16 or 32.
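A sketch of how an SDK-style 16-bit typedef might lean on <stdint.h> while falling back gracefully. This UInt16 definition is hypothetical, not the iTunes SDK's actual header. uint16_t is optional in C99 (it exists only where an exact 16-bit type is available), and <stdint.h> defines UINT16_MAX only when it does; uint_least16_t is always provided. The guarantees from the paragraph above are stated with C11 _Static_assert.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical portable 16-bit code-unit type. */
#ifdef UINT16_MAX
typedef uint16_t UInt16;          /* exact 16 bits where the platform has them */
#else
typedef uint_least16_t UInt16;    /* otherwise the smallest type with >= 16 bits */
#endif

/* Minimums C actually guarantees: */
_Static_assert(CHAR_BIT >= 8, "char has at least 8 bits");
_Static_assert(USHRT_MAX >= 65535u, "unsigned short has at least 16 bits");
_Static_assert(UINT_LEAST16_MAX >= 65535u, "uint_least16_t has at least 16 bits");

/* Storage width of the chosen type in bits (exactly 16 wherever uint16_t exists). */
int uint16_width_bits(void)
{
    return (int)(sizeof(UInt16) * CHAR_BIT);
}
```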
To convert between char and short strings, you'd use the conversion functions provided by the SDK. You could also write your own or use a 3rd party library, if you knew exactly what they stored in those short strings AND what you wanted in your char strings.
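If you do know what the short strings hold, a hand-written conversion can be very small. The sketch below assumes (hypothetically) that the text is meant to be plain ASCII and narrows each 16-bit unit to a char, substituting '?' for anything it cannot represent; both the typedef and the function name are illustrative, not part of the SDK.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t UInt16;  /* hypothetical stand-in for the SDK's typedef */

/* Narrows a NUL-terminated UInt16 string to char, assuming its contents are
   meant to be ASCII. Any unit above 0x7F becomes '?', so the conversion is
   lossy by design. Copies at most cap-1 units and always NUL-terminates
   (cap must be > 0). Returns the number of chars written. */
size_t utf16_to_ascii_lossy(const UInt16 *in, char *out, size_t cap)
{
    size_t n = 0;
    while (in[n] != 0 && n + 1 < cap) {
        out[n] = (in[n] <= 0x7F) ? (char)in[n] : '?';
        n++;
    }
    out[n] = '\0';
    return n;
}
```

A character outside the BMP arrives as a surrogate pair and therefore becomes "??"; that is the price of lossy narrowing, and the reason a real conversion needs to know the target encoding.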
It doesn't really make a difference. You'd normally convert to unsigned char if you wanted to do (unsigned) arithmetic or bit manipulation on a character.
Edit: I wrote (or started writing, anyhow) this answer before you told us they used UInt16 and not unsigned short. In that case no harebrained scheme is involved; the proprietary type is probably used to store UTF-16 data while staying compatible with older (or noncompliant) compilers that lack the stdint types. That is perfectly reasonable.
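Two common places where that conversion matters are sketched below (function names are hypothetical): summing byte values, where a signed char would contribute negative numbers for bytes above 0x7F, and the <ctype.h> functions, whose argument must be representable as unsigned char or equal EOF, so passing a negative plain char is undefined behavior.

```c
#include <ctype.h>
#include <stddef.h>

/* Sums the byte values of a string. Reading through unsigned char yields
   0..255 for every byte; through plain char, a byte such as 0xE9 would read
   as -23 on platforms where char is signed. */
unsigned long byte_sum(const char *s)
{
    unsigned long sum = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        sum += *p;
    return sum;
}

/* Upper-cases a string in place. The cast to unsigned char keeps the
   argument to toupper in its valid range; in the default "C" locale
   only the letters a-z change. */
void upper_in_place(char *s)
{
    for (; *s; s++)
        *s = (char)toupper((unsigned char)*s);
}
```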