Is there a good way to convert from unsigned char* to char*?
I've been reading a lot lately about reinterpret_cast<> and how one should use it (and avoid it in most cases).
While I understand that using reinterpret_cast<> to cast from, say, unsigned char* to char* is implementation-defined (and thus non-portable), there seems to be no other way to efficiently convert one to the other.
Let's say I use a library that deals with unsigned char* to process some computations. Internally, I already use char* to store my data (and I can't change it because it would kill puppies if I did).
I would have done something like:
char* mydata = getMyDataSomewhere();
size_t mydatalen = getMyDataLength();
// We use it here
// processData() takes an unsigned char*
processData(reinterpret_cast<unsigned char*>(mydata), mydatalen);
// I could have done this:
processData((unsigned char*)mydata, mydatalen);
// But it would have resulted in a similar call, I guess?
If I want my code to be highly portable, it seems I have no other choice but to copy my data first. Something like:
char* mydata = getMyDataSomewhere();
size_t mydatalen = getMyDataLength();
unsigned char* mydata_copy = new unsigned char[mydatalen];
for (size_t i = 0; i < mydatalen; ++i)
    mydata_copy[i] = static_cast<unsigned char>(mydata[i]);
processData(mydata_copy, mydatalen);
delete[] mydata_copy; // release the temporary copy
Of course, that is highly suboptimal, and I'm not even sure that it is more portable than the first solution.
So the question is: what would you do in this situation to have highly portable code?
Comments (4)
Portable is an in-practice matter. As such, reinterpret_cast for the specific usage of converting between char* and unsigned char* is portable. But still I'd wrap this usage in a pair of functions instead of doing the reinterpret_cast directly in each place.
Don't go overboard introducing inefficiencies when using a language where nearly all the warts (including the one about limited guarantees for reinterpret_cast) are in support of efficiency. That would be working against the spirit of the language, while adhering to the letter.
Cheers & hth.
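The "pair of functions" suggested in this answer could look roughly like the sketch below. The helper names as_unsigned and as_plain are hypothetical, and processData here is only a stand-in (it sums the bytes) for the library function mentioned in the question, whose real signature is not given.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers: the only places where the cast is spelled out.
inline unsigned char* as_unsigned(char* p) noexcept {
    return reinterpret_cast<unsigned char*>(p);
}
inline const unsigned char* as_unsigned(const char* p) noexcept {
    return reinterpret_cast<const unsigned char*>(p);
}
inline char* as_plain(unsigned char* p) noexcept {
    return reinterpret_cast<char*>(p);
}
inline const char* as_plain(const unsigned char* p) noexcept {
    return reinterpret_cast<const char*>(p);
}

// Stand-in for the library call from the question (assumption: it only
// reads the buffer). It simply sums the bytes so the result can be checked.
inline unsigned long processData(const unsigned char* data, std::size_t len) {
    unsigned long sum = 0;
    for (std::size_t i = 0; i < len; ++i) sum += data[i];
    return sum;
}

// Example call site: no copy is made and no cast appears here.
inline void wrapper_example() {
    char mydata[] = "abc";
    unsigned long sum = processData(as_unsigned(mydata), 3);
    assert(sum == static_cast<unsigned long>('a' + 'b' + 'c'));
}
```

Keeping the cast in one place means that, if a platform ever did need different handling, only these helpers would have to change.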
The difference between the char and unsigned char types is merely data semantics. This only affects how the compiler performs arithmetic on data elements of either type. The char type (where it is signed) signals the compiler that the value of the high bit is to be interpreted as negative, so that the compiler should perform two's-complement arithmetic. Since this is the only difference between the two types, I cannot imagine a scenario where reinterpret_cast<unsigned char*>(mydata) would generate output any different than (unsigned char*) mydata. Moreover, there is no reason to copy the data if you are merely informing the compiler about a change in data semantics, i.e., switching from signed to unsigned arithmetic.
EDIT: While the above is true from a practical standpoint, I should note that the C++ standard states that char, unsigned char and signed char are three distinct data types. § 3.9.1.1:
Go with the cast, it's OK in practice.
I just want to add that this:
while not being undefined behaviour, could change the contents of your string on machines without two's-complement arithmetic. The reverse would be undefined behaviour.
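To make the value-conversion point concrete, here is a minimal sketch of the element-wise copy from the question, written with std::vector so the buffer is released automatically. The helper name copy_as_unsigned is hypothetical. Converting a value to unsigned char is defined as reduction modulo UCHAR_MAX + 1, so a negative element becomes a large positive one by value, whatever the underlying bit pattern was.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical helper: converts each element by value, like the copying
// loop in the question, instead of reinterpreting the pointer.
inline std::vector<unsigned char> copy_as_unsigned(const char* data, std::size_t len) {
    std::vector<unsigned char> out(len);
    for (std::size_t i = 0; i < len; ++i) {
        // Value conversion: the result is data[i] modulo (UCHAR_MAX + 1).
        out[i] = static_cast<unsigned char>(data[i]);
    }
    return out;
}

inline void value_conversion_example() {
    // -1 converts to the largest unsigned char value on every implementation.
    const signed char minus_one = -1;
    assert(static_cast<unsigned char>(minus_one) == UCHAR_MAX);

    // Characters of the basic character set are non-negative, so they keep their value.
    const char src[] = {'A', 'z', '0'};
    std::vector<unsigned char> copy = copy_as_unsigned(src, 3);
    assert(copy[0] == 'A' && copy[1] == 'z' && copy[2] == '0');
}
```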
For C compatibility, the unsigned char* and char* types have extra limitations. The rationale is that functions like memcpy() have to work, and this limits the freedom that compilers have. (unsigned char*) &foo must still point to object foo. Therefore, don't worry in this specific case.
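As an illustration of that guarantee, the sketch below copies an object byte by byte through unsigned char*, which is roughly what memcpy() is allowed to do. The helper name copy_bytes is hypothetical, and the sketch assumes a trivially copyable type (enforced with a static_assert).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copies the object representation of src into dst
// one byte at a time through unsigned char pointers.
template <typename T>
void copy_bytes(T& dst, const T& src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte-wise copying is only valid for trivially copyable types");
    const unsigned char* from = reinterpret_cast<const unsigned char*>(&src);
    unsigned char* to = reinterpret_cast<unsigned char*>(&dst);
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        to[i] = from[i];
    }
}

inline void byte_access_example() {
    int foo = 12345;
    int bar = 0;

    // The cast pointer still refers to the start of foo.
    const void* via_cast = reinterpret_cast<const unsigned char*>(&foo);
    assert(via_cast == static_cast<const void*>(&foo));

    copy_bytes(bar, foo);
    assert(bar == foo);
    assert(std::memcmp(&bar, &foo, sizeof foo) == 0);
}
```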