Is there a good way to convert from unsigned char* to char*?

Posted on 2024-10-12 04:42:10

I've been reading a lot these days about reinterpret_cast<> and how one should use it (and avoid it in most cases).

While I understand that using reinterpret_cast<> to cast from, say, unsigned char* to char* is implementation-defined (and thus non-portable), there seems to be no other way to efficiently convert one to the other.

Let's say I use a library that works with unsigned char* to perform some computations. Internally, I already use char* to store my data (and I can't change that, because it would kill puppies if I did).

I would have done something like:

char* mydata = getMyDataSomewhere();
size_t mydatalen = getMyDataLength();

// We use it here
// processData() takes an unsigned char*
processData(reinterpret_cast<unsigned char*>(mydata), mydatalen);

// I could have done this:
processData((unsigned char*)mydata, mydatalen);
// But it would have resulted in a similar call, I guess?

If I want my code to be highly portable, it seems I have no choice but to copy my data first. Something like:

char* mydata = getMyDataSomewhere();
size_t mydatalen = getMyDataLength();
unsigned char* mydata_copy = new unsigned char[mydatalen];
for (size_t i = 0; i < mydatalen; ++i)
  mydata_copy[i] = static_cast<unsigned char>(mydata[i]);

processData(mydata_copy, mydatalen);
delete[] mydata_copy;  // release the temporary copy

Of course, that is highly suboptimal, and I'm not even sure that it is more portable than the first solution.

So the question is: what would you do in this situation to get highly portable code?

Comments (4)

给不了的爱 2024-10-19 04:42:10

Portability is an in-practice matter. As such, reinterpret_cast for the specific purpose of converting between char* and unsigned char* is portable. Still, I'd wrap this usage in a pair of functions instead of doing the reinterpret_cast directly in each place.
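A minimal sketch of such a pair of wrappers, to make the suggestion concrete. The names as_unsigned and as_char are hypothetical, and processData here is only a stand-in with an assumed signature, since the real library function is not shown in the question.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers: the only places where the cast is spelled out.
inline unsigned char* as_unsigned(char* p) {
    return reinterpret_cast<unsigned char*>(p);
}
inline const unsigned char* as_unsigned(const char* p) {
    return reinterpret_cast<const unsigned char*>(p);
}
inline char* as_char(unsigned char* p) {
    return reinterpret_cast<char*>(p);
}
inline const char* as_char(const unsigned char* p) {
    return reinterpret_cast<const char*>(p);
}

// Stand-in for the library call (assumed signature): sums the bytes.
inline unsigned long processData(const unsigned char* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    unsigned long sum = 0;
    for (std::size_t i = 0; i < len; ++i)
        sum += data[i];
    return sum;
}
```

A call site then reads processData(as_unsigned(mydata), mydatalen), so if the cast ever has to change there is a single place to edit.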

Don't go overboard introducing inefficiencies when using a language where nearly all the warts (including the one about limited guarantees for reinterpret_cast) are in support of efficiency.

That would be working against the spirit of the language, while adhering to the letter.

Cheers & hth.

染年凉城似染瑾 2024-10-19 04:42:10

The difference between the char and unsigned char types is merely one of data semantics. It only affects how the compiler performs arithmetic on data elements of either type. A signed char type (which plain char is on many implementations) tells the compiler that the high bit is to be interpreted as negative, so that the compiler performs two's-complement arithmetic. Since this is the only difference between the two types, I cannot imagine a scenario where reinterpret_cast<unsigned char*>(mydata) would generate output any different from (unsigned char*) mydata. Moreover, there is no reason to copy the data if you are merely informing the compiler about a change in data semantics, i.e., switching from signed to unsigned arithmetic.

EDIT: While the above is true from a practical standpoint, I should note that the C++ standard states that char, unsigned char and signed char are three distinct data types. § 3.9.1.1:

Objects declared as characters (char) shall be large enough to store
any member of the implementation’s basic character set. If a character
from this set is stored in a character object, the integral value of
that character object is equal to the value of the single character
literal form of that character. It is implementation-defined whether a
char object can hold negative values. Characters can be explicitly
declared unsigned or signed. Plain char, signed char, and unsigned
char are three distinct types, collectively called narrow character
types. A char, a signed char, and an unsigned char occupy the same
amount of storage and have the same alignment requirements (3.11);
that is, they have the same object representation. For narrow
character types, all bits of the object representation participate in
the value representation. For unsigned narrow character types, all
possible bit patterns of the value representation represent numbers.
These requirements do not hold for other types. In any particular
implementation, a plain char object can take on either the same values
as a signed char or an unsigned char; which one is
implementation-defined.
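The guarantees in this paragraph can be checked at compile time. The following is only a sketch, assuming a C++11 or later compiler; the constant name plain_char_is_signed is made up for illustration.

```cpp
#include <climits>
#include <type_traits>

// Three distinct types, even though plain char behaves like one of the others.
static_assert(!std::is_same<char, signed char>::value, "char and signed char are distinct");
static_assert(!std::is_same<char, unsigned char>::value, "char and unsigned char are distinct");

// Same amount of storage and same alignment requirements.
static_assert(sizeof(char) == 1 && sizeof(signed char) == 1 && sizeof(unsigned char) == 1,
              "narrow character types occupy one byte each");
static_assert(alignof(char) == alignof(signed char) && alignof(char) == alignof(unsigned char),
              "narrow character types share alignment");

// Whether plain char can hold negative values is implementation-defined.
constexpr bool plain_char_is_signed = std::is_signed<char>::value;
static_assert(plain_char_is_signed == (CHAR_MIN < 0), "CHAR_MIN reflects the signedness of char");
```

If any of these assertions failed, the implementation would not conform to the quoted paragraph.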

梦过后 2024-10-19 04:42:10

Go with the cast, it's OK in practice.

I just want to add that this:

for (size_t i = 0; i < mydatalen; ++i)
  mydata_copy[i] = static_cast<unsigned char>(mydata[i]);

while not undefined behaviour, could change the contents of your string on machines without two's-complement arithmetic. The reverse (unsigned char to char) is worse: for values that do not fit in char, the result is implementation-defined.
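To make the distinction concrete, here is a small sketch that contrasts the element-wise value conversion with a byte-for-byte copy. The function names are hypothetical. On a two's-complement implementation (the only representation C++20 permits) both produce identical bytes; on a ones'-complement or sign-magnitude machine the value conversion could differ for negative chars.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Value conversion, element by element (the loop from the question).
inline std::vector<unsigned char> copy_by_value(const char* src, std::size_t n) {
    assert(src != nullptr || n == 0);
    std::vector<unsigned char> out(n);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = static_cast<unsigned char>(src[i]);  // result is the value modulo 2^CHAR_BIT
    return out;
}

// Byte-for-byte copy of the object representation; the bit patterns never change.
inline std::vector<unsigned char> copy_by_bytes(const char* src, std::size_t n) {
    assert(src != nullptr || n == 0);
    std::vector<unsigned char> out(n);
    if (n != 0)
        std::memcpy(out.data(), src, n);
    return out;
}
```

Comparing the two results for a buffer that contains bytes with the high bit set is a quick way to see whether a given platform is affected.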

紙鸢 2024-10-19 04:42:10

For C compatibility, the unsigned char* and char* types have extra limitations. The rationale is that functions like memcpy() have to work, and this limits the freedom that compilers have. (unsigned char*) &foo must still point to object foo. Therefore, don't worry in this specific case.
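A rough illustration of that point, assuming a trivially copyable type: reading an object through an unsigned char* yields the same bytes that memcpy() copies, and the cast pointer still refers to the start of the object. The helper name bytes_match_memcpy is made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Returns true if the bytes seen through unsigned char* equal what memcpy() copies.
template <typename T>
bool bytes_match_memcpy(const T& obj) {
    static_assert(std::is_trivially_copyable<T>::value, "memcpy needs a trivially copyable type");

    unsigned char copied[sizeof(T)];
    std::memcpy(copied, &obj, sizeof(T));

    const unsigned char* view = reinterpret_cast<const unsigned char*>(&obj);
    assert(static_cast<const void*>(view) == static_cast<const void*>(&obj));  // same address

    for (std::size_t i = 0; i < sizeof(T); ++i)
        if (view[i] != copied[i])
            return false;
    return true;
}
```

For example, bytes_match_memcpy(foo) with an int foo is expected to return true on any conforming implementation.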
