Testing C++ code for endian-independence

Posted 2024-11-16 10:00:32

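How can I test or check C++ code for endian-independence? The code is already implemented; I just want to verify that it works on both little-endian and big-endian platforms.

I could write unit tests and run them on the target platforms, but I don't have the hardware. Maybe emulators?

Are there compile-time checks that can be done?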

寂寞笑我太脆弱 2024-11-23 10:00:32

If you have access to an x86-based Mac, you can take advantage of the fact that Mac OS X has PowerPC emulation built in, along with developer tool support for both x86 (little endian) and PowerPC (big endian). This lets you compile and run big-endian and little-endian executables on the same platform, e.g.

$ gcc -arch i386 foo.c -o foo_x86 # build little endian x86 executable
$ gcc -arch ppc foo.c -o foo_ppc  # build big endian PowerPC executable

Having built both big-endian and little-endian executables, you can run whatever unit tests you have on both, which will catch some classes of endianness-related problems. You can also compare any data the two executables generate (files, network packets, whatever); the output should be byte-for-byte identical.
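One way to make that comparison automatic is a "golden bytes" unit test that runs unchanged in both builds. The following is only a minimal sketch: `encode_be32` and `test_encode_be32` are hypothetical names, not part of the code being discussed, and the wire format is assumed to be big-endian.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: encode a 32-bit value as big-endian bytes using
// shifts only, so the result does not depend on the host byte order.
std::array<std::uint8_t, 4> encode_be32(std::uint32_t v)
{
    return {static_cast<std::uint8_t>(v >> 24),
            static_cast<std::uint8_t>(v >> 16),
            static_cast<std::uint8_t>(v >> 8),
            static_cast<std::uint8_t>(v)};
}

// Golden-bytes check: the expected output is written out byte by byte,
// so the same test must pass on little-endian and big-endian builds.
void test_encode_be32()
{
    const std::array<std::uint8_t, 4> expected = {0x12, 0x34, 0x56, 0x78};
    assert(encode_be32(0x12345678u) == expected);
}
```

If the encoder were written with a pointer cast or a union instead of shifts, this test would pass on one architecture and fail on the other, which is exactly the class of bug being hunted.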

小耗子 2024-11-23 10:00:32

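You can set up an execution environment with the opposite endianness using qemu. For example, if you have access to little-endian amd64 or i386 hardware, you can set up qemu to emulate a PowerPC Linux platform and run your code there.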
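When running tests inside an emulator it is easy to launch the native binary by mistake, so it can help to have the test binary confirm the byte order it is actually running under. This is a small sketch with hypothetical function names; it only distinguishes little-endian from everything else and ignores mixed-endian hosts.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns true when the least significant byte of a 32-bit integer is
// stored first in memory, i.e. the host (or emulated guest) is little-endian.
bool host_is_little_endian()
{
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);  // inspect the lowest-addressed byte
    return first == 1;
}

// Hypothetical guard for a test run that is meant to execute on a
// big-endian guest: fails fast if it was started on a little-endian host.
void require_big_endian_host()
{
    assert(!host_is_little_endian() && "expected a big-endian environment");
}
```

Calling `require_big_endian_host()` at the start of the emulated test run turns a silently meaningless run into an immediate failure.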

江城子 2024-11-23 10:00:32

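I read a story about using Flint (Flexible Lint) to diagnose this kind of error.

I don't remember the specifics any more, but let me search for the story for you:

http://www.datacenterworks.com/stories/flint.html

Example: Diagnosing Endianness Bugs
On a recent engagement, we were porting code from an old Sequent to a SPARC. Beyond the specific pointer issues we discussed in the "Story of Thud and Blunder", we needed to look for other null-pointer issues and also for endianness errors.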

眼角的笑意。 2024-11-23 10:00:32

I would suggest adopting a coding technique that avoids the problem altogether.

First, you have to understand in which situations an endianness problem occurs. Then either find an endianness-agnostic way to write the code, or isolate it.

For example, endianness issues typically arise when you use memory accesses or unions to pick out parts of a larger value. Concretely, avoid:

long x;
...
char second_byte = *(((char *)&x) + 1);

Instead, write:

long x;
...
char second_byte = (char)(x >> 8);

Concatenation is one of my favorites, as many people think you can only do this with strange tricks. Don't do this:

union uu
{
  long x;
  unsigned short s[2];
};
union uu u;
u.s[0] = low;
u.s[1] = high;
long res = u.x;

Instead write:

long res = (((unsigned long)high) << 16) | low;
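The same shift-and-mask idea extends to reading multi-byte values out of a byte buffer such as a file or a network packet. The sketch below uses hypothetical helper names (`load_le32`, `load_be32`); it assumes the buffer holds at least four bytes.

```cpp
#include <cassert>
#include <cstdint>

// Assemble a 32-bit value from bytes stored least significant byte first.
std::uint32_t load_le32(const unsigned char* p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Assemble a 32-bit value from bytes stored most significant byte first.
std::uint32_t load_be32(const unsigned char* p)
{
    return (static_cast<std::uint32_t>(p[0]) << 24)
         | (static_cast<std::uint32_t>(p[1]) << 16)
         | (static_cast<std::uint32_t>(p[2]) << 8)
         |  static_cast<std::uint32_t>(p[3]);
}

// Both loaders give the same answer on any host, because they never
// reinterpret the buffer as an integer in memory.
void test_loads()
{
    const unsigned char wire[4] = {0x78, 0x56, 0x34, 0x12};
    assert(load_le32(wire) == 0x12345678u);
    assert(load_be32(wire) == 0x78563412u);
}
```

Each byte is widened to `std::uint32_t` before shifting so that the shift happens on an unsigned 32-bit value rather than on a promoted `int`.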

雨轻弹 2024-11-23 10:00:32

I could write unit tests and run them on the target platforms, but I don't have the hardware.

You can set up your design so that unit tests are easy to run without having the actual hardware. You can do this using dependency injection. I can abstract away things like hardware interfaces by providing a base interface class that the code under test talks to.

class IHw
{
public:
    // Virtual destructor so implementations can be deleted through an IHw*.
    virtual ~IHw() {}
    virtual void SendMsg1(const char* msg, size_t size) = 0;
    virtual void RcvMsg2(Msg2Callback* callback) = 0;
     ...
};

Then I can have the concrete implementation that actually talks to hardware:

class CHw : public IHw
{
public:
    void SendMsg1(const char* msg, size_t size);
    void RcvMsg2(Msg2Callback* callback);
};

And I can make a test stub version:

class CTestHw : public IHw
{
public:
    void SendMsg1(const char* msg, size_t);
    void RcvMsg2(Msg2Callback* callback);
};

Then my real code can use the concrete CHw, but I can simulate it in test code with CTestHw.

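class CSomeClassThatUsesHw
{
public:
   // The hardware is injected: production code passes a CHw, tests pass a CTestHw.
   // The callback argument is an assumption; Msg2Callback is not defined in this answer.
   CSomeClassThatUsesHw(IHw* hwIn, Msg2Callback* callbackIn)
       : hw(hwIn), callback(callbackIn) {}
   void MyCallback(const char* msg, size_t size)
   {
       // process msg 2
   }
   void DoSomethingToHw()
   {
       static const char msg1[] = "msg1";  // placeholder payload
       hw->SendMsg1(msg1, sizeof msg1);
       hw->RcvMsg2(callback);  // assumed to forward incoming data to MyCallback
   }
private:
    IHw* hw;
    Msg2Callback* callback;
};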
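To connect this pattern to the endianness question, a test double can record the bytes that would have gone to the hardware so a test can check their order. What follows is a self-contained sketch, not the answer's actual classes: `CRecordingHw`, `CCounterReporter`, and the two-byte big-endian message format are all assumptions made for illustration, and the interface is cut down to a single method.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Cut-down version of the hardware interface (send side only).
class IHw
{
public:
    virtual ~IHw() = default;
    virtual void SendMsg1(const char* msg, std::size_t size) = 0;
};

// Hypothetical test double: keeps a copy of the last message sent.
class CRecordingHw : public IHw
{
public:
    void SendMsg1(const char* msg, std::size_t size) override
    {
        sent.assign(msg, msg + size);
    }
    std::vector<char> sent;
};

// Hypothetical class under test: sends a 16-bit counter, high byte first.
class CCounterReporter
{
public:
    explicit CCounterReporter(IHw& hw) : hw_(hw) {}
    void Report(std::uint16_t count)
    {
        const char msg[2] = {static_cast<char>(count >> 8),
                             static_cast<char>(count & 0xFF)};
        hw_.SendMsg1(msg, sizeof msg);
    }
private:
    IHw& hw_;
};

// The expected bytes are spelled out, so the test is host-independent.
void test_report_is_big_endian()
{
    CRecordingHw hw;
    CCounterReporter reporter(hw);
    reporter.Report(0x0102);
    assert(hw.sent.size() == 2);
    assert(hw.sent[0] == 0x01);
    assert(hw.sent[1] == 0x02);
}
```

Note that this only removes the dependency on the peripheral hardware; the test still has to be built and run for a big-endian target (or emulator) to exercise the code under the other byte order.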

烦人精 2024-11-23 10:00:32

I personally use Travis to test my software hosted on GitHub. It supports running on multiple architectures [1], including s390x, which is big endian.

I just had to add this to my .travis.yml:

arch:
  - amd64
  - s390x  # Big endian arch

It's probably not the only CI service offering this, but it's the one I was already using. I run both unit tests and integration tests on both systems, which gives me reasonable confidence that the code works regardless of endianness.

It's no silver bullet, though. I'd also like an easy way to test manually, just to make sure there are no hidden errors. For example, I'm using SDL, so colors could be wrong; I validate the output with screenshots, but the screenshot code could contain an error that compensates for the display problem, so the tests could pass while the display is wrong.

[1] https://blog.travis-ci.com/2019-11-12-multi-cpu-architecture-ibm-power-ibm-z

于我来说 2024-11-23 10:00:32

IMO, the only answer that comes close to being correct is Martin's. There are no endianness concerns to address if you aren't communicating with other applications in binary or reading/writing binary files. What happens in a little endian machine stays in a little endian machine if all of the persistent data are in the form of a stream of characters (e.g. packets are ASCII, input files are ASCII, output files are ASCII).

I'm making this an answer rather than a comment to Martin's answer because I am proposing you consider doing something different from what Martin proposed. Given that the dominant machine architecture is little endian while network order is big endian, many advantages arise if you can avoid byte swapping altogether. The solution is to make your application able to deal with wrong-endian inputs. Make the communications protocol start with some kind of machine identity packet. With this info at hand, your program can know whether it has to byte swap subsequent incoming packets or leave them as-is. The same concept applies if the header of your binary files has some indicator that lets you determine the endianness of those files. With this kind of architecture at hand, your application(s) can write in native format and can know how to deal with inputs that are not in native format.
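A rough sketch of that idea, under assumptions of my own: the stream starts with a 4-byte marker written in the sender's native order, and every later field is a 32-bit integer. The marker value `kOrderMarker` and all function names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical marker the sender writes in its own native byte order.
constexpr std::uint32_t kOrderMarker = 0x01020304u;

std::uint32_t byte_swap32(std::uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u) | (v << 24);
}

enum class PeerOrder { Same, Swapped, Unknown };

// Read the marker natively: if it looks right the peer shares our byte
// order, if it looks reversed the peer has the opposite order.
PeerOrder classify_marker(const unsigned char* header)
{
    std::uint32_t marker;
    std::memcpy(&marker, header, sizeof marker);
    if (marker == kOrderMarker) return PeerOrder::Same;
    if (byte_swap32(marker) == kOrderMarker) return PeerOrder::Swapped;
    return PeerOrder::Unknown;  // callers should reject the stream
}

// Read one 32-bit field, swapping only when the peer's order differs.
std::uint32_t read_field(const unsigned char* p, PeerOrder order)
{
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return order == PeerOrder::Swapped ? byte_swap32(v) : v;
}

// Simulate a peer of the opposite byte order on whatever host runs this.
void test_swapped_peer()
{
    const std::uint32_t marker = byte_swap32(kOrderMarker);
    const std::uint32_t payload = byte_swap32(0xCAFEF00Du);
    unsigned char packet[8];
    std::memcpy(packet, &marker, 4);
    std::memcpy(packet + 4, &payload, 4);
    const PeerOrder order = classify_marker(packet);
    assert(order == PeerOrder::Swapped);
    assert(read_field(packet + 4, order) == 0xCAFEF00Du);
}
```

Because the test builds the "foreign" packet by swapping relative to the host, it exercises the swap path without needing a second architecture, though it does not replace running the real sender on one.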

Well, almost. There are other problems with binary exchange / binary files. One such problem is floating point data. The IEEE floating point standard doesn't say anything about how floating point data are stored. It says nothing regarding byte order, nothing about whether the significand comes before or after the exponent, nothing about the storage bit order of the as-stored exponent and significand. This means you can have two different machines of the same endianness that both follow the IEEE standard and you can still have problems communicating floating point data as binary.
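Where both ends are known to use the IEEE 754 32-bit format, one common workaround is to move the float's bit pattern into an integer and serialize that integer with shifts. This sketch rests on two loudly stated assumptions: `float` is IEEE 754 binary32 (checked by the `static_assert`), and the platform stores floats and integers in the same byte order, which is typical but not guaranteed. The function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption check: refuse to compile where float is not IEEE 754 binary32.
static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this sketch assumes a 32-bit IEEE 754 float");

// Copy the float's bits into an integer, then emit them high byte first.
std::array<std::uint8_t, 4> encode_float_be(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return {static_cast<std::uint8_t>(bits >> 24),
            static_cast<std::uint8_t>(bits >> 16),
            static_cast<std::uint8_t>(bits >> 8),
            static_cast<std::uint8_t>(bits)};
}

// Rebuild the integer from the wire bytes, then copy it back into a float.
float decode_float_be(const std::array<std::uint8_t, 4>& b)
{
    const std::uint32_t bits = (static_cast<std::uint32_t>(b[0]) << 24)
                             | (static_cast<std::uint32_t>(b[1]) << 16)
                             | (static_cast<std::uint32_t>(b[2]) << 8)
                             |  static_cast<std::uint32_t>(b[3]);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// 1.0f has the binary32 bit pattern 0x3F800000.
void test_float_round_trip()
{
    const std::array<std::uint8_t, 4> expected = {0x3F, 0x80, 0x00, 0x00};
    assert(encode_float_be(1.0f) == expected);
    assert(decode_float_be(expected) == 1.0f);
}
```

On a platform that violates either assumption, a text representation or an explicit sign/exponent/significand encoding would be the safer choice.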

Another problem, not so prevalent today, is that endianness is not binary. There are other options than big and little. Fortunately, the days of computers that stored things in 2143 order (as opposed to 1234 or 4321 order) are pretty much behind us, unless you deal with embedded systems.
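If the code only supports pure little-endian and big-endian targets, that restriction can be enforced at compile time. The sketch below relies on the `__BYTE_ORDER__` family of predefined macros, which GCC and Clang provide; other compilers may not, so treat it as compiler-specific. `kHostIsLittleEndian` and `check_macro_matches_memory` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if !defined(__BYTE_ORDER__) || !defined(__ORDER_LITTLE_ENDIAN__) || \
    !defined(__ORDER_BIG_ENDIAN__)
#error "byte-order macros not available; this sketch assumes GCC or Clang"
#endif

// Reject mixed-endian (for example PDP-style) targets at compile time.
static_assert(__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ ||
              __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__,
              "only little-endian and big-endian targets are supported");

constexpr bool kHostIsLittleEndian =
    (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__);

// Runtime cross-check that the compile-time answer matches real memory.
void check_macro_matches_memory()
{
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    assert((first == 1) == kHostIsLittleEndian);
}
```

A check like this verifies what the target is, not that the code is endian-independent; it is best used to guard code paths that are deliberately byte-order specific.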

Bottom line:
If you are dealing with a near-homogeneous set of computers, with only one or two oddballs (but not too odd), you might want to think about avoiding network order. If the domain has machines of multiple architectures, some of them very odd, you might have to resort to the lingua franca of network order. (But beware that this lingua franca does not completely resolve the floating point problem.)
