Why are my TCP transfers corrupted on Cygwin?

Posted on 2024-10-18 08:54:14

I am trying to debug why my TCP transfers are corrupted when sent from Cygwin. I see that only the first 24 bytes of each structure are showing up in my server program running on Centos. The 25th through 28th bytes are scrambled and all others after that are zeroed out. Going in the other direction, receiving from Centos on Cygwin, again only the first 24 bytes of each block are showing up in my server program running on Cygwin. The 25th through 40th bytes are scrambled and all others after that are zeroed out. I also see the issue when sending or receiving to/from localhost on Cygwin. For localhost, the first 34 bytes are correct and all after that are zeroed out.

The application I am working on works fine on Centos4 talking to Centos, and I am trying to port it to Cygwin. Valgrind reports no issues on Centos; I do not have Valgrind running on Cygwin. Both platforms are little-endian x86.

I've run Wireshark on the host Windows XP system under which Cygwin is running. When I sniff the packets with Wireshark they look perfect, for both sent packets from Cygwin and received packets to Cygwin.

Somehow, the data is corrupted between the level Wireshark looks at and the program itself.

The C++ code uses ::write(fd, buffer, size) and ::read(fd, buffer, size) to write and read the TCP packets where fd is a file descriptor for the socket that is opened between the client and server. This code works perfectly on Centos4 talking to Centos.

The strangest thing to me is that the packet sniffer shows the correct complete packet for all cases, yet the cygwin application never reads the complete packet or in the other direction, the Centos application never reads the complete packet.

Can anyone suggest how I might go about debugging this?

Here is some requested code:

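#include <cerrno>
#include <climits>
#include <stdexcept>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

// Reads up to 'count' bytes from 'fd' into 'buf', looping over short reads.
// Sets 'eof' if the peer closes the connection; returns the bytes read.
size_t
read_buf(int fd, char *buf, size_t count, bool &eof, bool immediate)
{
  if (count > SSIZE_MAX) {
    throw std::invalid_argument("read_buf: count exceeds SSIZE_MAX");
  }

  size_t want = count;
  size_t got = 0;

  fd_set readFdSet;
  int fdMaxPlus1 = fd + 1;

  FD_ZERO(&readFdSet);
  FD_SET(fd, &readFdSet);

  while (got < want) {
    errno = 0;

    struct timeval timeVal;
    const int timeoutSeconds = 60;

    timeVal.tv_usec = 0;
    timeVal.tv_sec = immediate ? 0 : timeoutSeconds;

    int selectReturn = ::select(fdMaxPlus1, &readFdSet, NULL, NULL, &timeVal);

    if (selectReturn < 0) {
      throw std::runtime_error("read_buf: select failed");
    }

    if (selectReturn == 0 || !FD_ISSET(fd, &readFdSet)) {
      throw std::runtime_error("read_buf: timed out waiting for data");
    }

    errno = 0;

    // Read at most the bytes still outstanding (want - got).
    ssize_t result = ::read(fd, buf, want - got);

    if (result < 0) {
      throw std::runtime_error("read_buf: read failed");
    } else {
      if (result != 0) {
        // Not an error, increment the byte counter 'got' & the read pointer,
        // buf.
        got += result;
        buf += result;
      } else { // EOF because zero result from read.
        eof = true;
        break;
      }
    }
  }
  return got;
}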

I've discovered more about this failure. The C++ class where the packet is being read into is laid out like this:

unsigned char _array[28];
long long _sequence;
unsigned char _type;
unsigned char _num;
short _size;

Apparently, the long long is getting scrambled with the four bytes that follow.

The C++ memory sent by the Centos application, starting with _sequence, looks like this in hex as it goes to write():

_sequence: 45 44 35 44 33 34 43 45
    _type: 05
     _num: 33
    _size: 02 71

Wireshark shows the memory laid out in network big-endian format like this in the packet:

_sequence: 45 43 34 33 44 35 44 45
    _type: 05
     _num: 33
    _size: 71 02

But after read() in the little-endian C++ Cygwin application, it looks like this:

_sequence: 02 71 33 05 45 44 35 44
    _type: 00
     _num: 00
    _size: 00 00

I'm stumped as to how this is occurring. It seems to be an issue with big-endian and little-endian, but the two platforms are both little-endian.

Here _array is 7 ints instead of 28 chars.

Complete memory dump at sender:

_array[0]: 70 a2 b7 cf
_array[1]: 9b 89 41 2c
_array[2]: aa e9 15 76
_array[3]: 9e 09 b6 e2
_array[4]: 85 49 08 81
_array[5]: bd d7 9b 1e
_array[6]: f2 52 df db
_sequence: 41 41 31 35 32 43 38 45
    _type: 05
     _num: 45
    _size: 02 71

And at receipt:

_array[0]: 70 a2 b7 cf
_array[1]: 9b 89 41 2c
_array[2]: aa e9 15 76
_array[3]: 9e 09 b6 e2
_array[4]: 85 49 08 81
_array[5]: bd d7 9b 1e
_array[6]: f2 52 df db
_sequence: 02 71 45 05 41 41 31 35
    _type: 0
     _num: 0
    _size: 0

Cygwin test result:

4
8
48
0x22be08
0x22be28
0x22be31
0x22be32
0x22be38

Centos test result:

4
8
40
0xbfffe010
0xbfffe02c
0xbfffe035
0xbfffe036
0xbfffe038

云归处 2024-10-25 08:54:14

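Now that you've shown the data, your problem is clear. You're not controlling the alignment of your struct, so the compiler on Cygwin automatically puts the 8-byte field (the long long) on an 8-byte boundary (offset 32) from the start of the struct, leaving 4 bytes of padding after the 28-byte array. Your Centos test output shows the same field at offset 28 with no padding, so the two ends disagree about where every field after _array lives.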

Change the alignment to 1 byte and everything should resolve. Here's the snippet you need:

__attribute__ ((aligned (1))) __attribute__ ((packed))

I also suggest that you use the fixed-size types for structures being blitted across the network, e.g. uint8_t, uint32_t, uint64_t.

Previous thoughts:

With TCP, you don't read and write packets. You read and write from a stream of bytes. Packets are used to carry these bytes, but boundaries are not preserved.

Your code looks like it deals with this reasonably well; you might want to update the wording of your question.

心作怪 2024-10-25 08:54:14

Hopefully final update :-)

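Based on your latest update, the two compilers lay out your structure differently: on Centos _sequence sits at offset 28 with no padding (40 bytes in total), whilst CygWin aligns it to an 8-byte boundary at offset 32 (48 bytes in total). That mismatch is what scrambles your data. I'm not sure why the CygWin-to-CygWin case is having problems since the padding should be identical there, but I can tell you how to fix the other case.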

Using the code I gave earlier:

#include <stdio.h>
typedef struct {
    unsigned char _array[28];
    long long _sequence;
    unsigned char _type;
    unsigned char _num;
    short _size;
} tType;
int main (void) {
    tType t[2];
    /* sizeof yields a size_t, so print it with %zu. */
    printf ("%zu\n", sizeof(long));
    printf ("%zu\n", sizeof(long long));
    printf ("%zu\n", sizeof(tType));
    /* %p expects a void pointer. */
    printf ("%p\n", (void *) &(t[0]._array));
    printf ("%p\n", (void *) &(t[0]._sequence));
    printf ("%p\n", (void *) &(t[0]._num));   /* _num, one byte after _type */
    printf ("%p\n", (void *) &(t[0]._size));
    printf ("%p\n", (void *) &(t[1]));
    return 0;
}

If you don't want any padding, you have two choices. The first is to re-organise your structure to put the more restrictive types up front:

typedef struct {
    long long _sequence;
    short _size;
    unsigned char _array[28];
    unsigned char _type;
    unsigned char _num;
} tType;

which gives you:

4
8
40
0x22cd42
0x22cd38
0x22cd5f
0x22cd40
0x22cd60

In other words, each structure is exactly 40 bytes (8 for sequence, 2 for size, 28 for array and 1 each for type and num). But this may not be possible if you want it in a specific order.

In that case, you can force the alignments to be on a byte level with:

typedef struct {
    unsigned char _array[28];
    long long _sequence;
    unsigned char _type;
    unsigned char _num;
    short _size;
} __attribute__ ((aligned(1),packed)) tType;

On its own, aligned(1) has little effect, because GCC's aligned attribute can only increase an object's alignment, never reduce it. To actually remove the padding, you need to use packed as well.

Doing that gives you:

4
8
40
0x22cd3c
0x22cd58
0x22cd61
0x22cd62
0x22cd64

Earlier history for posterity:

Well, since I wget and ftp huge files just fine from CygWin, my psychic debugging skills tell me it's more likely to be a problem with your code rather than the CygWin software.

In other words, regarding the sentence "the packets are corrupted between the level Wireshark looks at and the program itself", I'd be seriously looking towards the upper end of that scale rather than the lower end :-)

Usually, it's the case that you've assumed a read will get the whole packet that was sent rather than bits at a time but, without seeing the code in question, that's a pretty wild guess.

Make sure you're checking the return value from read to see how many bytes are actually being received. Beyond that, post the code responsible for the read so we can give a more in-depth analysis.

Based on your posted code, it looks okay. The only thing I can suggest is that you check that the buffers you're passing in are big enough and, even if they are, make sure you print them immediately after return in case some other piece of code is corrupting the data.

In fact, in re-reading your question more closely, I'm a little confused. You state you have the same problem with your server code on both Linux and CygWin yet say it's working on Centos.

My only advice at this point is to put debugging printf statements in that function you've shown, such as after the select and read calls to output the relevant variables, including got and buf after changing them, and also in every code path so you can see what it's doing. And also dump the entire structure byte-for-byte at the sending end.

This will hopefully show you immediately where the problem lies, especially since you seem to have data showing up in the wrong place.

And make sure your types are compatible at both ends. By that, I mean if long long is different sizes on the two platforms, your data will be misaligned.

Okay, checking alignments at both ends, compile and run this program on both systems:

#include <stdio.h>
typedef struct {
    unsigned char _array[28];
    long long _sequence;
    unsigned char _type;
    unsigned char _num;
    short _size;
} tType;
int main (void) {
    tType t[2];
    /* sizeof yields a size_t, so print it with %zu. */
    printf ("%zu\n", sizeof(long));
    printf ("%zu\n", sizeof(long long));
    printf ("%zu\n", sizeof(tType));
    /* %p expects a void pointer. */
    printf ("%p\n", (void *) &(t[0]._array));
    printf ("%p\n", (void *) &(t[0]._sequence));
    printf ("%p\n", (void *) &(t[0]._num));   /* _num, one byte after _type */
    printf ("%p\n", (void *) &(t[0]._size));
    printf ("%p\n", (void *) &(t[1]));
    return 0;
}

On my CygWin, I get:

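4            long size
8            long long size
48           structure size
0x22cd30     _array start (size = 28, followed by 4 bytes of padding)
0x22cd50     _sequence start (offset 32, size = 8)
0x22cd59     _num start (offset 41, size = 1; _type is at 0x22cd58)
0x22cd5a     _size start (offset 42, size = 2, followed by 4 bytes of tail padding for long long alignment)
0x22cd60     next array element.

Note that the program prints the address of _num rather than _type, so the one-byte gap after _sequence is simply _type at 0x22cd58, not padding. The only padding is the 4 bytes after _array and the 4 bytes at the end of the structure.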

Check the output from Centos to see if it's incompatible. However, your statement that CygWin-to-CygWin doesn't work is incongruous with that possibility since the alignments and sizes would be compatible (unless your sending and receiving code is compiled differently).
