
How to reverse data using an array of pointers (parsing a binary file)

Posted on 2024-09-07 03:43:18

I am parsing a binary file using a specification. The file is in big-endian mode because it consists of accumulated streamed packets. I have to reverse the length of the packets in order to "reinterpret_cast" them into the right variable type. (I am not able to use the net/inet.h functions because the packets have different lengths.)

The read() method of the ifstream class puts the bytes inside an array of char pointers. I tried to do the reversal by hand, but I cannot figure out how to pass the "list of pointers" in order to change their position in the array.

If someone knows a more efficient way to do so, please let me know (8 GB of data needs to be parsed).

#include <iostream>
#include <fstream>

void reverse(char &array[]);

using namespace std;

int main ()
{
    char *a[5];
    *a[0]='a'; *a[1]='b'; *a[2]='c'; *a[3]='d'; *a[4]='e';

    reverse(a);

    int i=0;
    while(i<=4)
    {
        cout << *a[i] << endl;
        i++;
    }
    return 0;
}
void reverse(char &array[])
{
    int size = sizeof(array[])+1;
    //int size = 5;
    cout << "ARRAY SIZE: " << size << endl;

    char aux;
    for (int i=0;i<size/2;i++)
    {
            aux=array[i];
            array[i]=array[size-i-1];
            array[size-i-1]=aux;
    }
}

Thanks all of you for your help!


喵星人汪星人 2024-09-14 03:43:18

Not quite.

The file comes in big-endian mode because it has streamed packets accumulated. I have to reverse the length of the packets in order to "reinterpret_cast" them into the right variable type.

You need to reverse the bytes on the level of stored data, not the file and not the packets.

For example, if a file stores a struct.

struct S {
  int i;
  double d;
  char c;
};

To read the struct you will need to reverse:

int: [4321]->[1234]  // sizeof(int) == 4, swap the order of 4 bytes
double: [87654321]->[12345678]  // sizeof(double) == 8, swap the order of 8 bytes
char: [1]->[1]  // sizeof(char) == 1, swap 1 byte (no swapping needed)

Not the entire struct at once.

Unfortunately, it's not as trivial as just reversing the block of data in the file, or the file itself. You need to know exactly what data type is being stored, and reverse the bytes in it.
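
A minimal sketch of that per-field reversal for the struct S above. The helper names load_reversed and decode_S are hypothetical, and the sketch assumes a little-endian host, a 4-byte int, an 8-byte IEEE 754 double, and a file record in which the fields are stored back to back without padding:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

struct S {
  int i;
  double d;
  char c;
};

// Hypothetical helper: copy sizeof(T) bytes out of `src`, reverse their
// order, and reinterpret the result as a T. Assumes a little-endian host
// reading big-endian data.
template <typename T>
T load_reversed(const unsigned char* src) {
  unsigned char tmp[sizeof(T)];
  std::memcpy(tmp, src, sizeof(T));
  std::reverse(tmp, tmp + sizeof(T));
  T value;
  std::memcpy(&value, tmp, sizeof(T));
  return value;
}

// Hypothetical decoder: each field is reversed on its own, never the
// whole record at once. Assumes the record is packed (no padding).
S decode_S(const unsigned char* record) {
  S s;
  std::size_t offset = 0;
  s.i = load_reversed<int>(record + offset);
  offset += sizeof(int);
  s.d = load_reversed<double>(record + offset);
  offset += sizeof(double);
  s.c = load_reversed<char>(record + offset);  // 1 byte: reversing changes nothing
  return s;
}
```

Going through memcpy instead of casting the buffer pointer avoids alignment and aliasing problems, which matters when fields sit at arbitrary offsets inside a packet.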

The functions in inet.h are used for exactly this purpose, so I encourage you to use them.
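
As a rough illustration, assuming a POSIX system where ntohl and ntohs are declared in <arpa/inet.h>, reading 32-bit and 16-bit big-endian fields could look like this. The helper names read_be32 and read_be16 are made up for the example:

```cpp
#include <arpa/inet.h>  // ntohl, ntohs (POSIX header; an assumption about the platform)
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a 32-bit big-endian field from a raw buffer.
std::uint32_t read_be32(const unsigned char* p) {
  std::uint32_t raw;
  std::memcpy(&raw, p, sizeof raw);  // copy the bytes; no alignment requirement on p
  return ntohl(raw);                 // network (big-endian) order to host order
}

// Hypothetical helper: read a 16-bit big-endian field from a raw buffer.
std::uint16_t read_be16(const unsigned char* p) {
  std::uint16_t raw;
  std::memcpy(&raw, p, sizeof raw);
  return ntohs(raw);
}
```

These functions only cover 16-bit and 32-bit values, so a field of any other width still needs its own conversion.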

So, that brings us to C strings. If you're storing C strings in a file, do you need to swap their endianness? Well, a C string is a sequence of 1-byte chars. You don't need to swap 1-byte chars, so you don't need to swap the data in a C string!

If you really want to swap 6 bytes, you can use the std::reverse function:

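char in[7] = "abcdef";  // 6 data bytes plus the terminating '\0' so it can be printed
cout << in << endl;  // shows abcdef
std::reverse(in, in+6);  // needs #include <algorithm>; reverses only the 6 data bytes
cout << in << endl;  // shows fedcba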

If you're doing this on any large scale (a large number of types), you may want to consider writing a code generator that generates these byte-swapping functions (and the file-reading functions). It's not too hard, as long as you can find a tool to parse the structs in C (I've used gcc-xml for this, or maybe clang would help).

This makes serialization a harder problem. If it's in your power, you may want to consider using XML or Google's protocol buffers to solve these problems for you.


后eg是否自 2024-09-14 03:43:18

Ok, after your comment I understand what you are after. So you need to change endianness of a field that is 6 bytes wide.

I think this article should help you, as well as this question on SO. It shows how to implement conversions in different ways, the fastest being a bitwise implementation. It shows no implementation for a six-byte-wide field, but an analogous solution can easily be made.

I suggest copying your length field into a 64-bit integer and then implementing a custom function to swap the relevant 6 bytes. Get rid of all the char pointers in any case... ;)
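
One possible shape for such a custom function, as a sketch only: instead of swapping in place, it builds the value with shifts, so it gives the same result on little-endian and big-endian hosts. The name read_be is hypothetical, and n is assumed to be at most 8:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: interpret `n` bytes (n <= 8) starting at `p`
// as a big-endian unsigned integer.
std::uint64_t read_be(const unsigned char* p, std::size_t n) {
  std::uint64_t value = 0;
  for (std::size_t k = 0; k < n; ++k) {
    value = (value << 8) | p[k];  // bytes read earlier end up in higher positions
  }
  return value;
}
```

For the 6-byte length field this would be called as read_be(buffer, 6), where buffer points at the first byte of the field as read from the file.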

If you are compiling on VC++ there is this function: _byteswap_uint64. Paste your 6 bytes into the high end of this uint64, call this function and hopla, you are done.

edit at 4:12 am (I must be getting very addicted to stackoverflow)

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <stdlib.h>   // declares _byteswap_uint64 on VC++

typedef unsigned char byte;

// in case you are not compiling with VC++ use this custom function
// It can swap data of any size. Adapted from:
// https://stackoverflow.com/questions/2182002/convert-big-endian-to-little-endian-in-c-without-using-provided-func/2182581#2182581
// see: http://en.wikipedia.org/wiki/XOR_swap_algorithm

void
swapBytes( void* v, size_t n )
{
   byte* in = (byte*) v;

   // n>0 guards against n-1 wrapping around for an empty range
   for( size_t lo=0, hi=n-1; n>0 && hi>lo; ++lo, --hi )
   {
      in[lo] ^= in[hi];
      in[hi] ^= in[lo];
      in[lo] ^= in[hi];
   }
}

#define SWAP(x) swapBytes( &(x), sizeof(x) )


int
main()
{
   // the length field as it is stored in the file (big-endian).
   // You will have to read it from file to memory.
   byte length[6] = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06 };

   // ok, you have read it from file, now get it in an uint64_t.
   // memcpy copies exactly 6 bytes; casting the array to uint64_t*
   // would read 8 bytes from a 6-byte array.
   uint64_t i = 0;
   std::memcpy( &i, length, sizeof(length) );

   i <<= 16; // move everything to the high end; the low two bytes become zero.

   std::cout << std::hex << i                     << std::endl;
#ifdef _MSC_VER
   std::cout << std::hex << _byteswap_uint64( i ) << std::endl;
#endif

   // generic swapping function
   SWAP( i );
   std::cout << std::hex << i                     << std::endl;

   std::cin.get();
   return 0;
}

// Outputs on a little-endian machine (the second line only when built with VC++):
// 605040302010000
// 10203040506
// 10203040506
