Best way to merge hex strings in C++? [heavily edited]

Posted 2024-08-08 00:59:51, 1410 characters, 9 views, 0 comments

I have two hex strings, each accompanied by a mask, that I would like to merge into a single value/mask pair. The strings may have bytes that overlap, but after the masks are applied, no overlapping bits should contradict each other. For example, value1 = 0x0A, mask1 = 0xFE and value2 = 0x0B, mask2 = 0x0F say that in the merged result the upper nibble must be all 0s and the lower nibble must be 1011.

I've already done this in plain C as a prototype, converting the strings to byte arrays and memcpy'ing them into buffers. It's tested and seems to work. However, it's ugly and hard to read, and it doesn't throw exceptions when specific bit requirements contradict each other. I've considered using bitsets, but is there another way that might not demand the conversion overhead? Performance would be nice, but it's not crucial.
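
One way to keep the conversion cost small is to parse each hex string into bytes once, up front, and do all later work on the bytes. The following is only a sketch: `hex_digit` and `parse_hex` are hypothetical helper names, and it assumes an optional "0x" prefix followed by an even number of hex digits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: numeric value of a single hex digit, or throw.
inline std::uint8_t hex_digit(char c)
{
    if (c >= '0' && c <= '9') return static_cast<std::uint8_t>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<std::uint8_t>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return static_cast<std::uint8_t>(c - 'A' + 10);
    throw std::invalid_argument("not a hex digit");
}

// Hypothetical helper: "0x0AFE" or "0AFE" -> {0x0A, 0xFE}.
// Assumption: an even number of digits follows the optional "0x" prefix.
inline std::vector<std::uint8_t> parse_hex(const std::string& text)
{
    std::size_t pos = 0;
    if (text.size() >= 2 && text[0] == '0' && (text[1] == 'x' || text[1] == 'X'))
        pos = 2;  // skip the prefix
    if ((text.size() - pos) % 2 != 0)
        throw std::invalid_argument("odd number of hex digits");

    std::vector<std::uint8_t> bytes;
    bytes.reserve((text.size() - pos) / 2);
    for (; pos < text.size(); pos += 2)
        bytes.push_back(static_cast<std::uint8_t>(
            (hex_digit(text[pos]) << 4) | hex_digit(text[pos + 1])));
    return bytes;
}
```

Since all inputs are known before any page is requested, this parsing would happen once per input rather than once per page.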


EDIT: More detail, although writing this makes me realize I've made a simple problem too difficult. But, here it is, anyway.

I am given a large number of inputs that are binary search terms for a mixed-content document. The document is broken into pages, and the pages are provided by an API that delivers a single page at a time. Each page needs to be searched with the provided search terms.
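
As a rough illustration of what searching a page for a masked term could look like, here is a naive scan. It assumes the page and the term are already available as byte vectors; `Bytes`, `matches_at` and `find_all` are hypothetical names, not part of the API described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// True if page[pos, pos + value.size()) agrees with value on every bit
// that is set in mask. Bits cleared in mask are "don't care".
inline bool matches_at(const Bytes& page, std::size_t pos,
                       const Bytes& value, const Bytes& mask)
{
    assert(value.size() == mask.size());
    if (pos > page.size() || value.size() > page.size() - pos)
        return false;  // the term would run past the end of the page
    for (std::size_t i = 0; i < value.size(); ++i)
        if ((page[pos + i] ^ value[i]) & mask[i])
            return false;  // a significant bit differs
    return true;
}

// Every position in the page where the masked term matches (naive scan).
inline std::vector<std::size_t> find_all(const Bytes& page,
                                         const Bytes& value, const Bytes& mask)
{
    std::vector<std::size_t> hits;
    if (value.empty() || value.size() > page.size())
        return hits;
    for (std::size_t pos = 0; pos + value.size() <= page.size(); ++pos)
        if (matches_at(page, pos, value, mask))
            hits.push_back(pos);
    return hits;
}
```

A term with a fixed offset only needs a single `matches_at` call per page, while a term without an offset needs the full scan.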

I have all the search terms before requesting any pages. The inputs are strings representing hex digits (this is what I mean by hex strings), each with a mask that indicates which bits of the hex string are significant. Since I'm given all the input up front, I wanted to improve the search of each page returned by pre-processing the inputs and merging these hex strings together. To make the problem more interesting, every string has an optional offset into the page at which it must appear; the lack of an offset indicates that the string can appear anywhere in the requested page. So, something like this:

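#include <string>

class Input {
  public:
    int input_id;          // identifier reported when this input is found
    std::string value;     // hex string to search for
    std::string mask;      // hex string marking the significant bits of value
    bool offset_present;   // false: the string may appear anywhere in the page
    unsigned int offset;   // required position in the page; ignored if offset_present is false
};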

If a given Input object has offset_present = false, then any value assigned to offset is ignored. If offset_present is false, then it clearly can't be merged with other inputs.
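
For inputs that do carry an offset, merging could be sketched roughly as follows. This assumes the value and mask strings have already been converted to bytes; `Pattern` and `merge` are hypothetical names, and bytes in a gap between the two inputs get a mask of 0 so they stay unconstrained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <stdexcept>
#include <vector>

// Hypothetical parsed form of an Input whose offset_present is true.
struct Pattern {
    std::size_t offset;               // page position of the first byte
    std::vector<std::uint8_t> value;  // bytes to match
    std::vector<std::uint8_t> mask;   // set bits mark significant bits of value
};

// Merge two patterns into one that covers both ranges. Throws if the two
// disagree on a bit that both masks mark as significant.
inline Pattern merge(const Pattern& a, const Pattern& b)
{
    assert(a.value.size() == a.mask.size() && b.value.size() == b.mask.size());
    Pattern out;
    out.offset = std::min(a.offset, b.offset);
    const std::size_t end = std::max(a.offset + a.value.size(),
                                     b.offset + b.value.size());
    out.value.assign(end - out.offset, 0);
    out.mask.assign(end - out.offset, 0);  // gaps stay "don't care"

    for (const Pattern* p : {&a, &b}) {
        for (std::size_t i = 0; i < p->value.size(); ++i) {
            const std::size_t k = p->offset - out.offset + i;
            // Conflict: both sides care about a bit but want different values.
            if ((out.value[k] ^ p->value[i]) & out.mask[k] & p->mask[i])
                throw std::invalid_argument("conflicting bits");
            out.value[k] = static_cast<std::uint8_t>(
                out.value[k] | (p->value[i] & p->mask[i]));
            out.mask[k] = static_cast<std::uint8_t>(out.mask[k] | p->mask[i]);
        }
    }
    return out;
}
```

Working on bytes rather than on the hex text keeps the conflict check to one expression per byte, and the thrown exception covers the case the C prototype does not report.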

To make the problem more interesting still, I want to report an output that provides information about what was found (the input_id that was found, where the offset was, etc.). Merging some inputs (but not others) makes this a bit more difficult.

I had considered defining a CompositeInput class with a bitset as the underlying merged representation, but further reading about bitsets made me realize they weren't what I thought they were. My inexperience made me give up on the composite idea and go with brute force. I have necessarily skipped some details about other input types and about additional information to be collected for the output (say, page number or paragraph number) when an input is found. Here's an example output class:

class Output {
  public:
    Output();
    int id_result;
    unsigned int offset_result;
};

I would want to produce N of these if I merge N hex strings, keeping any merger details hidden from the user.
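
One possible shape for the composite idea is to let it remember which inputs were folded into it, so that a single match can be expanded back into N results. The sketch below is an assumption about how that might look: `Result` stands in for the Output class as a plain aggregate, and `Member`, `add` and `expand` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the Output class, as a plain aggregate.
struct Result {
    int id_result;
    unsigned int offset_result;
};

// Hypothetical record of one original Input that was folded into a composite.
struct Member {
    int input_id;
    unsigned int offset;
};

// Hypothetical composite. The merged value/mask would live alongside this;
// here it only tracks who was merged so results can be reported per input.
struct CompositeInput {
    std::vector<Member> members;

    void add(int input_id, unsigned int offset)
    {
        members.push_back(Member{input_id, offset});
    }

    // Call when the merged pattern matched: yields one Result per original Input.
    std::vector<Result> expand() const
    {
        std::vector<Result> results;
        results.reserve(members.size());
        for (const Member& m : members)
            results.push_back(Result{m.input_id, m.offset});
        return results;
    }
};
```

This relies on the merge having been conflict-free: the merged pattern then constrains every bit that any member constrains, so a match of the composite implies a match of each member at its own offset.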

Comments (3)

绝不放开 2024-08-15 00:59:51

I don't know what a hexstring is... but other than that it should be like this:

 outcome = (value1 & mask1) | (value2 & mask2);
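
The expression above gives the merged value but neither the merged mask nor a check for contradictions. A minimal single-byte sketch of both, assuming the values and masks are already integers (`MaskedByte` and `merge_byte` are hypothetical names):

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

struct MaskedByte {
    std::uint8_t value;
    std::uint8_t mask;  // set bits mark the significant bits of value
};

// Merge two value/mask pairs for the same byte position.
inline MaskedByte merge_byte(MaskedByte a, MaskedByte b)
{
    // A contradiction is a bit both masks select but the values disagree on.
    if ((a.value ^ b.value) & a.mask & b.mask)
        throw std::invalid_argument("conflicting bits");
    MaskedByte out;
    out.value = static_cast<std::uint8_t>((a.value & a.mask) | (b.value & b.mask));
    out.mask  = static_cast<std::uint8_t>(a.mask | b.mask);
    return out;
}
```

With the example from the question, 0x0A/0xFE merged with 0x0B/0x0F gives value 0x0B with mask 0xFF.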

苹果你个爱泡泡 2024-08-15 00:59:51

It sounds like |, & and ~ would work?

断念 2024-08-15 00:59:51
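
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Value of one hex digit; replaces the elided lookup table.
static std::uint8_t char2int(char c)
{
    if (c >= '0' && c <= '9') return static_cast<std::uint8_t>(c - '0');
    if (c >= 'A' && c <= 'F') return static_cast<std::uint8_t>(c - 'A' + 10);
    if (c >= 'a' && c <= 'f') return static_cast<std::uint8_t>(c - 'a' + 10);
    throw std::invalid_argument("not a hex digit");
}

int main()
{
    const std::size_t prefix = 2; // "0x"
    const std::size_t digits = 2; // hex digits after the prefix
    const char* value1 = "0x0A";
    const char* mask1  = "0xFE";
    const char* value2 = "0x0B";
    const char* mask2  = "0x0F";
    char output[prefix + digits + 1]   = "0x"; // merged value
    char mask_out[prefix + digits + 1] = "0x"; // merged mask
    const char int2char[] = "0123456789ABCDEF";

    for (std::size_t ii = prefix; ii != prefix + digits; ++ii)
    {
        std::uint8_t v1 = char2int(value1[ii]), m1 = char2int(mask1[ii]);
        std::uint8_t v2 = char2int(value2[ii]), m2 = char2int(mask2[ii]);
        // Conflict: a bit both masks select, but on which the values differ.
        if ((v1 ^ v2) & m1 & m2)
            throw std::invalid_argument("conflicting bits");
        output[ii]   = int2char[(v1 & m1) | (v2 & m2)];
        mask_out[ii] = int2char[m1 | m2];
    }
}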