Efficiently converting between hex, binary, and decimal in C/C++

Posted 2024-07-18 10:19:00 · 672 words · 4 views · 0 comments

I have 3 base representations for positive integer numbers:

  1. Decimal, in unsigned long variable (e.g. unsigned long int NumDec = 200).
  2. Hex, in string variable (e.g. string NumHex = "C8")
  3. Binary, in string variable (e.g. string NumBin = "11001000")

I want to be able to convert between numbers in all 3 representations in the most efficient way. I.e. to implement the following 6 functions:

unsigned long int Binary2Dec(const string & Bin) {}
unsigned long int Hex2Dec(const string & Hex) {}
string Dec2Hex(unsigned long int Dec) {}
string Binary2Hex(const string & Bin) {}
string Dec2Binary(unsigned long int Dec) {}
string Hex2Binary(const string & Hex) {}

What is the most efficient approach for each of them? I can use C and C++, but not boost.

Edit: By "efficiency" I mean time efficiency: Shortest execution time.

Comments (7)

森罗 2024-07-25 10:19:01

As others have pointed out, I would start with sscanf(), printf() and/or strtoul(). They are fast enough for most applications, and they are less likely to have bugs. I will say, however, that these functions are more generic than you might expect, as they have to deal with non-ASCII character sets, with numbers represented in any base and so forth. For some domains it is possible to beat the library functions.

So, measure first, and if the performance of these conversions is really an issue, then:

1) In some applications / domains certain numbers appear very often. Values such as zero, 100, 200, or 19.95 may be so common that it makes sense to optimize your functions to convert them with a bunch of if() statements, and then fall back to the generic library functions.
2) Use a table lookup for the 100 most common numbers, and then fall back on a library function. Remember that large tables may not fit in your cache and may require multiple indirections for shared libraries, so measure these things carefully to make sure you are not decreasing performance.
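
Both suggestions have the same shape: a precomputed fast path for the common values, with the library routine as the fallback. The following is only a minimal sketch, assuming the "common" values are simply 0 to 99; the table size and the names `SmallHexTable` and `Dec2HexCached` are made up for illustration and are not from the answer. Whether it actually beats a plain `snprintf` call has to be measured.

```cpp
#include <array>
#include <cstdio>
#include <string>

// Hypothetical fast path: hex strings for 0..99, built once on first use.
static const std::array<std::string, 100>& SmallHexTable() {
    static const std::array<std::string, 100> table = [] {
        std::array<std::string, 100> t;
        for (unsigned i = 0; i < t.size(); ++i) {
            char buf[8];
            std::snprintf(buf, sizeof buf, "%X", i);
            t[i] = buf;
        }
        return t;
    }();
    return table;
}

std::string Dec2HexCached(unsigned long value) {
    if (value < 100) {
        return SmallHexTable()[value];  // common case: one table lookup
    }
    char buf[2 * sizeof(unsigned long) + 1];  // two hex digits per byte, plus NUL
    std::snprintf(buf, sizeof buf, "%lX", value);  // fallback: library routine
    return buf;
}
```

Note that the fast path still copies a `std::string`, so for very small tables the gain over the library call may be negligible.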

You may also want to look at the boost lexical_cast functions, though in my experience they are relatively slow compared to the good old C functions.

Though many have said it, it is worth repeating over and over: do not optimize these conversions until you have evidence that they are a problem. If you do optimize, measure your new implementation to make sure it is faster, and make sure you have a ton of unit tests for your own version, because you will introduce bugs :-(

巷雨优美回忆 2024-07-25 10:19:01

I would suggest just using sprintf and sscanf.

Also, if you're interested in how it's implemented you can take a look at the source code for glibc, the GNU C Library.
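
As a rough sketch of what that looks like for the hex case (using `snprintf` rather than `sprintf` so the buffer size is checked; the name `Hex2DecScanf` and the bool-plus-out-parameter signature are illustrative assumptions, not the signatures from the question):

```cpp
#include <cstdio>
#include <string>

// Format an unsigned long as uppercase hex using snprintf.
std::string Dec2Hex(unsigned long value) {
    char buf[2 * sizeof(unsigned long) + 1];  // two hex digits per byte, plus NUL
    std::snprintf(buf, sizeof buf, "%lX", value);
    return buf;
}

// Parse a hex string using sscanf. Returns false if no hex digits could be read.
// Trailing garbage after the digits is silently ignored by sscanf.
bool Hex2DecScanf(const std::string& hex, unsigned long& out) {
    return std::sscanf(hex.c_str(), "%lx", &out) == 1;
}
```

The printf family has no portable conversion for binary output before C23, so the binary direction still needs a hand-written loop.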

冷清清 2024-07-25 10:19:01

Why do these routines have to be so time-efficient? That sort of claim always makes me wonder. Are you sure the obvious conversion methods like strtol() are too slow, or that you can do better? System functions are usually pretty efficient. They are sometimes slower to support generality and error-checking, but you need to consider what to do with errors. If a bin argument has characters other than '0' and '1', what then? Abort? Propagate massive errors?

Why are you using "Dec" to represent the internal representation? Dec, Hex, and Bin should be used to refer to the string representations. There's nothing decimal about an unsigned long. Are you dealing with strings showing the number in decimal? If not, you're confusing people here and are going to confuse many more.

The transformation between binary and hex text formats can be done quickly and efficiently, with lookup tables, but anything involving decimal text format will be more complicated.
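
A minimal sketch of the binary/hex text conversion, assuming ASCII and well-formed input (no validation at all). Each hex digit maps to exactly four bits, so neither direction needs to go through an integer; the table names are illustrative.

```cpp
#include <string>

// Hex digit value -> 4-bit pattern.
static const char* const kNibbleBits[16] = {
    "0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111",
    "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111"};

// Assumes `hex` contains only 0-9, a-f, A-F (ASCII).
std::string Hex2Binary(const std::string& hex) {
    std::string bin;
    bin.reserve(hex.size() * 4);
    for (char c : hex) {
        // (c | 0x20) folds ASCII upper-case letters to lower case.
        int v = (c >= '0' && c <= '9') ? c - '0' : (c | 0x20) - 'a' + 10;
        bin += kNibbleBits[v];
    }
    return bin;
}

// Assumes `bin` contains only '0' and '1'.
std::string Binary2Hex(const std::string& bin) {
    static const char kHexDigits[] = "0123456789ABCDEF";
    std::string hex;
    hex.reserve((bin.size() + 3) / 4);
    std::size_t i = 0;
    // The first group is shorter than 4 bits when the length is not a multiple of 4.
    std::size_t group = (bin.size() % 4 == 0) ? 4 : bin.size() % 4;
    while (i < bin.size()) {
        int v = 0;
        for (std::size_t k = 0; k < group; ++k) v = (v << 1) | (bin[i++] - '0');
        hex += kHexDigits[v];
        group = 4;
    }
    return hex;
}
```

Because the strings never pass through an `unsigned long`, this also works for inputs longer than the native word size.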

故笙诉离歌 2024-07-25 10:19:01

That depends on what you're optimizing for. What do you mean by "efficient"? Is it important that the conversions be fast, use little memory, take little programmer time, cause fewer WTFs from other programmers reading the code, or what?

For readability and ease of implementation, you should at least implement the two string-to-number functions, Hex2Dec() and Binary2Dec(), by just calling strtoul(). That makes them into one-liners, which is very efficient for at least some of the above interpretations of the word.
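
A sketch of the string-to-integer functions as one-liners over `strtoul`, using the signatures from the question. It does no error reporting: passing `nullptr` as the end pointer means invalid characters simply stop the parse.

```cpp
#include <cstdlib>
#include <string>

// Base-2 and base-16 parsing delegated to strtoul. Note that strtoul skips
// leading whitespace, accepts a sign, and (for base 16) an optional "0x" prefix.
unsigned long Binary2Dec(const std::string& bin) {
    return std::strtoul(bin.c_str(), nullptr, 2);
}

unsigned long Hex2Dec(const std::string& hex) {
    return std::strtoul(hex.c_str(), nullptr, 16);
}
```

If bad input has to be detected, pass a real end pointer and check that it reached the end of the string, and check `errno` for `ERANGE`.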

爱人如己 2024-07-25 10:19:01

Sounds very much like a homework problem, but what the heck...

The short answer: to convert from a long int to your strings, use two lookup tables. Each table should have 256 entries. One maps a byte to a hex string: 0 -> "00", 1 -> "01", etc. The other maps a byte to a bit string: 0 -> "00000000", 1 -> "00000001", etc.

Then for each byte in your long int you just have to look up the correct string, and concatenate them.
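
A minimal sketch of the hex half of this idea, assuming 8-bit bytes. The names `ByteHexTable` and `Dec2HexPadded` are illustrative, and building the table at first use (rather than writing out 256 literals) is a convenience of this sketch, not something the answer specifies. The bit-string table would follow the same pattern with 8-character entries.

```cpp
#include <array>
#include <climits>
#include <cstdio>
#include <string>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

// 256 two-character hex strings, one per byte value, built on first use.
static const std::array<std::string, 256>& ByteHexTable() {
    static const std::array<std::string, 256> table = [] {
        std::array<std::string, 256> t;
        for (unsigned i = 0; i < 256; ++i) {
            char buf[3];
            std::snprintf(buf, sizeof buf, "%02X", i);
            t[i] = buf;
        }
        return t;
    }();
    return table;
}

// Concatenates one table entry per byte, most significant byte first.
// The result has a fixed width, so leading zeros are kept.
std::string Dec2HexPadded(unsigned long value) {
    const auto& table = ByteHexTable();
    std::string out;
    out.reserve(2 * sizeof value);
    for (int shift = static_cast<int>((sizeof value - 1) * CHAR_BIT); shift >= 0;
         shift -= CHAR_BIT) {
        out += table[(value >> shift) & 0xFF];
    }
    return out;
}
```

The output width depends on `sizeof(unsigned long)`, so 200 comes out as "000000C8" on a 32-bit `unsigned long` and with sixteen digits on a 64-bit one; strip the leading zeros if the shortest form is wanted.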

To convert from strings back to long you can simply convert the hex string and the bit string back to a decimal number by multiplying the numeric value of each character by the appropriate power of 16 or 2, and summing up the results.

EDIT: You can also use the same lookup tables for the backwards conversion by doing a binary search to find the right string. This would take log2(256) = 8 string comparisons per byte. Unfortunately I don't have time to analyze whether comparing strings would be much faster than multiplying and adding integers.

云之铃。 2024-07-25 10:19:01

Let's think about half of the task for a moment: converting from a string-ized base n to unsigned long, where n is a power of 2 (base 2 for binary and base 16 for hex).

If your input is sane, then this work is nothing more than a compare, a subtract, a shift and an or per digit. If your input is not sane, well, that's where it gets ugly, doesn't it? Doing the conversion superfast is not hard. Doing it well under all circumstances is the challenge.

So let's assume that your input is sane, then the heart of your conversion is this:

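#include <ctype.h>

/* Assumes sane input: only digits valid for the base, no prefix, no overflow. */
unsigned long PowerOfTwoFromString(const char *input, int shift)
{
    unsigned long val = 0;
    /* Letter digits only exist above base 10; for hex this is 'a' + 6, i.e. 'g'. */
    char upperLimit = (char)('a' + (1 << shift) - 10);
    while (*input) {
        char c = (char)tolower((unsigned char)*input++);
        unsigned long digit = (c >= 'a' && c < upperLimit)
            ? (unsigned long)(c - 'a' + 10)
            : (unsigned long)(c - '0');
        val = (val << shift) | digit;
    }
    return val;
}

#define UlongFromBinaryString(str) PowerOfTwoFromString(str, 1)
#define UlongFromHexString(str) PowerOfTwoFromString(str, 4)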

See how easy that is? And it will fail on non-sane inputs. Most of your work is going to go into making your input sane, not performance.

Now, this code takes advantage of power-of-two shifting. It's easy to extend to base 4, base 8, base 32, etc. It won't work on non-power-of-two bases. For those, your math has to change. You get

val = (val * base) + digit

which is conceptually the same for this set of operations. The multiplication by the base is going to be equivalent to the shift. So I'd be as likely to use a fully general routine instead. And sanitize the code while sanitizing the inputs. And at that point, strtoul is probably your best bet. Here's a link to a version of strtoul. Nearly all the work is handling edge conditions - that should clue you in on where your energies should be focused: correct, resilient code. The savings from using bit shifts are going to be minimal compared to the savings of, say, not crashing on bad input.
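
To make the point about edge conditions concrete, here is a sketch of the general `val * base + digit` loop with the checks added. The name `ParseUnsigned` and its bool-plus-out-parameter signature are hypothetical, and the letter ranges assume an ASCII-compatible character set. Most of the lines are validation, not arithmetic.

```cpp
#include <limits>
#include <string>

// Hypothetical checked parser for bases 2..36. Returns false on empty input,
// an invalid digit, or overflow; `out` is only written on success.
bool ParseUnsigned(const std::string& text, unsigned base, unsigned long& out) {
    if (text.empty() || base < 2 || base > 36) return false;
    const unsigned long max = std::numeric_limits<unsigned long>::max();
    unsigned long val = 0;
    for (char c : text) {
        unsigned digit;
        if (c >= '0' && c <= '9')      digit = static_cast<unsigned>(c - '0');
        else if (c >= 'a' && c <= 'z') digit = static_cast<unsigned>(c - 'a' + 10);
        else if (c >= 'A' && c <= 'Z') digit = static_cast<unsigned>(c - 'A' + 10);
        else return false;                       // not a digit in any base
        if (digit >= base) return false;         // digit too large for this base
        if (val > (max - digit) / base) return false;  // val * base + digit would overflow
        val = val * base + digit;
    }
    out = val;
    return true;
}
```

Unlike `strtoul`, this sketch rejects whitespace, signs, and prefixes outright, which may or may not be what a given caller wants.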

七七 2024-07-25 10:19:01

Why not just use a macro that also takes the format as an input? If you are in C, at least.

#include <stdio.h>

/* #format stringizes the argument: TO_STRING(buf,%d,i) becomes sprintf(buf, "%d", i). */
#define TO_STRING( string, format, data) \
sprintf( string, #format, data)
// Int
TO_STRING(buf,%d,i);
// Hex ( Two char representation )
TO_STRING(buf,%02x,i);
// Binary (%b is standard only from C23; older C libraries may not support it)
TO_STRING(buf,%b,i);

Or you can use sprintf directly, or define multiple macros.

#define INT_STRING( buf, data) \
sprintf( buf, "%d", data)
#define HEX_STRING( buf, data) \
sprintf( buf, "%x", data)
/* Relies on %b, which needs C23 or a library extension. */
#define BIN_TO_STRING( buf, data) \
sprintf( buf, "%b", data)

BIN_TO_STRING( loc_buf, my_bin );
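
Since the `%b` conversion only became part of standard C with C23 and is missing from many older C libraries, a portable binary formatter usually ends up as a small hand-written loop. A minimal sketch, using the `Dec2Binary` signature from the question:

```cpp
#include <climits>
#include <string>

// Produces bits from the least significant end, filling a fixed-size buffer
// from the back so that no reversal is needed afterwards.
std::string Dec2Binary(unsigned long value) {
    char buf[sizeof(unsigned long) * CHAR_BIT];  // one char per bit, no NUL needed
    char* end = buf + sizeof buf;
    char* p = end;
    do {
        *--p = static_cast<char>('0' + (value & 1));
        value >>= 1;
    } while (value != 0);
    return std::string(p, end);  // shortest form; zero yields "0"
}
```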