Define a double constant as hexadecimal?

Posted on 2024-09-28 01:59:04 · 466 words · 4 views · 0 comments

I would like to have the closest number below 1.0 as a floating-point value. By reading Wikipedia's article on IEEE-754 I have managed to find out that the binary representation of 1.0 is 3FF0000000000000, so the closest double value below it is actually 0x3FEFFFFFFFFFFFFF.

The only way I know of to initialize a double with this binary data is this:

double a;
*((unsigned*)(&a) + 1) = 0x3FEFFFFF;
*((unsigned*)(&a) + 0) = 0xFFFFFFFF;

Which is rather cumbersome to use.

Is there any better way to define this double number, if possible as a constant?


Comments (8)

山田美奈子 2024-10-05 01:59:04

Hexadecimal float and double literals do exist.
The syntax is 0x1.(mantissa in hex)p(exponent in decimal)
In your case the literal would be

double x = 0x1.fffffffffffffp-1;
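
One way such a literal could be checked against the bit pattern from the question is sketched below. It assumes a C++17 compiler (or one that accepts hexadecimal floating literals as an extension) and IEEE-754 binary64 doubles. The names `kBelowOne`, `bits_of` and `check_below_one` are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Largest double below 1.0, written as a hexadecimal floating literal.
// Assumption: double is IEEE-754 binary64.
constexpr double kBelowOne = 0x1.fffffffffffffp-1;

// Hypothetical helper: returns the raw bit pattern of a double.
// memcpy is used so that no pointer cast or union read is needed.
std::uint64_t bits_of(double value)
{
    static_assert(sizeof(std::uint64_t) == sizeof(double), "expects a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Sanity check that could be called once at startup.
void check_below_one()
{
    assert(kBelowOne < 1.0);
    assert(bits_of(kBelowOne) == 0x3FEFFFFFFFFFFFFFULL);
}
```

Because the literal is a constant expression, `kBelowOne` can be used wherever a compile-time constant is needed, which the pointer-based initialization in the question cannot offer.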

薄暮涼年 2024-10-05 01:59:04

It's not safe, but something like:

double a;
*(reinterpret_cast<uint64_t *>(&a)) = 0x3FEFFFFFFFFFFFFFL;

However, this relies on a particular endianness of floating-point numbers on your system, so don't do this!

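Instead, just put DBL_EPSILON from <cfloat> (or, as pointed out in another answer, std::numeric_limits<double>::epsilon()) to good use. Keep in mind that with IEEE-754 doubles 1.0 - DBL_EPSILON is the second value below 1.0; the closest one is 1.0 - DBL_EPSILON / 2.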

兔姬 2024-10-05 01:59:04

#include <iostream>
#include <iomanip>
#include <limits>
using namespace std;

int main()
{
    // epsilon() is the gap between 1.0 and the next value above it.
    // With IEEE-754 doubles the gap just below 1.0 is half of that,
    // so epsilon() / 2 is subtracted to get the closest value below 1.0.
    double const    x   = 1.0 - numeric_limits< double >::epsilon() / 2;

    cout
        << setprecision( numeric_limits< double >::digits10 + 1 ) << fixed << x
        << endl;
}

≈。彩虹 2024-10-05 01:59:04

If you make a bit_cast and use fixed-width integer types, it can be done safely:

#include <cstdint>
#include <cstring>

template <typename R, typename T>
R bit_cast(const T& pValue)
{
    // R and T must have the same size for the copy to make sense
    static_assert(sizeof(R) == sizeof(T), "size mismatch");

    // copy the bytes with memcpy; reading the value through a
    // reinterpret_cast<const R&> would violate strict aliasing
    R result;
    std::memcpy(&result, &pValue, sizeof(R));
    return result;
}

const std::uint64_t target = 0x3FEFFFFFFFFFFFFF;
double result = bit_cast<double>(target);

Though you can probably just subtract epsilon / 2 from 1.0.

悲欢浪云 2024-10-05 01:59:04

It's a little archaic, but you can use a union.
Assuming a long long and a double are both 8 bytes long on your system:

#include <iostream>

typedef union { long long a; double b; } my_union;

int main()
{
    my_union c;
    c.b = 1.0;
    c.a--;
    std::cout << "Double value is " << c.b << std::endl;
    std::cout << "Long long value is " << c.a << std::endl;
}

Here you don't need to know ahead of time what the bit representation of 1.0 is.
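
The same "decrement the bit pattern" idea can also be written without reading an inactive union member, which is undefined behaviour in C++ even though common compilers tolerate it. The following is only a sketch: it assumes a 64-bit IEEE-754 double and a positive input, and `step_down` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the next smaller double for a positive input
// by decrementing its integer bit pattern.
// Assumption: double is IEEE-754 binary64, where the bit patterns of positive
// values are ordered the same way as the values themselves.
double step_down(double value)
{
    static_assert(sizeof(std::uint64_t) == sizeof(double), "expects a 64-bit double");
    assert(value > 0.0);  // rejects zero, negative values and NaN

    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);   // double -> integer bits
    --bits;                                    // one step towards zero
    std::memcpy(&value, &bits, sizeof value);  // integer bits -> double
    return value;
}
```

As with the union version, calling `step_down(1.0)` does not require knowing the bit representation of 1.0 in advance.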

终难愈 2024-10-05 01:59:04

This 0x1.fffffffffffffp-1 syntax is great, but it is only available in C99 (or later) and C++17 (or later).

But there is a workaround, no (pointer-)casting, no UB/IB, just simple math.

double x = (double)0x1fffffffffffff / (1LL << 53);

If I need pi, and pi as a double is 0x1.921fb54442d18p1 in hex, I just write

const double PI = (double)0x1921fb54442d18 / (1LL << 51);

If your constant has a large or small exponent, you could use the function exp2 instead of the shift, but exp2 is C99/C++11 ... Use pow to the rescue!
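
For exponents that do not fit a shift, `std::ldexp` from `<cmath>` (already part of C89's `<math.h>`) may be a simpler option than `exp2` or `pow`, since it scales a value by a power of two. A minimal sketch, assuming a 53-bit double significand; `from_significand` and the two constant names are hypothetical:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: builds significand * 2^exponent as a double.
// Assumption: double has a 53-bit significand (IEEE-754 binary64), so any
// non-negative integer below 2^53 converts to double without rounding.
double from_significand(long long significand, int exponent)
{
    assert(significand >= 0 && significand < (1LL << 53));
    return std::ldexp(static_cast<double>(significand), exponent);
}

// 0x1.fffffffffffffp-1 == 0x1fffffffffffff * 2^-53
const double below_one = from_significand(0x1fffffffffffffLL, -53);

// 0x1.921fb54442d18p1 == 0x1921fb54442d18 * 2^-51
const double pi = from_significand(0x1921fb54442d18LL, -51);
```

The exponent passed to the helper is the literal's exponent minus 52, because the 13 hexadecimal digits after the point account for 52 bits.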

知你几分 2024-10-05 01:59:04

Rather than all the bit juggling, the most direct solution is to use nextafter() from math.h. Thus:

#include <math.h>
double a = nextafter(1.0, 0.0); 

Read this as: the next floating-point value after 1.0 in the direction of 0.0; an almost direct encoding of "the closest number below 1.0" from the original question.
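
In C++ the same function is available as `std::nextafter` in `<cmath>` since C++11. The sketch below wraps it in two illustrative helpers, `prev_double` and `next_double` (names invented here), and shows how the neighbours of 1.0 relate to epsilon, assuming IEEE-754 binary64 doubles.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helpers: nearest representable neighbours of a finite double.
double prev_double(double value)
{
    return std::nextafter(value, -std::numeric_limits<double>::infinity());
}

double next_double(double value)
{
    return std::nextafter(value, std::numeric_limits<double>::infinity());
}

// Checks that hold for IEEE-754 binary64 (an assumption, not something the
// C++ standard guarantees): the gap above 1.0 is epsilon, while the gap
// below 1.0 is half of that.
void check_neighbours_of_one()
{
    const double eps = std::numeric_limits<double>::epsilon();
    assert(next_double(1.0) == 1.0 + eps);
    assert(prev_double(1.0) == 1.0 - eps / 2);
}
```

Note that `std::nextafter` is not `constexpr` before C++23, so a value obtained this way is a runtime constant rather than a compile-time one.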

多情癖 2024-10-05 01:59:04

https://godbolt.org/z/MTY4v4exz

#include <iostream>

typedef union { long long a; double b; } my_union;

int main()
{
    my_union c;
    c.b = 1.0;
    c.a--;
    std::cout << "Double value is " << c.b << std::endl;
    std::cout << "Long long value is " << c.a << std::endl;
}
