Define a double constant as hexadecimal?

Posted on 2024-09-28 01:59:04

I would like to have the closest number below 1.0 as a floating-point value. By reading Wikipedia's article on IEEE 754, I found that the binary representation of 1.0 is 0x3FF0000000000000, so the closest double below it is 0x3FEFFFFFFFFFFFFF.

The only way I know of to initialize a double with this binary data is this:

double a;
*((unsigned*)(&a) + 1) = 0x3FEFFFFF;
*((unsigned*)(&a) + 0) = 0xFFFFFFFF;

This is rather cumbersome to use.

Is there any better way to define this double number, if possible as a constant?

山田美奈子 2024-10-05 01:59:04

Hexadecimal float and double literals do exist.
The syntax is 0x1.(mantissa in hex)p(exponent in decimal), where the exponent is a power of two.
In your case that would be:

double x = 0x1.fffffffffffffp-1;

薄暮涼年 2024-10-05 01:59:04

It's not safe, but something like:

double a;
*(reinterpret_cast<uint64_t *>(&a)) = 0x3FEFFFFFFFFFFFFFL;

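However, this relies on a particular endianness of floating-point numbers on your system and violates the strict aliasing rules, so don't do this!

Instead, just put DBL_EPSILON from <cfloat> (or, as pointed out in another answer, std::numeric_limits<double>::epsilon()) to good use. Note that doubles just below 1.0 are spaced DBL_EPSILON / 2 apart, so the nearest one is 1.0 - DBL_EPSILON / 2.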

兔姬 2024-10-05 01:59:04

#include <iostream>
#include <iomanip>
#include <limits>
using namespace std;

int main()
{
    // epsilon() is the gap just above 1.0; the gap just below 1.0 is half of that.
    double const    x   = 1.0 - numeric_limits< double >::epsilon() / 2;

    cout
        << setprecision( numeric_limits< double >::digits10 + 1 ) << fixed << x
        << endl;
}

≈。彩虹 2024-10-05 01:59:04

If you write a bit_cast helper and use fixed-width integer types, it can be done more safely:

template <typename R, typename T>
R bit_cast(const T& pValue)
{
    // static assert R and T are POD types

    // reinterpret_cast is implementation defined,
    // but likely does what you expect
    return reinterpret_cast<const R&>(pValue);
}

const uint64_t target = 0x3FEFFFFFFFFFFFFFL;
double result = bit_cast<double>(target);

Though you can probably just subtract epsilon / 2 from 1.0.

悲欢浪云 2024-10-05 01:59:04

It's a little archaic, but you can use a union.
Assuming a long long and a double are both 8 bytes long on your system:

#include <iostream>

typedef union { long long a; double b; } my_union;

int main()
{
    my_union c;
    c.b = 1.0;
    c.a--;
    std::cout << "Double value is " << c.b << std::endl;
    std::cout << "Long long value is " << c.a << std::endl;
}

Here you don't need to know ahead of time what the bit representation of 1.0 is.

终难愈 2024-10-05 01:59:04

This 0x1.fffffffffffffp-1 syntax is great, but it is only available from C99 or C++17 onward.

But there is a workaround: no pointer casting, no undefined or implementation-defined behavior, just simple math.

double x = (double)0x1fffffffffffff / (1LL << 53);

If I need pi, and pi as a double is 0x1.921fb54442d18p1 in hex, I just write:

const double PI = (double)0x1921fb54442d18 / (1LL << 51);

If your constant has a large or small exponent, you could use the function exp2 instead of the shift, but exp2 requires C99/C++11. On older compilers, pow comes to the rescue!

知你几分 2024-10-05 01:59:04

Rather than all the bit juggling, the most direct solution is to use nextafter() from math.h. Thus:

#include <math.h>
double a = nextafter(1.0, 0.0);

Read this as: the next floating-point value after 1.0 in the direction of 0.0; an almost direct encoding of "the closest number below 1.0" from the original question.

多情癖 2024-10-05 01:59:04

https://godbolt.org/z/MTY4v4exz

#include <iostream>

typedef union { long long a; double b; } my_union;

int main()
{
    my_union c;
    c.b = 1.0;
    c.a--;
    std::cout << "Double value is " << c.b << std::endl;
    std::cout << "Long long value is " << c.a << std::endl;
}
