How does C compute sin() and other math functions?

Published 2024-08-21 13:00:12

I've been poring through .NET disassemblies and the GCC source code, but can't seem to find anywhere the actual implementation of sin() and other math functions... they always seem to be referencing something else.

Can anyone help me find them? I feel like it's unlikely that ALL hardware that C will run on supports trig functions in hardware, so there must be a software algorithm somewhere, right?


I'm aware of several ways that functions can be calculated, and have written my own routines to compute functions using Taylor series for fun. I'm curious about how real, production languages do it, since all of my implementations are always several orders of magnitude slower, even though I think my algorithms are pretty clever (obviously they're not).


Comments (22)

疯了 2024-08-28 13:00:12

In GNU libm, the implementation of sin is system-dependent. Therefore you can find the implementation, for each platform, somewhere in the appropriate subdirectory of sysdeps.

One directory includes an implementation in C, contributed by IBM. Since October 2011, this is the code that actually runs when you call sin() on a typical x86-64 Linux system. It is apparently faster than the fsin assembly instruction. Source code: sysdeps/ieee754/dbl-64/s_sin.c, look for __sin (double x).

This code is very complex. No one software algorithm is as fast as possible and also accurate over the whole range of x values, so the library implements several different algorithms, and its first job is to look at x and decide which algorithm to use.

  • When x is very very close to 0, sin(x) == x is the right answer.

  • A bit further out, sin(x) uses the familiar Taylor series. However, this is only accurate near 0, so...

  • When the angle is more than about 7°, a different algorithm is used, computing Taylor-series approximations for both sin(x) and cos(x), then using values from a precomputed table to refine the approximation.

  • When |x| > 2, none of the above algorithms would work, so the code starts by computing some value closer to 0 that can be fed to sin or cos instead.

  • There's yet another branch to deal with x being a NaN or infinity.

This code uses some numerical hacks I've never seen before, though for all I know they might be well-known among floating-point experts. Sometimes a few lines of code would take several paragraphs to explain. For example, these two lines

double t = (x * hpinv + toint);
double xn = t - toint;

are used (sometimes) in reducing x to a value close to 0 that differs from x by a multiple of π/2, specifically xn × π/2. The way this is done without division or branching is rather clever. But there's no comment at all!
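
A minimal sketch of that round-to-nearest trick (the constants below are the usual ones for this idiom, not copied from glibc's source, and round-to-nearest FP mode is assumed):

#include <stdio.h>

int main(void)
{
    double hpinv = 0x1.45f306dc9c883p-1; /* ~2/pi */
    double toint = 0x1.8p52;             /* 1.5 * 2^52 */
    double pio2  = 0x1.921fb54442d18p0;  /* ~pi/2 */
    double x = 5.0;

    /* Adding toint pushes the fractional bits of x*hpinv out of the
       mantissa, so the FPU's rounding produces the nearest integer;
       subtracting toint recovers it -- no division, no branch. */
    double t  = x * hpinv + toint;
    double xn = t - toint;               /* xn = round(x / (pi/2)) = 3 */
    double r  = x - xn * pio2;           /* reduced argument, ~0.2876 */

    printf("xn = %g, r = %g\n", xn, r);
    return 0;
}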


Older 32-bit versions of GCC/glibc used the fsin instruction, which is surprisingly inaccurate for some inputs. There's a fascinating blog post illustrating this with just 2 lines of code.

fdlibm's implementation of sin in pure C is much simpler than glibc's and is nicely commented. Source code: fdlibm/s_sin.c and fdlibm/k_sin.c

故笙诉离歌 2024-08-28 13:00:12

Functions like sine and cosine are implemented in microcode inside microprocessors. Intel chips, for example, have assembly instructions for these. A C compiler will generate code that calls these assembly instructions. (By contrast, a Java compiler will not. Java evaluates trig functions in software rather than hardware, and so it runs much slower.)

Chips do not use Taylor series to compute trig functions, at least not entirely. First of all they use CORDIC, but they may also use a short Taylor series to polish up the result of CORDIC or for special cases such as computing sine with high relative accuracy for very small angles. For more explanation, see this StackOverflow answer.
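
For a flavor of what CORDIC looks like, here is a toy rotation-mode sketch in C (double arithmetic with on-the-fly atan constants; real hardware uses fixed-point with precomputed tables, so this only illustrates the idea):

#include <math.h>
#include <stdio.h>

/* Rotation-mode CORDIC for sin/cos, valid for |theta| <= ~1.74 rad. */
void cordic_sincos(double theta, double *s, double *c)
{
    double x = 1.0, y = 0.0, z = theta;
    double K = 1.0;                       /* accumulated gain */
    for (int i = 0; i < 40; i++) {
        double t = ldexp(1.0, -i);        /* 2^-i */
        double alpha = atan(t);           /* precomputed table on real HW */
        double d = (z >= 0) ? 1.0 : -1.0; /* rotate toward z == 0 */
        double xn = x - d * y * t;
        double yn = y + d * x * t;
        x = xn;
        y = yn;
        z -= d * alpha;
        K *= sqrt(1.0 + t * t);           /* gain of this micro-rotation */
    }
    *c = x / K;
    *s = y / K;
}

int main(void)
{
    double s, c;
    cordic_sincos(0.6, &s, &c);
    printf("%.12f vs %.12f\n", s, sin(0.6));
    return 0;
}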

め可乐爱微笑 2024-08-28 13:00:12

OK kiddies, time for the pros....
This is one of my biggest complaints with inexperienced software engineers. They come in calculating transcendental functions from scratch (using Taylor series) as if nobody had ever done these calculations before. Not true. This is a well-defined problem that has been approached thousands of times by very clever software and hardware engineers, and it has a well-defined solution.
Basically, most of the transcendental functions are calculated using Chebyshev polynomials. Which polynomials are used depends on the circumstances. First, the bible on this matter is a book called "Computer Approximations" by Hart and Cheney. In that book, you can decide if you have a hardware adder, multiplier, divider, etc, and determine which operations are fastest. e.g. If you had a really fast divider, the fastest way to calculate sine might be P1(x)/P2(x) where P1, P2 are Chebyshev polynomials. Without the fast divider, it might be just P(x), where P has many more terms than P1 or P2... so it'd be slower. So, the first step is to determine your hardware and what it can do. Then you choose the appropriate combination of Chebyshev polynomials (usually of the form cos(ax) = aP(x) for cosine, for example, again where P is a Chebyshev polynomial). Then you decide what decimal precision you want. e.g. if you want 7 digits of precision, you look that up in the appropriate table in the book I mentioned, and it will give you (for precision = 7.33) a number N = 4 and a polynomial number 3502. N is the order of the polynomial (so it's p4.x^4 + p3.x^3 + p2.x^2 + p1.x + p0), because N = 4. Then you look up the actual values of p4, p3, p2, p1, p0 in the back of the book under 3502 (they'll be in floating point). Then you implement your algorithm in software in the form:
(((p4.x + p3).x + p2).x + p1).x + p0
....and this is how you'd calculate cosine to 7 decimal places on that hardware.
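
That nested form is just Horner's method. A minimal sketch in C (the coefficients are placeholders to be filled in, not the actual entry-3502 constants from Hart and Cheney):

/* Horner evaluation of p4*x^4 + p3*x^3 + p2*x^2 + p1*x + p0:
   four multiplies and four adds, no explicit powers of x. */
double eval_poly4(double x, const double p[5])
{
    return (((p[4] * x + p[3]) * x + p[2]) * x + p[1]) * x + p[0];
}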

Note that most hardware implementations of transcendental operations in an FPU usually involve some microcode and operations like this (depends on the hardware).
Chebyshev polynomials are used for most transcendentals but not all. e.g. square root is faster to compute using a double iteration of the Newton-Raphson method, seeded from a lookup table.
Again, that book "Computer Approximations" will tell you that.

If you plan on implementing these functions, I'd recommend to anyone that they get a copy of that book. It really is the bible for these kinds of algorithms.
Note that there are bunches of alternative means for calculating these values, like CORDICs, etc, but these tend to be best for specific algorithms where you only need low precision. To guarantee the precision every time, the Chebyshev polynomials are the way to go. Like I said, well-defined problem. Has been solved for 50 years now... and that's how it's done.

Now, that being said, there are techniques whereby the Chebyshev polynomials can be used to get a single-precision result with a low-degree polynomial (like the example for cosine above). Then, there are other techniques to interpolate between values to increase the accuracy without having to go to a much larger polynomial, such as "Gal's Accurate Tables Method". This latter technique is what the post citing the ACM literature refers to. But ultimately, the Chebyshev polynomials are what are used to get 90% of the way there.

Enjoy.

浅笑轻吟梦一曲 2024-08-28 13:00:12

For sin specifically, using Taylor expansion would give you:

sin(x) := x - x^3/3! + x^5/5! - x^7/7! + ... (1)

you would keep adding terms until either the difference between them is lower than an accepted tolerance level or just for a finite amount of steps (faster, but less precise). An example would be something like:

float sin(float x)
{
  float res=0, pow=x, fact=1;
  for(int i=0; i<5; ++i)
  {
    res+=pow/fact;
    pow*=-1*x*x;
    fact*=(2*(i+1))*(2*(i+1)+1);
  }

  return res;
}

Note: (1) works because of the approximation sin(x) = x for small angles. For bigger angles you need to calculate more and more terms to get acceptable results.
You can use a while condition to continue until a certain accuracy is reached:

double sin (double x){
    int i = 1;
    double cur = x;
    double acc = 1;
    double fact= 1;
    double pow = x;
    while (fabs(acc) > .00000001 &&   i < 100){
        fact *= ((2*i)*(2*i+1));
        pow *= -1 * x*x; 
        acc =  pow / fact;
        cur += acc;
        i++;
    }
    return cur;

}
陈年往事 2024-08-28 13:00:12

Concerning trigonometric function like sin(), cos(),tan() there has been no mention, after 5 years, of an important aspect of high quality trig functions: Range reduction.

An early step in any of these functions is to reduce the angle, in radians, to a range of a 2*π interval. But π is irrational so simple reductions like x = remainder(x, 2*M_PI) introduce error as M_PI, or machine pi, is an approximation of π. So, how to do x = remainder(x, 2*π)?

Early libraries used extended precision or crafted programming to give quality results but still over a limited range of double. When a large value was requested like sin(pow(2,30)), the results were meaningless or 0.0 and maybe with an error flag set to something like TLOSS total loss of precision or PLOSS partial loss of precision.

Good range reduction of large values to an interval like -π to π is a challenging problem that rivals the challenges of the basic trig function, like sin(), itself.

A good report is Argument reduction for huge arguments: Good to the last bit (1992). It covers the issue well: it discusses the need and how things were on various platforms (SPARC, PC, HP, 30+ others) and provides a solution algorithm that gives quality results for all double values from -DBL_MAX to DBL_MAX.


If the original arguments are in degrees, yet may be of a large value, use fmod() first for improved precision. A good fmod() will introduce no error and so provide excellent range reduction.

// sin(degrees2radians(x))
sin(degrees2radians(fmod(x, 360.0))); // -360.0 < fmod(x,360) < +360.0

Various trig identities and remquo() offer even more improvement. Sample: sind()
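
A minimal sketch of that degree-based approach (degrees2radians is a hypothetical helper here; M_PI is POSIX, not ISO C):

#include <math.h>

/* Hypothetical sketch: reduce exactly in degrees, then convert to
   radians. fmod() is exact, so the reduction itself adds no error. */
static double degrees2radians(double d)
{
    return d * (M_PI / 180.0);   /* M_PI assumed available (POSIX) */
}

double sind(double x)            /* x in degrees */
{
    return sin(degrees2radians(fmod(x, 360.0)));
}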


[Edit 2024]

Note: On review, range reduction as demonstrated in Argument reduction for huge arguments: Good to the last bit may be insufficient for tan() near its poles. Something to review.

柠栀 2024-08-28 13:00:12

Yes, there are software algorithms for calculating sin too. Basically, calculating this kind of thing with a digital computer is usually done using numerical methods, such as approximating the Taylor series that represents the function.

Numerical methods can approximate functions to an arbitrary amount of accuracy, and since the accuracy you have in a floating-point number is finite, they suit these tasks pretty well.

半山落雨半山空 2024-08-28 13:00:12

Use the Taylor series and try to find a relation between the terms of the series so you don't calculate things again and again.

Here is an example for cosinus:

#include <math.h>   /* for fabs() */

double cosinus(double x, double prec)
{
    double t, s;
    int p;
    p = 0;
    s = 1.0;   /* running sum; starts with the p = 0 term */
    t = 1.0;   /* current term of the series */
    while (fabs(t / s) > prec)
    {
        p++;
        /* derive term p from term p-1: multiply by -x^2 / ((2p-1)(2p)) */
        t = (-t * x * x) / ((2 * p - 1) * (2 * p));
        s += t;
    }
    return s;
}

Using this, we can get each new term of the sum from the one already computed (we avoid recomputing the factorial and x^(2p)).

决绝 2024-08-28 13:00:12

It is a complex question. Intel-like CPUs of the x86 family have a hardware implementation of the sin() function, but it is part of the x87 FPU and no longer used in 64-bit mode (where SSE2 registers are used instead). In that mode, a software implementation is used.

There are several such implementations out there. One is in fdlibm and is used in Java. As far as I know, the glibc implementation contains parts of fdlibm, and other parts contributed by IBM.

Software implementations of transcendental functions such as sin() typically use approximations by polynomials, often obtained from Taylor series.

烟火散人牵绊 2024-08-28 13:00:12

Chebyshev polynomials, as mentioned in another answer, are the polynomials where the largest difference between the function and the polynomial is as small as possible. That is an excellent start.

In some cases, the maximum error is not what you are interested in, but the maximum relative error. For example for the sine function, the error near x = 0 should be much smaller than for larger values; you want a small relative error. So you would calculate the Chebyshev polynomial for sin x / x, and multiply that polynomial by x.

Next you have to figure out how to evaluate the polynomial. You want to evaluate it in such a way that the intermediate values are small and therefore the rounding errors are small. Otherwise the rounding errors might become a lot larger than the errors in the polynomial. And with functions like the sine function, if you are careless it may happen that the result you calculate for sin x is greater than the result for sin y even when x < y. So a careful choice of the calculation order, and calculation of upper bounds for the rounding error, are needed.

For example, sin x = x - x^3/6 + x^5 / 120 - x^7 / 5040... If you calculate naively sin x = x * (1 - x^2/6 + x^4/120 - x^6/5040...), then that function in parentheses is decreasing, and it will happen that if y is the next larger number to x, then sometimes sin y will be smaller than sin x. Instead, calculate sin x = x - x^3 * (1/6 - x^2 / 120 + x^4/5040...) where this cannot happen.
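
A toy sketch of those two orderings (7th-order truncation only, purely to show the structural difference; not a production sin):

#include <stdio.h>

/* x times a decreasing factor: the ordering the text warns about */
static double sin_form_a(double x)
{
    double x2 = x * x;
    return x * (1.0 - x2 / 6.0 + x2 * x2 / 120.0 - x2 * x2 * x2 / 5040.0);
}

/* x minus a small correction: the ordering the text recommends */
static double sin_form_b(double x)
{
    double x2 = x * x;
    return x - x * x2 * (1.0 / 6.0 - x2 / 120.0 + x2 * x2 / 5040.0);
}

int main(void)
{
    double x = 0.5;   /* algebraically equal; only the rounding differs */
    printf("%.17g\n%.17g\n", sin_form_a(x), sin_form_b(x));
    return 0;
}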

When calculating Chebyshev polynomials, you usually need to round the coefficients to double precision, for example. But while a Chebyshev polynomial is optimal, the Chebyshev polynomial with coefficients rounded to double precision is not the optimal polynomial with double precision coefficients!

For example for sin (x), where you need coefficients for x, x^3, x^5, x^7 etc. you do the following: Calculate the best approximation of sin x with a polynomial (ax + bx^3 + cx^5 + dx^7) with higher than double precision, then round a to double precision, giving A. The difference between a and A would be quite large. Now calculate the best approximation of (sin x - Ax) with a polynomial (b x^3 + cx^5 + dx^7). You get different coefficients, because they adapt to the difference between a and A. Round b to double precision B. Then approximate (sin x - Ax - Bx^3) with a polynomial cx^5 + dx^7 and so on. You will get a polynomial that is almost as good as the original Chebyshev polynomial, but much better than Chebyshev rounded to double precision.

Next you should take into account the rounding errors in the choice of polynomial. You found a polynomial with minimum error in the polynomial ignoring rounding error, but you want to optimise polynomial plus rounding error. Once you have the Chebyshev polynomial, you can calculate bounds for the rounding error. Say f (x) is your function, P (x) is the polynomial, and E (x) is the rounding error. You don't want to optimise | f (x) - P (x) |, you want to optimise | f (x) - P (x) +/- E (x) |. You will get a slightly different polynomial that tries to keep the polynomial errors down where the rounding error is large, and relaxes the polynomial errors a bit where the rounding error is small.

All this will get you easily rounding errors of at most 0.55 times the last bit, where +,-,*,/ have rounding errors of at most 0.50 times the last bit.

余生一个溪 2024-08-28 13:00:12

There's nothing like hitting the source and seeing how someone has actually done it in a library in common use; let's look at one C library implementation in particular. I chose uClibc.

Here's the sin function:

http://git.uclibc.org/uClibc/tree/libm/s_sin.c

which looks like it handles a few special cases, and then carries out some argument reduction to map the input to the range [-pi/4, pi/4] (splitting the argument into two parts, a big part and a tail) before calling

http://git.uclibc.org/uClibc/tree/libm/k_sin.c

which then operates on those two parts.
If there is no tail, an approximate answer is generated using a polynomial of degree 13.
If there is a tail, you get a small corrective addition based on the principle that sin(x+y) = sin(x) + sin'(x')y.
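
A toy numeric check of that principle (the head and tail values below are made up for illustration; sin'(x) is cos(x)):

#include <math.h>
#include <stdio.h>

int main(void)
{
    double head = 0.78539812;   /* "big part" of the argument (made up) */
    double tail = 4.2e-8;       /* tiny remainder (made up) */

    /* first-order correction: sin(head + tail) ~ sin(head) + cos(head)*tail */
    double corrected = sin(head) + cos(head) * tail;

    printf("corrected: %.17g\nsin(sum):  %.17g\n", corrected, sin(head + tail));
    return 0;
}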

牛↙奶布丁 2024-08-28 13:00:12

They are typically implemented in software and will not use the corresponding hardware (that is, assembly) calls in most cases. However, as Jason pointed out, these are implementation specific.

Note that these software routines are not part of the compiler sources, but will rather be found in the corresponding library, such as the clib, or glibc for the GNU compiler. See http://www.gnu.org/software/libc/manual/html_mono/libc.html#Trig-Functions

If you want greater control, you should carefully evaluate what you need exactly. Some of the typical methods are interpolation of look-up tables, the assembly call (which is often slow), or other approximation schemes such as Newton-Raphson for square roots.

迷途知返 2024-08-28 13:00:12

The actual implementation of library functions is up to the specific compiler and/or library provider. Whether it's done in hardware or software, whether it's a Taylor expansion or not, etc., will vary.

I realize that's absolutely no help.

转角预定愛 2024-08-28 13:00:12

If you want an implementation in software, not hardware, the place to look for a definitive answer to this question is Chapter 5 of Numerical Recipes. My copy is in a box, so I can't give details, but the short version (if I remember this right) is that you take tan(theta/2) as your primitive operation and compute the others from there. The computation is done with a series approximation, but it's something that converges much more quickly than a Taylor series.

Sorry I can't remember more without getting my hands on the book.

愛上了 2024-08-28 13:00:12

As many people pointed out, it is implementation dependent. But as far as I understand your question, you were interested in a real software implementation of math functions, but just didn't manage to find one. If this is the case then here you are:

  • Download glibc source code from http://ftp.gnu.org/gnu/glibc/
  • Look at the file dosincos.c located in the unpacked glibc root\sysdeps\ieee754\dbl-64 folder
  • Similarly you can find implementations of the rest of the math library; just look for the file with the appropriate name

You may also have a look at the files with the .tbl extension; their contents are nothing more than huge tables of precomputed values of different functions in binary form. That is why the implementation is so fast: instead of computing all the coefficients of whatever series they use, they just do a quick lookup, which is much faster. BTW, they do use Taylor series to calculate sine and cosine.
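
As a generic illustration of table-driven evaluation (only a sketch of the idea; it is not how glibc actually consumes its .tbl data):

#include <math.h>
#include <stdio.h>

#define N 256
static double table[N + 1];   /* sin sampled on [0, pi/2] */

static void init_table(void)
{
    for (int i = 0; i <= N; i++)
        table[i] = sin(i * (M_PI / 2) / N);   /* M_PI assumed (POSIX) */
}

static double sin_lookup(double x)   /* valid only for x in [0, pi/2] */
{
    double pos = x / (M_PI / 2) * N;
    int i = (int)pos;
    if (i >= N)
        i = N - 1;                   /* clamp at the right edge */
    double frac = pos - i;
    /* linear interpolation between adjacent table entries */
    return table[i] + frac * (table[i + 1] - table[i]);
}

int main(void)
{
    init_table();
    printf("%.9f vs %.9f\n", sin_lookup(0.5), sin(0.5));
    return 0;
}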

I hope this helps.

凉风有信 2024-08-28 13:00:12

Whenever such a function is evaluated, at some level there is most likely either:

  • A table of values which is interpolated (for fast, inaccurate applications - e.g. computer graphics)
  • The evaluation of a series that converges to the desired value --- probably not a Taylor series, more likely something based on a fancy quadrature like Clenshaw-Curtis.

If there is no hardware support then the compiler probably uses the latter method, emitting only assembler code (with no debug symbols), rather than using a C library --- making it tricky for you to track the actual code down in your debugger.

千纸鹤 2024-08-28 13:00:12

If you want to look at the actual GNU implementation of those functions in C, check out the latest trunk of glibc. See the GNU C Library.

风苍溪 2024-08-28 13:00:12

I'll try to answer for the case of sin() in a C program, compiled with GCC's C compiler on a current x86 processor (let's say an Intel Core 2 Duo).

In the C language the Standard C Library includes common math functions that are not part of the language itself (e.g. pow, sin and cos for power, sine, and cosine respectively), declared in the header math.h.

Now on a GNU/Linux system, these library functions are provided by glibc (GNU libc or GNU C Library). But the GCC compiler wants you to link to the math library (libm.so) using the -lm compiler flag to enable usage of these math functions. I'm not sure why it isn't part of the standard C library. These would be a software version of the floating point functions, or "soft-float".
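
For example, a typical build line on such a system would be:

gcc -O2 myprog.c -o myprog -lm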

Aside: The reason for having the math functions separate is historic, and was merely intended to reduce the size of executable programs in very old Unix systems, possibly before shared libraries were available, as far as I know.

Now the compiler may optimize the standard C library function sin() (provided by libm.so), replacing it with a call to your CPU/FPU's built-in sin(), which exists as an FPU instruction (FSIN for x86/x87) on newer processors like the Core 2 series (this is correct pretty much as far back as the i486DX). This would depend on the optimization flags passed to the gcc compiler. If the compiler was told to write code that would execute on any i386 or newer processor, it would not make such an optimization. The -mcpu=486 flag would inform the compiler that it was safe to make such an optimization.

Now if the program executed the software version of the sin() function, it would do so based on a CORDIC (COordinate Rotation DIgital Computer) or BKM algorithm, or more likely a table or power-series calculation which is commonly used now to calculate such transcendental functions. [Src: http://en.wikipedia.org/wiki/Cordic#Application]

Any recent (since approx. 2.9x) version of gcc also offers a built-in version of sin, __builtin_sin(), that it will use to replace the standard call to the C library version, as an optimization.

I'm sure that is as clear as mud, but hopefully gives you more information than you were expecting, and lots of jumping off points to learn more yourself.

相对绾红妆 2024-08-28 13:00:12

Don't use Taylor series. Chebyshev polynomials are both faster and more accurate, as pointed out by a couple of people above. Here is an implementation (originally from the ZX Spectrum ROM): https://albertveli.wordpress.com/2015/01/10/zx-sine/

伴我心暖 2024-08-28 13:00:12

Computing sine/cosine/tangent is actually very easy to do through code using the Taylor series. Writing one yourself takes like 5 seconds.

The whole process can be summed up with these expansions:

sin(x) = x - x^3/3! + x^5/5! - x^7/7! + ...
cos(x) = 1 - x^2/2! + x^4/4! - x^6/6! + ...

Here are some routines I wrote for C:

/* Naive helpers: they recompute the power and factorial for every term.
   Note: file-scope identifiers with a leading underscore are reserved
   in C; the names are kept as in the original. */
double _pow(double a, double b) {
    double c = 1;
    for (int i = 0; i < b; i++)
        c *= a;
    return c;
}

double _fact(double x) {
    double ret = 1;
    for (int i = 1; i <= x; i++)
        ret *= i;
    return ret;
}

/* Alternating Taylor series: odd powers up to x^99 */
double _sin(double x) {
    double y = x;
    double s = -1;
    for (int i = 3; i <= 100; i += 2) {
        y += s * (_pow(x, i) / _fact(i));
        s *= -1;   /* flip the sign for the next term */
    }
    return y;
}

/* Alternating Taylor series: even powers up to x^100 */
double _cos(double x) {
    double y = 1;
    double s = -1;
    for (int i = 2; i <= 100; i += 2) {
        y += s * (_pow(x, i) / _fact(i));
        s *= -1;
    }
    return y;
}

double _tan(double x) {
    return (_sin(x) / _cos(x));
}
白龙吟 2024-08-28 13:00:12

Improved version of code from Blindy's answer

#define PI 3.14159265358979323846
#define EPSILON .0000000000001
// this is the smallest effective threshold, at least on my OS (WSL ubuntu 18)
// possibly because the factorial part turns 0 at some point
// and that happens faster than the series element turns 0;
// validation was made against sin() from <math.h>
double ft_sin(double x)
{
//  precision drops rapidly when x is not close to 0,
//  so move x as close to 0 as possible first;
//  reduce by 2*PI -- stepping by PI alone would flip the sign,
//  since sin(x - PI) == -sin(x)
    while (x > PI)
        x -= 2 * PI;
    while (x < -PI)
        x += 2 * PI;
    if (x > PI / 2)
        return (ft_sin(PI - x));
    if (x < -PI / 2)
        return (ft_sin(-PI - x));

    int k = 2;
    double r = x;    // running sum; initialized after reduction
    double acc = 1;  // most recent series term
    double den = 1;  // running factorial denominator
    double num = x;  // running signed power of x

//  not using fabs for performance reasons
    while (acc > EPSILON || acc < -EPSILON)
    {
        num *= -x * x;
        den *= k * (k + 1);
        acc = num / den;
        r += acc;
        k += 2;
    }
    return (r);
}
凡尘雨 2024-08-28 13:00:12

The essence of how it does this lies in this excerpt from Applied Numerical Analysis by Gerald Wheatley:

When your software program asks the computer to get a value of [a function such as sin x], have you wondered how it can get the values if the most powerful functions it can compute are polynomials? It doesn't look these up in tables and interpolate! Rather, the computer approximates every function other than polynomials from some polynomial that is tailored to give the values very accurately.

A few points to mention on the above: some algorithms do in fact interpolate from a table, albeit only for the first few iterations. Also note how it mentions that computers utilise approximating polynomials without specifying the type of approximating polynomial. As others in the thread have pointed out, Chebyshev polynomials are more efficient than Taylor polynomials in this case.

如梦亦如幻 2024-08-28 13:00:12

if you want sin then

 __asm__ __volatile__("fsin" : "=t"(vsin) : "0"(xrads));

if you want cos then

 __asm__ __volatile__("fcos" : "=t"(vcos) : "0"(xrads));

if you want sqrt then

 __asm__ __volatile__("fsqrt" : "=t"(vsqrt) : "0"(value));

so why use inaccurate code when the machine instructions will do?
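
For completeness, a minimal compilable wrapper around the fsin snippet above (GCC/Clang inline-asm syntax, x86 with an x87 FPU only; other answers here note that fsin can be inaccurate for some inputs):

#include <stdio.h>

static double fsin_x87(double xrads)
{
    double vsin;
    /* "=t" returns the result on the x87 stack top;
       "0" places the input in that same register */
    __asm__ __volatile__("fsin" : "=t"(vsin) : "0"(xrads));
    return vsin;
}

int main(void)
{
    printf("fsin(1.0) = %.17g\n", fsin_x87(1.0));   /* ~0.841470984807897 */
    return 0;
}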
