Approximating e^x
I'd like to approximate the e^x function.
Is it possible to do so using a multiple-spline (piecewise linear) approach? That is, between x1 and x2,
y1 = a1*x + b1,
between x2 and x3,
y2 = a2*x + b2,
and so on.
This is for dedicated FPGA hardware and not a general-purpose CPU, so I need to create the function myself. Accuracy is much less of a concern. Furthermore, I can't really afford more than one multiplication circuit and/or multiple shifts/adders. I also want something much smaller than a CORDIC function; in fact, size is critical.
How about a strategy that uses the formula
e^x = 2^(x / ln(2))
with the constant 1/ln(2) precomputed?
I realize this is not a complete solution, but it requires only a single multiplication and reduces the remaining problem to approximating a fractional power of 2, which should be easier to implement in hardware.
Also, if your application is specialized enough, you could try to re-derive all of the numerical code that will run on your hardware to be in a base-e number system and implement your floating-point hardware to work in base e as well. Then no conversion is needed at all.
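As a rough illustration of the base-2 idea, here is a minimal sketch in Python that mimics fixed-point hardware. The Q16 number format, the names exp_q16 and INV_LN2_Q16, and the choice of 2^f ≈ 1 + f for the fractional part are all assumptions made for this example, not part of the answer above. The one multiplication is by the precomputed 1/ln(2); everything else is a shift, a mask, or an add.

```python
import math

FRAC_BITS = 16                           # assumed Q16 fixed-point format
ONE = 1 << FRAC_BITS
INV_LN2_Q16 = round(ONE / math.log(2))   # precomputed constant 1/ln(2)

def exp_q16(x_q16):
    """Approximate e^x for a Q16 fixed-point input; returns a Q16 result."""
    # The single multiplication: t = x / ln(2), so that e^x = 2^t.
    t = (x_q16 * INV_LN2_Q16) >> FRAC_BITS
    n = t >> FRAC_BITS                   # integer part of t (floor, also for negatives)
    f = t & (ONE - 1)                    # fractional part of t, in [0, 1)
    # Crudest possible estimate of the fractional power: 2^f ~= 1 + f (an adder).
    mantissa = ONE + f
    # 2^n is a plain shift.
    return mantissa << n if n >= 0 else mantissa >> -n

if __name__ == "__main__":
    for x in (-2.0, 0.0, 0.5, 1.0, 3.0):
        approx = exp_q16(round(x * ONE)) / ONE
        print(x, approx, math.exp(x))
```

With this linear estimate the result runs up to roughly 6% high; a small lookup table or a short polynomial for 2^f would tighten that at the cost of more area.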
If x is an integer, you can just multiply e by itself over and over again.
If x is not an integer, you can calculate e^floor(x) using the above method and then multiply by a small correction term. This correction term can be easily calculated using a number of approximation methods. One such way comes from the (optimized) power series expansion of e^x, which is very accurate for small values of x. If you need more accuracy, just tack on more terms to the series.
This math.stackexchange question contains some additional clever answers.
EDIT: Note that there is a faster way of calculating e^n called exponentiation by squaring.
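The two pieces described above might be combined as in the following sketch. The function names and the four-term cutoff are assumptions for illustration, and the nested (Horner) form is just one common way to evaluate a truncated power series; it is not necessarily the exact correction formula the answer had in mind.

```python
import math

def exp_int(n):
    """e^n for an integer n, using exponentiation by squaring."""
    base = math.e if n >= 0 else 1.0 / math.e
    n = abs(n)
    result = 1.0
    while n > 0:
        if n & 1:              # this bit of n is set: multiply it in
            result *= base
        base *= base           # square for the next bit
        n >>= 1
    return result

def exp_frac(f, terms=4):
    """Truncated series 1 + f + f^2/2! + ... + f^terms/terms! in nested form."""
    acc = 1.0
    for k in range(terms, 0, -1):
        acc = 1.0 + acc * f / k
    return acc

def exp_approx(x):
    """e^x as e^floor(x) times a series correction for the fractional part."""
    n = math.floor(x)
    return exp_int(n) * exp_frac(x - n)

if __name__ == "__main__":
    for x in (-3.7, 0.25, 1.0, 5.9):
        print(x, exp_approx(x), math.exp(x))
```

With four terms the relative error stays below about 0.4% because the fractional part never exceeds 1; each extra term shrinks it further.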
First off, what is motivating this approximation? In other words, what exactly is wrong with the straightforward exp(x)?
That said, a typical implementation of exp(x) is to find an integer k and a floating-point number r such that x = k*log(2) + r and r is between -0.5*log(2) and 0.5*log(2). Then exp(x) is 2^k*exp(r). Standard implementations of exp(x) use a Remes-type algorithm to come up with a minimax polynomial that approximates exp(r).
Here's the kicker: no matter what you do, the odds are very high that your function will be much, much slower than just calling exp(). Most of the functionality of exp() is implemented in your computer's math coprocessor. Re-implementing that functionality in software, even with reduced precision, is going to be an order of magnitude slower than just using exp().
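To make the reduction x = k*log(2) + r concrete, here is a small sketch. The name exp_reduced is made up for this example, and the cubic below is a plain Taylor polynomial standing in for a true minimax (Remes) polynomial, whose coefficients would differ slightly.

```python
import math

LN2 = math.log(2.0)

def exp_reduced(x):
    """exp(x) via x = k*ln(2) + r with |r| <= 0.5*ln(2), then 2^k * p(r)."""
    k = round(x / LN2)                 # nearest integer multiple of ln(2)
    r = x - k * LN2                    # reduced argument, about [-0.347, 0.347]
    # Cubic Taylor polynomial in Horner form (an assumption; a real library
    # would use minimax coefficients of higher degree here).
    p = 1.0 + r * (1.0 + r * (0.5 + r / 6.0))
    return math.ldexp(p, k)            # multiply by 2^k: an exponent adjustment

if __name__ == "__main__":
    for x in (-10.0, -0.3, 0.0, 1.0, 12.5):
        print(x, exp_reduced(x), math.exp(x))
```

Because r is confined to a narrow interval, even this low-order polynomial keeps the relative error to roughly 0.1%.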
For hardware, I have an awesome solution for you IF you need it to be bit-level accurate. (Otherwise, just do an approximation like the ones above.) The identity is exp(x) = cosh(x) + sinh(x), the sum of the hyperbolic cosine and sine. The catch is that the hyperbolic sine and cosine can be computed using the CORDIC technique, and best of all, they are among the FAST CORDIC functions, meaning they look almost like a multiply instead of almost like a divide!
Which means that for about the area of an array multiplier, you can compute the exponential to arbitrary precision in just 2 cycles!
Look up the CORDIC method - it's AMAZING for hardware implementation.
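For readers who have not seen it, the following is a minimal floating-point sketch of hyperbolic CORDIC in rotation mode, written as a software loop rather than the unrolled hardware the answer describes. The function name and the iteration count are assumptions. In hardware the multiplications by 2^-i become shifts and the atanh values come from a small constant table.

```python
import math

def cordic_exp(z, iterations=16):
    """exp(z) = cosh(z) + sinh(z) via hyperbolic CORDIC; valid for |z| up to about 1.1."""
    # Hyperbolic CORDIC starts at i = 1 and must repeat i = 4, 13, ... to converge.
    schedule = []
    for i in range(1, iterations + 1):
        schedule.append(i)
        if i in (4, 13):
            schedule.append(i)
    # Each step scales the vector by sqrt(1 - 2^(-2i)); pre-divide by the total gain.
    gain = 1.0
    for i in schedule:
        gain *= math.sqrt(1.0 - 2.0 ** (-2 * i))
    x, y = 1.0 / gain, 0.0
    for i in schedule:
        d = 1.0 if z >= 0 else -1.0        # drive the residual angle toward zero
        t = 2.0 ** -i                      # a right shift by i bits in hardware
        x, y = x + d * y * t, y + d * x * t
        z -= d * math.atanh(t)             # table lookup in hardware
    return x + y                           # x ~= cosh, y ~= sinh

if __name__ == "__main__":
    for z in (-1.0, -0.2, 0.0, 0.5, 1.0):
        print(z, cordic_exp(z), math.exp(z))
```

Arguments outside the convergence range would first need a range reduction such as the ones discussed in the other answers.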
One other hardware approach is to use a small table in conjunction with a formula others have mentioned: exp(x + y) = exp(x) * exp(y). You can break the number up into small bit fields - say 4 or 8 bits at a time - and just look up the exponential of each bit field. Probably only effective for narrow computations, but it's another approach.
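A sketch of the table idea, under assumptions chosen only for illustration: a 12-bit unsigned fixed-point input with 8 fraction bits, split into three 4-bit fields, each with its own 16-entry table. The tables hold floats here for readability; in hardware they would be small fixed-point ROMs.

```python
import math

FRAC_BITS = 8        # assumed input format: unsigned, 4 integer + 8 fraction bits
FIELD_BITS = 4       # width of each looked-up bit field
NUM_FIELDS = 3       # 12 bits / 4 bits per field

def build_tables():
    """One 16-entry table per field: exp(field_value * weight_of_field)."""
    tables = []
    for field in range(NUM_FIELDS):
        weight = 2.0 ** (field * FIELD_BITS - FRAC_BITS)
        tables.append([math.exp(v * weight) for v in range(1 << FIELD_BITS)])
    return tables

TABLES = build_tables()

def exp_table(x_fixed):
    """exp of a 12-bit fixed-point value using exp(a + b + c) = exp(a)*exp(b)*exp(c)."""
    result = 1.0
    for field, table in enumerate(TABLES):
        nibble = (x_fixed >> (field * FIELD_BITS)) & ((1 << FIELD_BITS) - 1)
        result *= table[nibble]
    return result

if __name__ == "__main__":
    for x_fixed in (0, 1, 256, 700, 4095):
        print(x_fixed / 256, exp_table(x_fixed), math.exp(x_fixed / 256))
```

The result is exact up to rounding for every representable input, and the products can be formed one after another on a single shared multiplier.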
See http://martin.ankerl.com/2007/02/11/optimized-exponential-functions-for-java/
which uses Schraudolph's method (http://nic.schraudolph.org/pubs/Schraudolph99.pdf) in Java,
and
https://math.stackexchange.com/a/56064 (look for the Padé approximant).
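The Java snippet from the linked post is not reproduced in this copy. As a stand-in, here is a Python sketch of the same Schraudolph-style trick, assuming the constants commonly quoted for it (about 2^20/ln(2), the IEEE 754 double exponent bias shifted into place, and a small error-balancing correction). The function name is made up, and the bit reinterpretation uses the standard struct module.

```python
import math
import struct

A = 1512775        # approximately 2^20 / ln(2)
B = 1072693248     # 1023 * 2^20: the double exponent bias, shifted into place
C = 60801          # correction term commonly quoted for this method

def exp_schraudolph(x):
    """Rough e^x: write a*x + (b - c) into the upper 32 bits of a double."""
    hi = int(A * x + (B - C))              # one multiply and one add
    # Reinterpret (hi << 32) as an IEEE 754 double; the low 32 bits stay zero.
    return struct.unpack("<d", struct.pack("<Q", hi << 32))[0]

if __name__ == "__main__":
    for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(x, exp_schraudolph(x), math.exp(x))
```

The result is only good to a few percent, and the sketch is meant for moderate arguments (roughly |x| < 700) where the shifted integer still fits the exponent field.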
This is not the smooth spline interpolation you requested, but it's computationally efficient.
[Figure: plot output]
Of course it is "possible". There are several issues.
What is your requirement for the accuracy?
Are you willing to use higher order splines?
How much memory are you willing to spend on this? A linear function over small enough intervals will approximate the exponential function to any degree of accuracy needed, but it may require a VERY small interval.
Edit:
Given the additional information provided, I ran a quick test. Range reduction can always be used on the exponential function. Thus, if I wish to compute exp(x) for ANY x, then I can rewrite the problem in the form
exp(x) = exp(xi + xf) = exp(xi) * exp(xf)
where xi is the integer part of x, and xf is the fractional part. The integer part is simple. Compute xi in binary form, then repeated squarings and multiplications allow you to compute exp(xi) in relatively few operations. (Other tricks, using powers of 2 and other intervals, can give you yet more speed for the speed hungry.)
All that remains is now to compute exp(xf). Can we use a spline with linear segments to compute exp(xf), over the interval [0,1] with only 4 linear segments, to an accuracy of 0.005?
This last question is resolved by a function that I wrote a few years ago, which approximates a function with a spline of a given order to within a fixed tolerance on the maximum error. This code required 8 segments over the interval [0,1] to achieve the required tolerance with a piecewise linear spline function. If I chose to reduce the interval further to [0,0.5], I could achieve the prescribed tolerance with 4 segments.
So the answer is simple. If you are willing to do the range reductions to reduce x to the interval [0,0.5], then do the appropriate computations, then yes, you can achieve the requested accuracy with a linear spline in 4 segments.
In the end, you will always be better off using a hard coded exponential function though. All of the operations mentioned above will surely be slower than what your compiler will provide, IF exp(x) is available.
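The 4-segment claim on [0, 0.5] can be checked with a short sketch. This is not the spline-fitting code mentioned above; it simply connects the function values at equally spaced breakpoints with straight lines (names such as exp_pwl and COEFFS are invented here), which already meets the 0.005 tolerance. Equal-width segments mean the segment index is just the top bits of the argument, and each evaluation is one multiply and one add.

```python
import math

SEGMENTS = 4
WIDTH = 0.5 / SEGMENTS                  # each segment covers 0.125

# Precompute (a, b) per segment so that y = a*x + b passes through the
# exact function values at both ends of the segment.
COEFFS = []
for i in range(SEGMENTS):
    x0, x1 = i * WIDTH, (i + 1) * WIDTH
    a = (math.exp(x1) - math.exp(x0)) / WIDTH
    b = math.exp(x0) - a * x0
    COEFFS.append((a, b))

def exp_pwl(xf):
    """Piecewise-linear exp(xf) for xf in [0, 0.5]."""
    i = min(int(xf / WIDTH), SEGMENTS - 1)   # segment select
    a, b = COEFFS[i]
    return a * xf + b                        # one multiply, one add

if __name__ == "__main__":
    worst = max(abs(exp_pwl(i / 1000) - math.exp(i / 1000)) for i in range(501))
    print("max abs error on [0, 0.5]:", worst)
```

A fitted (minimax-style) spline like the one the answer describes would spread the error on both sides of the curve and do somewhat better than this simple interpolation.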
This is not appropriate for a custom FPGA, but worth mentioning.
http://www.machinedlearnings.com/2011/06/fast-approximate-logarithm-exponential.html
And the source code:
https://code.google.com/archive/p/fastapprox/downloads
The "faster" implementation involves only 3 steps (multiply, add, convert float to int) and a final cast back to float. In my experience, it is accurate to within about 2%, which may be enough if you don't care about the actual value but are using the value in a log-likelihood maximization iteration.
Wolfram presents a few good ways of approximating it in terms of series, etc.
Wikipedia's page on Taylor series also shows an example of an expansion of e^x around 0:
e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + ...
Or you could just do pow(M_E, x) in C. (Some platforms don't have M_E defined; on those, you may have to manually specify the value of e, which is approximately 2.71828182845904523536028747135266249775724709369995.)
(As David points out in the comments, exp(x) would be more efficient than pow(M_E, x). Again, brain not turned on yet.)
Do you have a use case where the calculation of e^x is a proven bottleneck? If not, you should be coding for readability first; only try these sorts of optimizations if the obvious approach is too slow.