SSE2: double-precision log function

Posted on 2024-10-08 01:35:25

I need an open-source implementation (with no license restrictions) of the log function, something with the signature

__m128d _mm_log_pd(__m128d);

It is available in the Intel Short Vector Math Library (part of ICC), but ICC is neither free nor open source. I am looking for an implementation that uses intrinsics only.

It should use special rational function approximations. I need something almost as accurate as the cmath log, say 9-10 decimal digits, but faster.


Comments (5)

暮色兮凉城 2024-10-15 01:35:25

I believe log2 is easier to compute. You can multiply or divide your number by a power of two (very quick) so that it lies in (0.5, 2], and then use a Padé approximant (take M close to N), which is easy to derive once and for all and whose order you can choose according to your needs. You only need arithmetic operations, which you can do with SSE intrinsics. Don't forget to add or subtract a constant according to the scaling factor above.

If you want the natural log, divide by log2(e), which you can compute once and for all.

It is not rare to see custom log functions in specific projects. Standard library functions address the general case, but you need something more specific. I sincerely think it is not that hard to do it yourself.
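The steps described above can be sketched roughly as follows. This is a minimal illustration, not a tuned implementation: the names log2_pd_sketch and log_pd_sketch are hypothetical, the mantissa is folded into (sqrt(2)/2, sqrt(2)] rather than (0.5, 2] to keep the approximant's argument small, and the [3/3] Padé approximant of ln(1 + t) is just one possible choice of order. By a rough estimate it gives only about six correct digits, so a higher order would be needed for the 9-10 digits asked for. Inputs are assumed to be positive, normal doubles; zero, negative values, subnormals, infinities and NaN are not handled.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: log2 of two positive, normal doubles, SSE2 only. */
static __m128d log2_pd_sketch(__m128d x)
{
    const __m128d one   = _mm_set1_pd(1.0);
    const __m128d half  = _mm_set1_pd(0.5);
    const __m128d sqrt2 = _mm_set1_pd(1.4142135623730951);
    const __m128d log2e = _mm_set1_pd(1.4426950408889634);

    /* Split x = m * 2^e with m in [1, 2) by reading the IEEE 754 fields. */
    __m128i bits  = _mm_castpd_si128(x);
    __m128i ebits = _mm_srli_epi64(bits, 52);                   /* biased exponent */
    ebits = _mm_shuffle_epi32(ebits, _MM_SHUFFLE(3, 3, 2, 0));  /* pack into low 2 ints */
    ebits = _mm_sub_epi32(ebits, _mm_set1_epi32(1023));         /* remove the bias */
    __m128d e = _mm_cvtepi32_pd(ebits);

    const __m128i mant_mask = _mm_set1_epi64x(0x000FFFFFFFFFFFFFLL);
    __m128d m = _mm_castsi128_pd(
        _mm_or_si128(_mm_and_si128(bits, mant_mask), _mm_castpd_si128(one)));

    /* Fold m into (sqrt(2)/2, sqrt(2)]: where m > sqrt(2), halve m and bump e. */
    __m128d big = _mm_cmpgt_pd(m, sqrt2);
    m = _mm_mul_pd(m, _mm_or_pd(_mm_and_pd(big, half), _mm_andnot_pd(big, one)));
    e = _mm_add_pd(e, _mm_and_pd(big, one));

    /* [3/3] Pade approximant of ln(1 + t):
       t*(60 + 60t + 11t^2) / (60 + 90t + 36t^2 + 3t^3). */
    __m128d t   = _mm_sub_pd(m, one);
    __m128d num = _mm_add_pd(_mm_mul_pd(_mm_set1_pd(11.0), t), _mm_set1_pd(60.0));
    num = _mm_add_pd(_mm_mul_pd(num, t), _mm_set1_pd(60.0));
    num = _mm_mul_pd(num, t);
    __m128d den = _mm_add_pd(_mm_mul_pd(_mm_set1_pd(3.0), t), _mm_set1_pd(36.0));
    den = _mm_add_pd(_mm_mul_pd(den, t), _mm_set1_pd(90.0));
    den = _mm_add_pd(_mm_mul_pd(den, t), _mm_set1_pd(60.0));
    __m128d ln_m = _mm_div_pd(num, den);

    /* log2(x) = e + ln(m) * log2(e). */
    return _mm_add_pd(e, _mm_mul_pd(ln_m, log2e));
}

/* Natural log: multiplying by ln(2) is the same as dividing by log2(e). */
static __m128d log_pd_sketch(__m128d x)
{
    return _mm_mul_pd(log2_pd_sketch(x), _mm_set1_pd(0.6931471805599453));
}
```

A caller would load two doubles with _mm_loadu_pd, pass them to log_pd_sketch, and store the result with _mm_storeu_pd. Only the numerator and denominator coefficients change when a different Padé order is chosen; the exponent handling stays the same.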

捎一片雪花 2024-10-15 01:35:25

Take a look at AMD LibM. It isn't open source, but it is free. AFAIK, it works on Intel CPUs. On the same web page you will find a link to ACML, another free math library from AMD. It has everything from AMD LibM plus matrix algorithms, FFTs and distributions.

I don't know of any open-source implementation of double-precision vectorized math functions. I guess the Intel and AMD libraries are hand-optimised by the CPU manufacturers, and everyone uses them when speed is important. IIRC, there was an attempt to implement intrinsics for vectorized math functions in GCC. I don't know how far they managed to get. Obviously, it isn't a trivial task.

慕烟庭风 2024-10-15 01:35:25

The Framewave project is Apache 2.0 licensed and aims to be the open-source equivalent of Intel IPP. It has implementations that are close to what you are looking for.
Check the fixed-accuracy arithmetic functions in the documentation.

决绝 2024-10-15 01:35:25

Here's the counterpart for __m256d: https://stackoverflow.com/a/45898937/1915854 . It should be pretty trivial to cut it down to __m128d. Let me know if you encounter any problems with this.

Or you can view my implementation as something that produces two __m128d results at once.

流星番茄 2024-10-15 01:35:25

If you cannot find an existing open-source implementation, it is relatively easy to create your own using the standard method of a Taylor series. See Wikipedia for this and a variety of other methods.
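As a rough illustration of the series approach, the sketch below evaluates ln(m) for two doubles with SSE2 intrinsics. It rests on several assumptions that are not in the answer: the name log_series_pd_sketch is hypothetical, the input is assumed to be already reduced to roughly [0.7071, 1.4143] (exponent split off separately), and instead of the plain series of ln(1 + t) it uses the faster-converging series ln(m) = 2(s + s^3/3 + s^5/5 + ...) with s = (m - 1)/(m + 1), truncated after the s^11 term. By a back-of-the-envelope estimate the truncation error on that interval should be on the order of 1e-11.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical kernel: ln(m) for two doubles m close to 1,
   assumed to lie in roughly [0.7071, 1.4143]. */
static __m128d log_series_pd_sketch(__m128d m)
{
    const __m128d one = _mm_set1_pd(1.0);

    /* s = (m - 1) / (m + 1), so |s| stays below about 0.172. */
    __m128d s  = _mm_div_pd(_mm_sub_pd(m, one), _mm_add_pd(m, one));
    __m128d s2 = _mm_mul_pd(s, s);

    /* Horner evaluation of 1 + s2/3 + s2^2/5 + s2^3/7 + s2^4/9 + s2^5/11. */
    __m128d p = _mm_set1_pd(1.0 / 11.0);
    p = _mm_add_pd(_mm_mul_pd(p, s2), _mm_set1_pd(1.0 / 9.0));
    p = _mm_add_pd(_mm_mul_pd(p, s2), _mm_set1_pd(1.0 / 7.0));
    p = _mm_add_pd(_mm_mul_pd(p, s2), _mm_set1_pd(1.0 / 5.0));
    p = _mm_add_pd(_mm_mul_pd(p, s2), _mm_set1_pd(1.0 / 3.0));
    p = _mm_add_pd(_mm_mul_pd(p, s2), one);

    /* ln(m) = 2 * s * p */
    return _mm_mul_pd(_mm_add_pd(s, s), p);
}
```

For a general positive x, the result would be combined with the exponent as e * ln(2) + ln(m). Adding or dropping terms in the Horner chain trades accuracy for speed.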
