Generating optimised NDK code for multiple architectures?

Posted on 2024-10-18 17:56:03

I have some C code for Android that does lots of low-level number crunching. I'd like to know what settings I should use (e.g. in my Android.mk and Application.mk files) so that the code produced will run on all current Android devices but also take advantage of optimisations for specific chipsets. I'm looking for good default Android.mk and Application.mk settings, and I want to avoid having to litter my C code with #ifdef branches.

For example, I'm aware that ARMv7 has floating point instructions, that some ARMv7 chips support NEON instructions, and that the default ARM target supports neither. Is it possible to set flags so that I can build ARMv7 with NEON, ARMv7 without NEON and the default ARM build? I know how to do the latter two but not all 3. I'm cautious about which settings I use, as I assume the current defaults are the safest, and I'm not sure what risks the other options carry.

For GCC specific optimisation, I'm using the following flags:

LOCAL_CFLAGS=-ffast-math -O3 -funroll-loops

I've checked that all 3 of these speed up my code. Are there any other common ones I could add?

Another tip I have is to add "LOCAL_ARM_MODE := arm" to Android.mk to speed things up on newer ARM chips (although I'm confused about exactly what this does and what happens on older chips).

Comments (2)

瀟灑尐姊 2024-10-25 17:56:03

ARM processors support two general instruction sets: "ARM" and "Thumb". Though both come in different flavors, ARM instructions are 32 bits each and Thumb instructions are 16 bits. The main difference between the two is that a single ARM instruction can do more than a Thumb instruction can. For example, a single ARM instruction can add one register to another while performing a left shift on the second register. In Thumb, one instruction would have to do the shift, then a second instruction would do the addition.
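A small illustration of the "add with shift" case, as a sketch rather than a guaranteed code-generation result: the C function below (a hypothetical helper, `add_shifted`) computes a + (b << 2). When compiled to ARM state, GCC can typically emit a single `add r0, r0, r1, lsl #2`; when compiled to 16-bit Thumb (e.g. targeting ARMv5TE with `-mthumb`) it needs a separate shift and add. The exact output depends on compiler version and flags.

```c
#include <stdint.h>

/* Returns a + (b << 2), i.e. a + 4*b with 32-bit wraparound.
   In ARM state this can typically be one instruction with a shifted
   operand; 16-bit Thumb needs a shift followed by an add. */
static uint32_t add_shifted(uint32_t a, uint32_t b) {
    return a + (b << 2);
}
```

To compare for yourself, compile the file with an ARM cross-compiler using `-O2 -S -marm` and again with `-O2 -S -mthumb`, then diff the generated assembly.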

ARM instructions are not twice as good, but in certain cases they can be faster. This is especially true in hand-written ARM assembly, which can be tuned in novel ways to make the best use of "shifts for free". Besides their smaller size, Thumb instructions have another advantage: they drain the battery less.

Anyway, this is what LOCAL_ARM_MODE does - it means you compile your code as ARM instructions instead of Thumb instructions. Compiling to Thumb is the default in the NDK as it tends to create a smaller binary and the speed difference is not that noticeable for most code. The compiler can't always take advantage of the extra "oomph" that ARM can provide, so you end up needing more or less the same number of instructions anyway.
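If you are unsure whether LOCAL_ARM_MODE actually took effect, a quick check is to ask the compiler. This sketch (the helper name `instruction_set_name` is hypothetical) uses the GCC/Clang predefined macros `__arm__`, `__thumb__` and `__thumb2__`. With `LOCAL_ARM_MODE := arm` you would expect "ARM"; with the default Thumb mode you would expect "Thumb" for armeabi and "Thumb-2" for armeabi-v7a.

```c
/* Reports which instruction set this translation unit was compiled for,
   based on compiler-predefined macros. __thumb__ is also defined for
   Thumb-2 builds, so __thumb2__ is checked first. */
static const char *instruction_set_name(void) {
#if defined(__thumb2__)
    return "Thumb-2";
#elif defined(__thumb__)
    return "Thumb";
#elif defined(__arm__)
    return "ARM";
#else
    return "not 32-bit ARM";
#endif
}
```

Because the macros are evaluated per source file, the answer applies to the module that contains this function, which is the same granularity at which LOCAL_ARM_MODE is set.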

The observable behavior of C/C++ code compiled to ARM or Thumb will be identical (barring compiler bugs).

This by itself is compatible between new and old ARM processors for all Android phones available today. This is because by default the NDK compiles to an "Application Binary Interface" for ARM-based CPUs that support the ARMv5TE instruction set. This ABI is known as "armeabi" and can be explicitly set in the Application.mk by putting APP_ABI := armeabi.

Newer processors also support the Android-specific ABI known as armeabi-v7a, which extends armeabi to add the Thumb-2 instruction set and a hardware floating point instruction set called VFPv3-D16. armeabi-v7a compatible CPUs can also optionally support the NEON instruction set, which you have to check for at run time and provide code paths for when it is available and when it is not. There's an example in the NDK/samples directory that does this (hello-neon). Under the hood, Thumb-2 is more "ARM-like" in that its instructions can do more in a single instruction, while having the advantage of still taking up less space.
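One way to provide both code paths without scattering #ifdefs through the program is to pick an implementation once, through a function pointer. The sketch below assumes a CPU feature bitmask is already available; `SKETCH_FEATURE_NEON`, `scale_plain`, `scale_neon` and `select_scale` are hypothetical names. In a real NDK build the NEON version would live in its own source file compiled with NEON enabled (for example via `LOCAL_ARM_NEON := true`, or a `.neon` suffix on that file in `LOCAL_SRC_FILES`), and the bit to test would be `ANDROID_CPU_ARM_FEATURE_NEON`. Here the NEON path just forwards to plain C so the sketch compiles on any host.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bit for this sketch only. On a device, test
   ANDROID_CPU_ARM_FEATURE_NEON from <cpu-features.h> instead. */
#define SKETCH_FEATURE_NEON ((uint64_t)1 << 2)

/* Signature shared by every implementation of the hot loop. */
typedef void (*scale_fn)(float *dst, const float *src, float k, size_t n);

/* Portable path: plain C that runs on armeabi and on armeabi-v7a without NEON. */
static void scale_plain(float *dst, const float *src, float k, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}

/* Hypothetical NEON path. In a real build this would be in a separate file
   compiled with NEON enabled and use <arm_neon.h> intrinsics; here it
   forwards to the plain loop so the sketch compiles anywhere. */
static void scale_neon(float *dst, const float *src, float k, size_t n) {
    scale_plain(dst, src, k, n);
}

/* Choose an implementation once, from the CPU feature bitmask. */
static scale_fn select_scale(uint64_t cpu_features) {
    return (cpu_features & SKETCH_FEATURE_NEON) ? scale_neon : scale_plain;
}
```

The idea is to call `select_scale` once at startup, keep the returned pointer, and have the rest of the code call through it; only the file holding the NEON implementation needs NEON-specific build settings.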

In order to compile a "fat binary" that contains both armeabi and armeabi-v7a libraries you would add the following to Application.mk:

APP_ABI := armeabi armeabi-v7a

When the .apk file is installed, the Android package manager installs the best library for the device. So on older platforms it would install the armeabi library, and on newer devices the armeabi-v7a one.

If you want to test for CPU features at run time, you can use the NDK function uint64_t android_getCpuFeatures() (declared in cpu-features.h) to get the features supported by the processor. The returned bitmask includes ANDROID_CPU_ARM_FEATURE_ARMv7 on v7a processors, ANDROID_CPU_ARM_FEATURE_VFPv3 if hardware floating point is supported, and ANDROID_CPU_ARM_FEATURE_NEON if the advanced SIMD (NEON) instructions are supported. An ARM CPU can't have NEON without VFPv3.
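A minimal sketch of that run-time check, assuming the module links the NDK's cpufeatures static library (in Android.mk, `LOCAL_STATIC_LIBRARIES := cpufeatures` together with `$(call import-module,android/cpufeatures)`). The helper name `cpu_has_neon` is hypothetical, and the `#if` guard with its fallback is an addition for this sketch so the file also compiles off-device, where it simply reports no NEON.

```c
#include <stdint.h>

#if defined(__ANDROID__) && defined(__arm__)
#include <cpu-features.h>  /* NDK cpufeatures library */
#endif

/* Hypothetical helper: returns 1 if NEON can be used, 0 otherwise. */
static int cpu_has_neon(void) {
#if defined(__ANDROID__) && defined(__arm__)
    if (android_getCpuFamily() != ANDROID_CPU_FAMILY_ARM)
        return 0;
    uint64_t features = android_getCpuFeatures();
    /* NEON implies ARMv7 and VFPv3, but checking all three bits is cheap. */
    const uint64_t needed = ANDROID_CPU_ARM_FEATURE_ARMv7 |
                            ANDROID_CPU_ARM_FEATURE_VFPv3 |
                            ANDROID_CPU_ARM_FEATURE_NEON;
    return (features & needed) == needed;
#else
    /* Not a 32-bit ARM Android build: assume no NEON. */
    return 0;
#endif
}
```

Calling this once during initialisation and storing the result is usually enough, since the CPU's features do not change while the app runs.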

In summary: by default, your programs are the most compatible. Using LOCAL_ARM_MODE may make things slightly faster at the expense of battery life due to the use of ARM instructions - and it is as compatible as the default set-up. By adding the APP_ABI := armeabi armeabi-v7a line you will have improved performance on newer devices, remain compatible with older ones, but your .apk file will be larger (due to having 2 libraries). In order to use NEON instructions, you will need to write special code that detects the capabilities of the CPU at run time, and this only applies to newer devices that can run armeabi-v7a.

以往的大感动 2024-10-25 17:56:03

Great answer. I'd just like to add that you should use

APP_ABI := all

This will compile 4 binaries: armv5, armv7, x86 and mips.

You may need a newer version of the NDK.
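With several ABIs in one .apk, the same C source is compiled once per ABI, and it can be handy to confirm which build the package manager actually installed on a device. This sketch (the helper name `build_abi_name` is hypothetical) maps GCC/Clang predefined macros to the four ABIs mentioned above. The mapping is an approximation: `__ARM_ARCH_7A__` is defined when targeting ARMv7-A, and any other 32-bit ARM build is assumed to be armeabi.

```c
/* Reports which ABI this code was compiled for, using predefined macros.
   Covers the four ABIs listed above; anything else reports "other". */
static const char *build_abi_name(void) {
#if defined(__ARM_ARCH_7A__)
    return "armeabi-v7a";
#elif defined(__arm__)
    return "armeabi";
#elif defined(__i386__)
    return "x86";
#elif defined(__mips__)
    return "mips";
#else
    return "other";
#endif
}
```

Logging this string once at startup on a test device shows whether the armeabi or armeabi-v7a library was selected.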
