Are all Linux kernel functions aligned to 0x10, and why?

Posted on 2025-02-10 06:13:56

I tried to solve the problem that "kallsyms_lookup_name is not exported anymore in kernels > 5.7", and found a solution at: https://github.com/xcellerator/linux_kernel_hacking/issues/3.

It says that "the kernel functions are all aligned so that the final nibble is 0x0". Why is that?

Comments (2)

农村范ル 2025-02-17 06:13:56

In general: no, kernel functions are not all, and not always, aligned to 16 bytes. Saying "the kernel functions are all aligned so that the final nibble is 0x0" is wrong. However, in the most common case, which is Linux x86-64 compiled with GCC with default kernel compiler flags, this happens to be true. Take another case, such as the default config for ARM64, and you'll see that this does not hold.
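One way to check the claim on a given kernel is to count how many text symbols in a kallsyms-style listing sit on a 16-byte boundary. The following is only a rough sketch: the sample lines, symbol names and addresses are hypothetical, and it assumes the usual "address type name" layout of /proc/kallsyms. Not every text symbol is a compiler-generated function (some are assembly labels), so a result below 100% does not by itself say anything about the compiler's alignment.

```python
def aligned_fraction(kallsyms_text: str, alignment: int = 16) -> float:
    """Fraction of text symbols (type 'T' or 't') whose address is a multiple of alignment."""
    total = 0
    aligned = 0
    for line in kallsyms_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        addr_hex, sym_type = fields[0], fields[1]
        if sym_type not in ("T", "t"):
            continue  # only symbols in the text (code) section
        total += 1
        if int(addr_hex, 16) % alignment == 0:
            aligned += 1
    return aligned / total if total else 0.0


# Hypothetical sample in kallsyms format; names and addresses are made up.
SAMPLE = """\
ffffffff81000000 T _text
ffffffff81000040 T hypothetical_func_a
ffffffff81000075 t hypothetical_asm_label
ffffffff81000080 t hypothetical_func_b
ffffffff82000000 D hypothetical_data
"""

print(aligned_fraction(SAMPLE))  # 3 of the 4 text symbols are aligned: 0.75
```

To try it on a real system, pass in the contents of /proc/kallsyms read as root. On many systems an unprivileged read shows all addresses as zero, which would make every symbol look aligned.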

The kernel itself does not specify any alignment for functions, but the compiler optimizations that it enables can (and will) align functions.

Alignment of functions is in fact a compiler optimization that on GCC is enabled using -falign-functions=. According to the GCC doc, compiling with at least -O2 (selected by CC_OPTIMIZE_FOR_PERFORMANCE=y, which is the default) will enable this optimization without an explicit value set. This means that the actual alignment value is chosen by GCC based on the architecture. On x86, the default is 16 bytes for the "generic" machine type (-march=x86-64, see doc).

Clang also supports -falign-functions= since version 6.0.1 according to their repository (it was previously ignored), though I am not sure if it is enabled or not at different optimization levels.


Why is this an optimization? Well, alignment can offer performance advantages. In theory, cache line alignment would be "optimal" for cache performance, but there are other factors to consider: aligning to 64 bytes (cache line size on x86) would probably waste a lot of space for no good reason without improving performance that much. See How much does function alignment actually matter on modern processors?
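To get a feel for the space trade-off, here is a small sketch that lays out a handful of functions back to back and adds up the padding needed to start each one on an aligned boundary. The function sizes are invented for illustration and this says nothing about real kernel code; it only shows how the padding arithmetic behaves for 16-byte versus 64-byte alignment.

```python
def padding_bytes(sizes, alignment):
    """Total padding when each function start is rounded up to a multiple of alignment."""
    offset = 0   # next free byte in the section
    padding = 0  # bytes inserted purely for alignment
    for size in sizes:
        aligned_start = (offset + alignment - 1) // alignment * alignment
        padding += aligned_start - offset
        offset = aligned_start + size
    return padding


# Hypothetical function sizes in bytes.
SIZES = [5, 23, 64, 130, 17, 9]

for alignment in (1, 16, 64):
    print(f"align={alignment:2d}: {padding_bytes(SIZES, alignment)} padding bytes")
```

With these made-up sizes, 64-byte alignment needs several times the padding of 16-byte alignment for the same 248 bytes of code, which is the kind of waste the paragraph above refers to.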

锦欢 2025-02-17 06:13:56

Most importantly, this is not true in general. Some architectures will have other alignment criteria. Why is often a really difficult question to answer. It is completely possible for Linux to not do this. It could even change between kernel versions for the same architecture.


A low nibble of zero means 16-byte alignment. It is enforced by the linker and the compiler (via CPU and/or ABI restrictions/efficiencies). Generally, you want a function to start on a cache line. This is so functions do not overlap with each other in the cache. It is also easier to fill, i.e., the expectation is that you will only have one partial line at the end of the function. It is possible that branches, case labels and other constructs could also be aligned, depending on the CPU.
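The "low nibble is zero" test is just a mask on the bottom four bits of the address. A minimal sketch, with addresses that are made up for illustration:

```python
def is_aligned(addr: int, alignment: int = 16) -> bool:
    """Return True if addr is a multiple of alignment, which must be a power of two."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # For a power of two, alignment - 1 is a mask of the low bits (0xF for 16).
    return (addr & (alignment - 1)) == 0


# Hypothetical addresses, chosen only for illustration.
print(is_aligned(0xffffffff81000010))  # low nibble 0x0
print(is_aligned(0xffffffff81000018))  # low nibble 0x8
```

The same helper works for a 64-byte cache line by passing alignment=64, in which case the low six bits must be zero.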

Even without a cache, primary SDRAM fills in batches. 16 bytes seems like a reasonable amount to balance the overhead of alignment (wasted bytes) against efficiency: fewer SDRAM cycles. Of course, the SDRAM burst and the cache line are of the same order, as they work together to get code to the CPU's decode units.

There can be other reasons for alignment, such as hardware and internal tables that only use a subset of the address bits. This alignment can be for external functions only. Some instructions will operate faster (or only be possible) on aligned data. So, some kernel function instrumentation may also benefit from the alignment (either through more compact tables or adding veneers, etc.).

See: x86_64 stack alignment - where many of the rationales for the stack alignment can apply to code as the kernel can on occasion treat code as data.
