What is an application binary interface (ABI)?

Posted on 2024-08-19 17:52:41

I never clearly understood what an ABI is. Please don't point me to a Wikipedia article. If I could understand it, I wouldn't be here posting such a lengthy post.

This is my mindset about different interfaces:

A TV remote is an interface between the user and the TV. It is an existing entity, but useless (doesn't provide any functionality) by itself. All the functionality for each of those buttons on the remote is implemented in the television set.

Interface: It is an "existing entity" layer between the
functionality and consumer of that functionality. An interface by itself
doesn't do anything. It just invokes the functionality lying behind.

Now, depending on who the user is, there are different types of interfaces.

Command Line Interface (CLI) commands are the existing entities,
the consumer is the user and functionality lies behind.

functionality: my software functionality which solves some
purpose to which we are describing this interface.

existing entities: commands

consumer: user

Graphical User Interface (GUI) windows, buttons, etc. are the existing
entities, and again the consumer is the user and functionality lies behind.

functionality: my software functionality which solves some problem for which we are describing this interface.

existing entities: windows, buttons, etc.

consumer: user

Application Programming Interface (API) functions (or, to be
more correct, interfaces in interface-based programming) are the
existing entities, the consumer here is another program, not a user, and again
functionality lies behind this layer.

functionality: my software functionality which solves some
problem for which we are describing this interface.

existing entities: functions, Interfaces (array of functions).

consumer: another program/application.

Application Binary Interface (ABI) Here is where my problem starts.

functionality: ???

existing entities: ???

consumer: ???

  • I've written software in different languages and provided different kinds of interfaces (CLI, GUI, and API), but I'm not sure if I have ever provided any ABI.

Wikipedia says:

ABIs cover details such as

  • data type, size, and alignment;
  • the calling convention, which controls how functions' arguments are
    passed and return values retrieved;
  • the system call numbers and how an application should make system calls
    to the operating system;

Other ABIs standardize details such as

  • the C++ name mangling,
  • exception propagation, and
  • calling convention between compilers on the same platform, but do
    not require cross-platform compatibility.
  • Who needs these details? Please don't say the OS. I know assembly programming. I know how linking & loading works. I know exactly what happens inside.

  • Why did C++ name mangling come in? I thought we were talking at the binary level. Why do languages come in?

Anyway, I've downloaded the [PDF] System V Application Binary Interface Edition 4.1 (1997-03-18) to see what exactly it contains. Well, most of it didn't make any sense.

  • Why does it contain two chapters (4th & 5th) to describe the ELF file format? In fact, these are the only two significant chapters of that specification. The rest of the chapters are "processor specific". Anyway, I thought that it is a completely different topic. Please don't say that ELF file format specifications are the ABI. It doesn't qualify to be an interface according to the definition.

  • I know, since we are talking at such a low level it must be very specific. But I'm not sure how it is "instruction set architecture (ISA)" specific.

  • Where can I find Microsoft Windows' ABI?

So, these are the major queries that are bugging me.

鹿! 2024-08-26 17:52:42

Linux shared library minimal runnable ABI example

In the context of shared libraries, the most important implication of "having a stable ABI" is that you don't need to recompile your programs after the library changes.

So for example:

  • if you are selling a shared library, you save your users the annoyance of recompiling everything that depends on your library for every new release

  • if you are selling a closed source program that depends on a shared library present in the user's distribution, you could release and test fewer prebuilts if you are certain that the ABI is stable across certain versions of the target OS.

    This is especially important in the case of the C standard library, which a great many programs in your system link to.

Now I want to provide a minimal concrete runnable example of this.

main.c

#include <assert.h>
#include <stdlib.h>

#include "mylib.h"

int main(void) {
    mylib_mystruct *myobject = mylib_init(1);
    assert(myobject->old_field == 1);
    free(myobject);
    return EXIT_SUCCESS;
}

mylib.c

#include <stdlib.h>

#include "mylib.h"

mylib_mystruct* mylib_init(int old_field) {
    mylib_mystruct *myobject;
    myobject = malloc(sizeof(mylib_mystruct));
    myobject->old_field = old_field;
    return myobject;
}

mylib.h

#ifndef MYLIB_H
#define MYLIB_H

typedef struct {
    int old_field;
} mylib_mystruct;

mylib_mystruct* mylib_init(int old_field);

#endif

Compiles and runs fine with:

cc='gcc -pedantic-errors -std=c89 -Wall -Wextra'
$cc -fPIC -c -o mylib.o mylib.c
$cc -L . -shared -o libmylib.so mylib.o
$cc -L . -o main.out main.c -lmylib
LD_LIBRARY_PATH=. ./main.out

Now, suppose that for v2 of the library, we want to add a new field to mylib_mystruct called new_field.

If we added the field before old_field as in:

typedef struct {
    int new_field;
    int old_field;
} mylib_mystruct;

and rebuilt the library but not main.out, then the assert fails!

This is because the line:

myobject->old_field == 1

had generated assembly that is trying to access the very first int of the struct, which is now new_field instead of the expected old_field.

Therefore this change broke the ABI.

If, however, we add new_field after old_field:

typedef struct {
    int old_field;
    int new_field;
} mylib_mystruct;

then the old generated assembly still accesses the first int of the struct, and the program still works, because we kept the ABI stable.
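The offset argument above can be checked without building a shared library at all. The following is a minimal sketch, not part of the original example: the three struct names are hypothetical snapshots of mylib_mystruct, and read_int_at stands in for the fixed-offset load that the compiler baked into the old main.out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical snapshots of mylib_mystruct, named only for this sketch. */
typedef struct { int old_field; } mystruct_v1;
typedef struct { int new_field; int old_field; } mystruct_v2_before;
typedef struct { int old_field; int new_field; } mystruct_v2_after;

/* Read an int at a fixed byte offset, as already-compiled code does. */
static int read_int_at(const void *object, size_t offset) {
    int value;
    memcpy(&value, (const char *)object + offset, sizeof value);
    return value;
}

/* What a main.out built against v1 reads when new_field comes first. */
int v1_client_reads_before(int old_field, int new_field) {
    mystruct_v2_before object;
    object.new_field = new_field;
    object.old_field = old_field;
    /* The v1 client hardcoded offset 0 for old_field. */
    assert(offsetof(mystruct_v1, old_field) == 0);
    return read_int_at(&object, offsetof(mystruct_v1, old_field));
}

/* What the same main.out reads when new_field is appended instead. */
int v1_client_reads_after(int old_field, int new_field) {
    mystruct_v2_after object;
    object.old_field = old_field;
    object.new_field = new_field;
    return read_int_at(&object, offsetof(mystruct_v1, old_field));
}
```

Calling v1_client_reads_before(1, 7) hands back the value stored in new_field, while v1_client_reads_after(1, 7) hands back the value stored in old_field, which mirrors the failing and passing assert in the two library versions.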

Here is a fully automated version of this example on GitHub.

Another way to keep this ABI stable would have been to treat mylib_mystruct as an opaque struct, and only access its fields through method helpers. This makes it easier to keep the ABI stable, but would incur a performance overhead as we'd do more function calls.
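A rough sketch of the opaque struct approach is shown here in a single file for brevity. The accessor mylib_get_old_field and the destructor mylib_free are hypothetical names introduced for illustration; they are not part of the original mylib example.

```c
#include <assert.h>
#include <stdlib.h>

/* --- What a hypothetical mylib.h would expose: no field layout at all. --- */
typedef struct mylib_mystruct mylib_mystruct; /* incomplete (opaque) type */
mylib_mystruct *mylib_init(int old_field);
int mylib_get_old_field(const mylib_mystruct *myobject);
void mylib_free(mylib_mystruct *myobject);

/* --- What would live only inside mylib.c. --- */
struct mylib_mystruct {
    int new_field; /* fields can be added or reordered freely here */
    int old_field;
};

mylib_mystruct *mylib_init(int old_field) {
    mylib_mystruct *myobject = malloc(sizeof *myobject);
    if (myobject == NULL) {
        return NULL;
    }
    myobject->new_field = 0;
    myobject->old_field = old_field;
    return myobject;
}

int mylib_get_old_field(const mylib_mystruct *myobject) {
    assert(myobject != NULL);
    return myobject->old_field;
}

void mylib_free(mylib_mystruct *myobject) {
    free(myobject);
}
```

Because client code only ever holds a pointer and calls functions, no field offset or struct size ends up compiled into main.out, so the layout can change between library versions. The cost is the extra function call per access mentioned above.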

API vs ABI

In the previous example, it is interesting to note that adding new_field before old_field only broke the ABI, not the API.

What this means is that if we had recompiled our main.c program against the new library, it would have worked regardless.

We would, however, also have broken the API if we had changed, for example, the function signature:

mylib_mystruct* mylib_init(int old_field, int new_field);

since in that case, main.c would stop compiling altogether.

Semantic API vs Programming API

We can also classify API changes into a third type: semantic changes.

The semantic API is usually a natural language description of what the API is supposed to do, typically included in the API documentation.

It is therefore possible to break the semantic API without breaking the program build itself.

For example, if we had modified

myobject->old_field = old_field;

to:

myobject->old_field = old_field + 1;

then this would have broken neither the programming API nor the ABI, but it would break the semantic API that main.c relies on: main.c still builds and links, yet its assert now fails.

There are two ways to programmatically check the contract API:

List of everything that breaks C / C++ shared library ABIs

TODO: find / create the ultimate list:

Java minimal runnable example

What is binary compatibility in Java?

Tested in Ubuntu 18.10, GCC 8.2.0.

情深已缘浅 2024-08-26 17:52:42

An application binary interface (ABI) is similar to an API, but the function is not accessible to the caller at source code level. Only a binary representation is accessible/available.

ABIs may be defined at the processor-architecture level or at the OS level.
The ABIs are standards to be followed by the code-generator phase of the compiler. The standard is fixed either by the OS or by the processor.

Functionality: Define the mechanism/standard to make function calls independent of the implementation language or a specific compiler/linker/toolchain. Provide the mechanism which allows JNI, or a Python-C interface, etc.

Existing entities: Functions in machine code form.

Consumer: Another function (including one in another language, compiled by another compiler, or linked by another linker).

薄情伤 2024-08-26 17:52:42

Let me at least answer a part of your question, with an example of how the Linux ABI affects system calls, and why that is useful.

A system call is a way for a userspace program to ask the kernel for something. It works by putting the numeric code for the call and its arguments in certain registers and triggering an interrupt (or, on newer CPUs, executing a dedicated instruction such as syscall on x86-64). Then a switch to kernel space occurs and the kernel looks up the numeric code and the arguments, handles the request, puts the result back into a register and triggers a switch back to userspace. This is needed, for example, when the application wants to allocate memory or open a file (syscalls "brk" and "open").

Now the syscalls have short names ("brk", etc.) and corresponding numeric codes (syscall numbers), which are defined in a system-specific header file. As long as these numbers stay the same, you can run the same compiled userland programs with different updated kernels without having to recompile. So you have an interface used by precompiled binaries, hence ABI.
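To make the numeric-code idea concrete, here is a small sketch that assumes Linux with glibc. It uses the generic syscall() wrapper and the SYS_getpid constant from <sys/syscall.h>; the helper name raw_getpid is made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Ask the kernel for the process ID by syscall number, bypassing getpid(). */
long raw_getpid(void) {
    /* SYS_getpid expands to the architecture-specific syscall number. */
    long pid = syscall(SYS_getpid);
    assert(pid > 0); /* getpid cannot fail, so the result is a valid PID */
    return pid;
}
```

The value of SYS_getpid differs between architectures, but for a given architecture the kernel keeps it fixed, which is why a binary compiled years ago still gets the right answer from a newer kernel.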

莳間冲淡了誓言ζ 2024-08-26 17:52:42

Functionality: A set of contracts which affect the compiler, assembly writers, the linker, and the operating system. The contracts specify how functions are laid out, where parameters are passed, how parameters are passed, how function returns work. These are generally specific to a (processor architecture, operating system) tuple.

Existing entities: parameter layout, function semantics, register allocation. For instance, the ARM architecture has numerous ABIs (APCS, EABI, GNU-EABI, never mind a bunch of historical cases); mixing ABIs will result in your code simply not working when calling across boundaries.

Consumer: The compiler, assembly writers, operating system, CPU specific architecture.

Who needs these details? The compiler, assembly writers, linkers which do code generation (or alignment requirements), operating system (interrupt handling, syscall interface). If you did assembly programming, you were conforming to an ABI!

C++ name mangling is a special case: it's a linker and dynamic linker centered issue. If name mangling is not standardized, then dynamic linking will not work. Hence the C++ ABI is called just that, the C++ ABI. It is not a linker-level issue, but instead a code generation issue. Once you have a C++ binary, it is not possible to make it compatible with another C++ ABI (name mangling, exception handling) without recompiling from source.

ELF is a file format for the use of a loader and dynamic linker. ELF is a container format for binary code and data, and as such specifies the ABI of a piece of code. I would not consider ELF to be an ABI in the strict sense, as PE executables are not an ABI.

All ABIs are instruction set specific. An ARM ABI will not make sense on an MSP430 or x86_64 processor.

Windows has several ABIs. For instance, fastcall and stdcall are two commonly used ones (strictly speaking, calling conventions). The syscall ABI is different again.

随梦而飞# 2024-08-26 17:52:42

Summary

There are various interpretations of, and strong opinions about, the exact layer that defines an ABI (application binary interface).

In my view an ABI is a subjective convention of what is considered a given/platform for a specific API. The ABI is the "rest" of the conventions that "will not change" for a specific API or that will be addressed by the runtime environment: executors, tools, linkers, compilers, JVM, and OS.

Defining an Interface: ABI, API

If you want to use a library like joda-time you must declare a dependency on joda-time-<major>.<minor>.<patch>.jar. The library follows best practices and uses Semantic Versioning. This defines the API compatibility at three levels:

  1. Patch - You don't need to change your code at all. The library just fixes some bugs.
  2. Minor - You don't need to change your code since things were only added (the open/closed principle was respected).
  3. Major - The interface (API) is changed and you might need to change your code.

In order for you to use a new major release of the same library a lot of other conventions are still to be respected:

  • The binary language used for the libraries (in Java's case, the JVM target version that defines the Java bytecode)
  • Calling conventions
  • JVM conventions
  • Linking conventions
  • Runtime conventions
    All these are defined and managed by the tools we use.

Examples

Java case study

For example, Java standardized all these conventions, not in a tool, but in a formal JVM specification. The specification allowed other vendors to provide a different set of tools that can output compatible libraries.

Java provides two other interesting case studies for ABI: Scala versions and Dalvik virtual machine.

Dalvik virtual machine broke the ABI

The Dalvik VM needs a different type of bytecode than the Java bytecode. The Dalvik libraries are obtained by converting the Java bytecode (with the same API) for Dalvik. In this way you can get two versions of the same API, both defined by the original joda-time-1.7.2.jar. We could call them joda-time-1.7.2.jar and joda-time-1.7.2-dalvik.jar. They use different ABIs: one is for the stack-oriented standard Java VMs (Oracle's, IBM's, open Java or any other), and the second is the one around Dalvik.

Scala successive releases are incompatible

Scala doesn't have binary compatibility between minor Scala versions (2.X). For this reason the same API "io.reactivex" %% "rxscala" % "0.26.5" has three versions (more in the future): for Scala 2.10, 2.11 and 2.12. What changed? I don't know for now, but the binaries are not compatible. Probably the latest versions add things that make the libraries unusable on the old virtual machines, probably things related to linking/naming/parameter conventions.

Java successive releases are incompatible

Java has problems with the major releases of the JVM too: 4, 5, 6, 7, 8, 9. They offer only backward compatibility. JVM 9 knows how to run code compiled/targeted (javac's -target option) for all earlier versions, while JVM 4 doesn't know how to run code targeted for JVM 5. All this while you have one joda library. This incompatibility flies below the radar thanks to different solutions:

  1. Semantic versioning: when libraries target higher JVM they usually change the major version.
  2. Use JVM 4 as the ABI, and you're safe.
  3. Java 9 adds a specification on how you can include bytecode for specific targeted JVM in the same library.

Why did I start with the API definition?

API and ABI are just conventions on how you define compatibility. The lower layers are generic with respect to a plethora of high level semantics. That's why it's easy to make some conventions. The first kind of conventions are about memory alignment, byte encoding, calling conventions, big and little endian encodings, etc. On top of them you get the executable conventions like others described, linking conventions, and intermediate byte code like the one used by Java or the LLVM IR used by Clang. Third, you get conventions on how to find libraries and how to load them (see Java classloaders). As you go higher and higher in concepts you have new conventions that you consider as a given. That's why they didn't make it into semantic versioning. They are implicit or collapsed in the major version. We could amend semantic versioning with <major>-<minor>-<patch>-<platform/ABI>. This is what is actually happening already: the platform is already an rpm, dll, jar (JVM bytecode), war (JVM + web server), apk, 2.11 (specific Scala version) and so on. When you say APK you already talk about a specific ABI part of your API.

API can be ported to different ABI

The top level of an abstraction (the sources written against the highest API) can be recompiled/ported to any other lower level abstraction.

Let's say I have some sources for rxscala. If the Scala tools are changed I can recompile them to that. If the JVM changes I could have automatic conversions from the old machine to the new one without bothering with the high level concepts. While porting might be difficult, it will help any other client. If a new operating system is created using a totally different assembler code, a translator can be created.

APIs ported across languages

There are APIs that are ported to multiple languages, like reactive streams. In general they define mappings to specific languages/platforms. I would argue that the API is the master specification, formally defined in human language or even in a specific programming language. All the other "mappings" are ABI in a sense, albeit more API than the usual ABI. The same is happening with REST interfaces.

情泪▽动烟 2024-08-26 17:52:42

In order to call code in shared libraries, or call code between compilation units, the object file needs to contain labels for the calls. C++ mangles the names of method labels in order to enforce data hiding and allow for overloaded methods. That is why you cannot mix files from different C++ compilers unless they explicitly support the same ABI.

回梦 2024-08-26 17:52:42

The best way to differentiate between ABI and API is to know why each exists and what it is used for:

For x86-64 there is generally one ABI (and for x86 32-bit there is another set):

http://www.x86-64.org/documentation/abi.pdf

https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/LowLevelABI/140-x86-64_Function_Calling_Conventions/x86_64.html

http://people.freebsd.org/~obrien/amd64-elf-abi.pdf

Linux, FreeBSD and Mac OS X follow it with some slight variations. Windows x64 has its own ABI:

http://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/

Knowing the ABI, and assuming other compilers follow it as well, the binaries theoretically know how to call each other (library APIs in particular) and pass parameters over the stack or in registers, and which registers will be changed by calling a function. Essentially this knowledge helps software integrate with other software. Knowing the order of the registers and the stack layout, I can easily piece together different software written in assembly without much problem.

But APIs are different:

An API is a set of high-level function names, with defined arguments, such that different pieces of software built using the API MAY be able to call into one another. But the additional requirement of using the SAME ABI must be adhered to.

For example, Windows used to be POSIX API compliant:

https://en.wikipedia.org/wiki/Windows_Services_for_UNIX

https://en.wikipedia.org/wiki/POSIX

And Linux is POSIX compliant as well. But the binaries cannot just be moved over and run immediately. Because both use the same NAMES in the POSIX-compliant API, though, you can take the same software in C, recompile it on the other OS, and immediately get it running.

APIs are meant to ease integration of software at the pre-compilation stage. So after compilation the software can look totally different if the ABIs are different.

ABIs are meant to define the exact integration of software at the binary / assembly level.

怼怹恏 2024-08-26 17:52:42

The term ABI is used to refer to two distinct but related concepts.

When talking about compilers it refers to the rules used to translate from source-level constructs to binary constructs. How big are the data types? How does the stack work? How do I pass parameters to functions? Which registers should be saved by the caller vs the callee?
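The "how big are the data types" part is easy to observe from C. The sketch below is only illustrative: struct sample and both function names are invented for this example, and no particular output is claimed because the numbers depend on the platform ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* A made-up struct whose layout depends on the platform ABI. */
struct sample {
    char tag;
    long value;
};

/* Bytes of padding the ABI inserts between tag and value. */
size_t padding_before_value(void) {
    return offsetof(struct sample, value) - sizeof(char);
}

/* Print a few facts that the platform ABI, not the C standard, pins down. */
void print_abi_facts(void) {
    assert(sizeof(char) == 1); /* the one size the C standard itself fixes */
    printf("sizeof(int)    = %lu\n", (unsigned long)sizeof(int));
    printf("sizeof(long)   = %lu\n", (unsigned long)sizeof(long));
    printf("sizeof(void *) = %lu\n", (unsigned long)sizeof(void *));
    printf("offsetof(struct sample, value) = %lu\n",
           (unsigned long)offsetof(struct sample, value));
    printf("padding before value = %lu\n",
           (unsigned long)padding_before_value());
}
```

Calling print_abi_facts() from main on 64-bit Linux (an LP64 ABI) reports an 8-byte long, while 64-bit Windows (LLP64) reports a 4-byte long for the same source, so two object files that disagree on this cannot safely exchange a struct sample.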

When talking about libraries it refers to the binary interface presented by a compiled library. This interface is the result of a number of factors including the source code of the library, the rules used by the compiler and in some cases definitions picked up from other libraries.

Changes to a library can break the ABI without breaking the API. Consider, for example, a library with an interface like:

void initfoo(FOO * foo);
int usefoo(FOO * foo, int bar);
void cleanupfoo(FOO * foo);

and the application programmer writes code like

int dostuffwithfoo(int bar) {
  FOO foo;
  initfoo(&foo);
  int result = usefoo(&foo, bar);
  cleanupfoo(&foo);
  return result;
}

The application programmer doesn't care about the size or layout of FOO, but the application binary ends up with a hardcoded size of foo. If the library programmer adds an extra field to foo and someone uses the new library binary with the old application binary then the library may make out of bounds memory accesses.
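The size mismatch can be quantified with a small sketch. FOO_v1 and FOO_v2 are hypothetical layouts standing in for the old and new definitions of FOO, and bytes_out_of_bounds is a helper invented here; nothing below is taken from a real library.

```c
#include <assert.h>
#include <stddef.h>

/* Layout the old application binary was compiled against (hypothetical). */
typedef struct { int a; } FOO_v1;

/* Layout used inside the new library binary (hypothetical extra field). */
typedef struct { int a; int b; } FOO_v2;

/* Bytes a v2 initfoo() would touch beyond a caller object of the given size. */
size_t bytes_out_of_bounds(size_t caller_object_size) {
    assert(caller_object_size > 0);
    return sizeof(FOO_v2) > caller_object_size
        ? sizeof(FOO_v2) - caller_object_size
        : 0;
}
```

With an application that reserved sizeof(FOO_v1) bytes on its stack, bytes_out_of_bounds(sizeof(FOO_v1)) is nonzero, so the new library would write past the end of the caller's object even though the source-level API never changed.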

OTOH, if the library author had designed their API like:

FOO * newfoo(void);
int usefoo(FOO * foo, int bar);
void deletefoo(FOO * foo);

and the application programmer writes code like

int dostuffwithfoo(int bar) {
  FOO * foo;
  foo = newfoo();
  int result = usefoo(foo, bar);
  deletefoo(foo);
  return result;
}

Then the application binary does not need to know anything about the structure of FOO, that can all be hidden inside the library. The price you pay for that though is that heap operations are involved.

野心澎湃 2024-08-26 17:52:42

The ABI needs to be consistent between caller and callee to be certain that the call succeeds. Stack use, register use, end-of-routine stack pop. All these are the most important parts of the ABI.

梦纸 2024-08-26 17:52:42

Application Binary Interface(ABI)

ABI - the Application Binary Interface is about machine-code communication at runtime between two binary parts, such as an application, a library, or the OS. The ABI describes how objects are laid out in memory, how functions are called (calling convention), name mangling, and so on.

A good example of API and ABI is the iOS ecosystem with the Swift language from v5.

  • Application layer - When you create an application using different languages. For example, you can create an application using Swift and Objective-C [Mixing Swift and Objective-C]

  • Application - OS layer - runtime - The Swift Standard Library and Swift Run Time Library [About] are parts of the OS and they should not be included in each bundle (e.g. app, framework). It is the same as what Objective-C does. Available from iOS v12.2

  • Library layer - Module Stability case - compile time - you will be able to import a framework which was built with another version of Swift's compiler. It means that it is safe to create a closed-source (pre-built) binary which will be consumed by a different version of the compiler (.swiftinterface is used with .swiftmodule [About]) and you will not get

    Module compiled with _ cannot be imported by the _ compiler
    //or
    Compiled module was created by a newer version of the compiler
    
  • Library layer - Library Evolution case

    1. Compile time - if a dependency is changed, a client does not have to be
      recompiled.
    2. Runtime - a system library or a dynamic framework can
      be hot-swapped for a new one.

[API vs ABI]
[Swift Module Stability and Library Stability]

离不开的别离 2024-08-26 17:52:42

我也试图理解 ABI,JesperE 的回答非常有帮助。

从一个非常简单的角度来看,我们可以尝试通过考虑二进制兼容性来理解ABI。

KDE wiki 将库定义为二进制兼容的“如果动态链接到该库的旧版本的程序可以继续使用该库的新版本运行,而无需重新编译”。有关动态链接的更多信息,请参阅静态链接与动态链接

现在,让我们尝试看看库实现二进制兼容性所需的最基本方面(假设库的源代码没有更改):

  1. 相同/向后兼容的指令集架构(处理器指令、寄存器文件结构、堆栈组织、内存访问类型等)处理器可以直接访问的基本数据类型的大小、布局和对齐)
  2. 相同的调用约定
  3. 相同的名称修改约定(如果 Fortran 程序需要调用某些 C++ 库函数,则可能需要这样做)。

当然,还有许多其他细节,但这主要是 ABI 所涵盖的内容。

更具体地回答你的问题,从上面我们可以推断出:

ABI 功能:二进制兼容性

现有实体:现有程序/库/操作系统

消费者:库、操作系统

希望这有帮助!

I was also trying to understand ABI and JesperE’s answer was very helpful.

From a very simple perspective, we may try to understand ABI by considering binary compatibility.

The KDE wiki defines a library as binary compatible “if a program linked dynamically to a former version of the library continues running with newer versions of the library without the need to recompile.” For more on dynamic linking, refer to Static linking vs dynamic linking.

Now, let’s try to look at just the most basic aspects needed for a library to be binary compatible (assuming there are no source code changes to the library):

  1. Same/backward compatible instruction set architecture (processor instructions, register file structure, stack organization, memory access types, along with sizes, layout, and alignment of basic data types the processor can directly access)
  2. Same calling conventions
  3. Same name mangling convention (this might be needed if say a Fortran program needs to call some C++ library function).

Sure, there are many other details, but this is mostly what the ABI covers.

More specifically to answer your question, from the above, we can deduce:

ABI functionality: binary compatibility

existing entities: existing program/libraries/OS

consumer: libraries, OS

Hope this helps!

人生百味 2024-08-26 17:52:42

应用程序二进制接口(ABI)

功能:

  • 从程序员的模型到底层系统的域数据的转换
    类型、大小、对齐方式、调用约定,控制如何
    传递函数的参数并检索返回值;这
    系统调用号以及应用程序应如何进行系统调用
    到操作系统;高级语言编译器的名称
    修改方案、异常传播和调用约定
    同一平台上的编译器之间,但不需要
    跨平台兼容性...

现有实体

  • 直接参与程序执行的逻辑块:ALU,
    通用寄存器、内存寄存器/I/O 的 I/O 映射等...

消费者:

  • 语言处理器链接器、汇编器...

任何人都需要确保构建工具 -链条作为一个整体发挥作用。如果您用汇编语言编写一个模块,用 Python 编写另一个模块,并且希望使用操作系统而不是您自己的引导加载程序,那么您的“应用程序”模块将跨“二进制”边界工作,并且需要此类“接口”的协议。

C++ 名称修饰,因为可能需要在应用程序中链接来自不同高级语言的目标文件。考虑使用 GCC 标准库对使用 Visual C++ 构建的 Windows 进行系统调用。

ELF 是链接器对目标文件进行解释的一种可能期望,尽管 JVM 可能有其他想法。

对于 Windows RT Store 应用程序,如果您确实希望使某些构建工具链协同工作,请尝试搜索 ARM ABI。

Application binary interface (ABI)

Functionality:

  • Translation from the programmer's model to the underlying system's domain: data
    types, sizes, and alignment; the calling convention, which controls how
    functions' arguments are passed and return values retrieved; the
    system call numbers and how an application should make system calls
    to the operating system; the high-level language compilers' name
    mangling scheme, exception propagation, and calling convention
    between compilers on the same platform (cross-platform compatibility
    is not required)...

Existing entities:

  • Logical blocks that directly participate in a program's execution: ALU,
    general-purpose registers, registers for memory-mapped or I/O-mapped I/O, etc...

Consumer:

  • Language processors, linker, assembler...

These are needed by whoever has to ensure that build tool-chains work as a whole. If you write one module in assembly language, another in Python, and want to use an operating system instead of your own boot-loader, then your "application" modules are working across "binary" boundaries and require agreement on such an "interface".

C++ name mangling is included because object files from different high-level languages might need to be linked into your application. Consider using the GCC standard library to make system calls to a Windows built with Visual C++.

ELF is one possible format that the linker expects an object file to have in order to interpret it, though the JVM might have some other idea.

For a Windows RT Store app, try searching for ARM ABI if you really wish to make some build tool-chain work together.

挽容 2024-08-26 17:52:42

答:简单地说,ABI 与 API 的一个共同点是它是一个接口。可重用的程序公开了一个稳定的接口(API),可用于在另一个程序中重用该程序。

B. 然而,ABI 是为某些特定语言的某些特定处理器平台发布的接口。所有希望针对同一语言的平台的编译器供应商都必须确保,不仅可重定位目标代码形式的编译代码符合能够相互链接和交叉链接的接口,而且可执行文件也符合该接口能够在平台上运行。因此,ABI 是比典型函数 API 更广泛的规范/标准集。它可能包括一些由编译器强制执行给语言用户的 API 对象。编译器供应商必须在其发行版中包含对相同内容的支持。不用说,平台供应商是为其平台发布 ABI 的合法权威。编译器供应商和 ABI 都需要遵守相应的语言标准(例如 C++ 的 ISO 标准)。

C. 平台供应商对 ABI 的定义是:

“1.可执行文件必须符合的规范才能在特定执行环境中执行,例如,Arm 架构的 Linux ABI

  2. 独立生成的可重定位文件必须符合的规范的特定方面 。例如,Arm 架构的 C++ ABI、Arm 架构的运行时 ABI、Arm 架构的 C 库 ABI。”

D、例如。基于安腾架构的 C++ 通用 ABI 也已由财团发行。平台供应商自己的 C++ ABI 遵守的程度完全取决于平台供应商。

E. 再举一个例子。适用于 Arm 架构的 C++ ABI 位于此处

F. 话虽如此,在底层,处理器架构的 ABI 将确保一个可重用程序和另一个重用它的程序之间的 API 适用于该处理器架构。

G. 这给我们带来了面向服务的组件(例如基于 SOAP 的 Web 服务)。它们也需要在基于 SOAP 的 Web 服务和客户端程序(可以是应用程序、前端或其他 Web 服务)之间存在 API,以便客户端程序重用 Web 服务。API 用标准化协议来描述类似于 WSDL(接口描述)和 SOAP(消息格式),并且是语言中立和平台中立的。它不针对任何特定的处理器平台,因此它不像 ABI 那样“二进制”。任何一种平台类型上并以任何语言编写的客户端程序都可以远程重用以任何其他语言编写并托管在完全不同的处理器平台上的 Web 服务。这是因为 WSDL 和 SOAP 都是基于文本 (XML) 的协议。对于 RESTful Web 服务,传输协议 http(也是基于文本的协议)本身充当 API(CRUD 方法)。

A. Plainly speaking, one common thing an ABI has with an API is that it is an interface. A reusable program exposes a stable interface (API) that can be used to reuse the program in another program.

B. However, an ABI is an interface issued for a specific processor platform and a specific language. All compiler vendors that want to target that platform for that language have to ensure not only that compiled code in the form of relocatable object files complies with the interface, so that object files can link and cross-link with each other, but also that executables comply with it, so that they can run on the platform at all. So an ABI is a much broader set of specifications/standards than a typical function API. It may include some API objects that the compiler enforces upon users of the language. The compiler vendor will have to include support for these in their distributions. Needless to say, the platform vendor is the rightful authority to issue ABIs for its platform. Both compiler vendors and ABIs need to comply with the corresponding language standard (e.g. the ISO standard for C++).

C. A definition of an ABI by a platform vendor is:

"1. The specifications to which an executable must conform in order to execute in a specific execution environment. For example, the Linux ABI for the Arm Architecture.

  2. A particular aspect of the specifications to which independently produced relocatable files must conform in order to be statically linkable and executable. For example, the C++ ABI for the Arm Architecture, the Run-time ABI for the Arm Architecture, the C Library ABI for the Arm Architecture."

D. For example, a generic ABI for C++ based on the Itanium architecture has also been issued by a consortium. The extent to which platform vendors' own ABIs for C++ comply with it is entirely up to the platform vendors.

E. As another example, the C++ ABI for the Arm Architecture is here.

F. Having said that, under the hood, it is the ABI of a processor-architecture that will ensure that the API between one reusable program and another program that reuses it works for that processor-architecture.

G. That brings us to service-oriented components (e.g. SOAP-based web services). They too require an API between a SOAP-based web service and a client program (which could be an app, a front-end, or another web service) so that the client program can reuse the web service. The API is described in terms of standardized protocols like WSDL (interface description) and SOAP (message format) and is language-neutral and platform-neutral. It is not targeted at any specific processor platform, and thus it is not "binary" like an ABI. A client program on any platform type and written in any language can remotely reuse a web service written in any other language and hosted on an entirely different processor platform. This is made possible by the fact that both WSDL and SOAP are text-based (XML) protocols. In the case of RESTful web services, the transport protocol HTTP, also a text-based protocol, itself acts as the API (CRUD methods).

記憶穿過時間隧道 2024-08-26 17:52:42

简而言之,从哲学上来说,只有同类的事物才能相处融洽,而 ABI 可以被视为同类软件之间协同工作的方式。

In short, and philosophically speaking, only things of the same kind get along well, and the ABI can be seen as the "kind" that pieces of software must share in order to work together.

甜心 2024-08-26 17:52:41

理解“ABI”的一种简单方法是将其与“API”进行比较。

您已经熟悉 API 的概念。如果您想使用某些库或操作系统的功能,您将针对 API 进行编程。 API 由数据类型/结构、常量、函数等组成,您可以在代码中使用它们来访问该外部组件的功能。

ABI 非常相似。将其视为 API 的编译版本(或机器语言级别的 API)。当您编写源代码时,您可以通过 API 访问该库。代码编译完成后,您的应用程序将通过 ABI 访问库中的二进制数据。 ABI 仅在较低级别定义了编译的应用程序将用于访问外部库的结构和方法(就像 API 所做的那样)。您的 API 定义了将参数传递给函数的顺序。您的 ABI 定义了这些参数如何传递的机制(寄存器、堆栈等)。您的 API 定义了哪些函数属于您的库的一部分。您的 ABI 定义了代码如何存储在库文件中,以便使用您的库的任何程序都可以找到所需的函数并执行它。

对于使用外部库的应用程序来说,ABI 非常重要。库充满了代码和其他资源,但您的程序必须知道如何在库文件中找到它需要的内容。您的 ABI 定义了库的内容如何存储在文件中,并且您的程序使用 ABI 来搜索文件并找到它需要的内容。如果系统中的所有内容都符合相同的 ABI,那么任何程序都可以使用任何库文件,无论是谁创建的。 Linux 和 Windows 使用不同的 ABI,因此 Windows 程序不知道如何访问为 Linux 编译的库。

有时,ABI 更改是不可避免的。发生这种情况时,任何使用该库的程序都将无法运行,除非重新编译它们以使用该库的新版本。如果 ABI 发生变化但 API 没有变化,则新旧库版本有时被称为“源兼容”。这意味着,虽然为一个库版本编译的程序无法与另一个库版本一起使用,但为一个库版本编写的源代码如果重新编译,将适用于另一个版本。

因此,开发人员倾向于尝试保持 ABI 稳定(以尽量减少干扰)。保持 ABI 稳定意味着不更改函数接口(返回类型和数量、类型和参数顺序)、数据类型或数据结构的定义、定义的常量等。可以添加新函数和数据类型,但必须保留现有函数和数据类型相同。例如,如果您的库使用 32 位整数来指示函数的偏移量,而您切换到 64 位整数,则使用该库的已编译代码将无法正确访问该字段(或其后面的任何字段) 。访问数据结构成员在编译期间被转换为内存地址和偏移量,如果数据结构发生变化,那么这些偏移量将不会指向代码期望它们指向的内容,并且结果充其量是不可预测的。

ABI 不一定是您明确提供的东西,除非您正在进行非常低级的系统设计工作。它也不是特定于语言的,因为(例如)C 应用程序和 Pascal 应用程序在编译后可以使用相同的 ABI。

编辑:关于 SysV ABI 文档中有关 ELF 文件格式的章节的问题:包含此信息的原因是因为 ELF 格式定义了操作系统和应用程序之间的接口。当你告诉操作系统运行一个程序时,它期望该程序以某种方式格式化,并且(例如)期望二进制文件的第一部分是一个 ELF 标头,其中包含特定内存偏移处的某些信息。这就是应用程序将有关其自身的重要信息传递给操作系统的方式。如果您以非 ELF 二进制格式(例如 a.out 或 PE)构建程序,则需要 ELF 格式应用程序的操作系统将无法解释该二进制文件或运行该应用程序。这是 Windows 应用程序无法直接在 Linux 计算机上运行(反之亦然)的一个重要原因,除非重新编译或在某种类型的模拟层(可以从一种二进制格式转换为另一种格式)内运行。

IIRC,Windows 目前使用可移植可执行文件(或 PE)格式。该维基百科页面的“外部链接”部分中有一些链接,其中包含有关 PE 格式的更多信息。

另外,关于 C++ 名称修饰的注释:在库文件中查找函数时,通常会按名称查找该函数。 C++ 允许重载函数名称,因此仅名称不足以识别函数。 C++ 编译器有自己的内部处理方法,称为名称修改。 ABI 可以定义对函数名称进行编码的标准方法,以便使用不同语言或编译器构建的程序可以找到所需的内容。当您在 C++ 程序中使用 extern "C" 时,您指示编译器使用其他软件可以理解的标准化方式来记录名称。

One easy way to understand "ABI" is to compare it to "API".

You are already familiar with the concept of an API. If you want to use the features of, say, some library or your OS, you will program against an API. The API consists of data types/structures, constants, functions, etc that you can use in your code to access the functionality of that external component.

An ABI is very similar. Think of it as the compiled version of an API (or as an API on the machine-language level). When you write source code, you access the library through an API. Once the code is compiled, your application accesses the binary data in the library through the ABI. The ABI defines the structures and methods that your compiled application will use to access the external library (just like the API did), only on a lower level. Your API defines the order in which you pass arguments to a function. Your ABI defines the mechanics of how these arguments are passed (registers, stack, etc.). Your API defines which functions are part of your library. Your ABI defines how your code is stored inside the library file, so that any program using your library can locate the desired function and execute it.

ABIs are important when it comes to applications that use external libraries. Libraries are full of code and other resources, but your program has to know how to locate what it needs inside the library file. Your ABI defines how the contents of a library are stored inside the file, and your program uses the ABI to search through the file and find what it needs. If everything in your system conforms to the same ABI, then any program is able to work with any library file, no matter who created them. Linux and Windows use different ABIs, so a Windows program won't know how to access a library compiled for Linux.

Sometimes, ABI changes are unavoidable. When this happens, any programs that use that library will not work unless they are re-compiled to use the new version of the library. If the ABI changes but the API does not, then the old and new library versions are sometimes called "source compatible". This implies that while a program compiled for one library version will not work with the other, source code written for one will work for the other if re-compiled.

For this reason, developers tend to try to keep their ABI stable (to minimize disruption). Keeping an ABI stable means not changing function interfaces (return type and number, types, and order of arguments), definitions of data types or data structures, defined constants, etc. New functions and data types can be added, but existing ones must stay the same. If, for instance, your library uses 32-bit integers to indicate the offset of a function and you switch to 64-bit integers, then already-compiled code that uses that library will not be accessing that field (or any following it) correctly. Accessing data structure members gets converted into memory addresses and offsets during compilation and if the data structure changes, then these offsets will not point to what the code is expecting them to point to and the results are unpredictable at best.
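The 32-bit to 64-bit example can be checked directly with `offsetof`. This is only a sketch: the two struct names and the `flags` field are invented for illustration and stand in for two versions of the same library type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Version 1 of a hypothetical library record. */
struct record_v1 {
    uint32_t offset;
    uint32_t flags;
};

/* Version 2 widens the first field; the source-level API looks unchanged. */
struct record_v2 {
    uint64_t offset;
    uint32_t flags;
};

/* Where an application compiled against v1 will look for `flags`. */
size_t flags_offset_v1(void) {
    return offsetof(struct record_v1, flags);
}

/* Where a library built from v2 actually stores `flags`. */
size_t flags_offset_v2(void) {
    return offsetof(struct record_v2, flags);
}

/* Non-zero when already-compiled callers would read the wrong bytes. */
int layout_changed(void) {
    /* Sanity check: a member always lies inside its struct. */
    assert(offsetof(struct record_v1, flags) < sizeof(struct record_v1));
    return flags_offset_v1() != flags_offset_v2()
        || sizeof(struct record_v1) != sizeof(struct record_v2);
}
```

Source code that writes `r.flags` compiles against either version, but an old binary has the v1 offset baked in, so it would read part of the widened `offset` field instead.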

An ABI isn't necessarily something you will explicitly provide unless you are doing very low-level systems design work. It isn't language-specific either, since (for example) a C application and a Pascal application can use the same ABI after they are compiled.

Edit: Regarding your question about the chapters regarding the ELF file format in the SysV ABI docs: The reason this information is included is because the ELF format defines the interface between operating system and application. When you tell the OS to run a program, it expects the program to be formatted in a certain way and (for example) expects the first section of the binary to be an ELF header containing certain information at specific memory offsets. This is how the application communicates important information about itself to the operating system. If you build a program in a non-ELF binary format (such as a.out or PE), then an OS that expects ELF-formatted applications will not be able to interpret the binary file or run the application. This is one big reason why Windows apps cannot be run directly on a Linux machine (or vice versa) without being either re-compiled or run inside some type of emulation layer that can translate from one binary format to another.
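As a rough illustration of "the OS expects the file to be formatted in a certain way", the sketch below checks only the very first bytes of a file image. ELF files begin with the bytes 0x7F 'E' 'L' 'F', and PE files begin with an "MZ" DOS header; the function names are made up for this example, and a real loader validates far more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* First four bytes of every ELF file: 0x7F 'E' 'L' 'F'. */
static const unsigned char ELF_MAGIC[4] = { 0x7F, 'E', 'L', 'F' };

/* Returns 1 if the buffer starts with the ELF magic number, else 0. */
int looks_like_elf(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    if (len < sizeof ELF_MAGIC) {
        return 0;
    }
    return memcmp(buf, ELF_MAGIC, sizeof ELF_MAGIC) == 0;
}

/* Returns 1 if the buffer starts with "MZ", the DOS header that PE files begin with. */
int looks_like_pe(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    if (len < 2) {
        return 0;
    }
    return buf[0] == 'M' && buf[1] == 'Z';
}
```

A loader that only understands one of these formats has no way to find the entry point or segments of the other, regardless of what CPU the code inside was compiled for.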

IIRC, Windows currently uses the Portable Executable (or, PE) format. There are links in the "external links" section of that Wikipedia page with more information about the PE format.

Also, regarding your note about C++ name mangling: When locating a function in a library file, the function is typically looked up by name. C++ allows you to overload function names, so name alone is not sufficient to identify a function. C++ compilers have their own ways of dealing with this internally, called name mangling. An ABI can define a standard way of encoding the name of a function so that programs built with a different language or compiler can locate what they need. When you use extern "C" in a C++ program, you're instructing the compiler to use a standardized way of recording names that's understandable by other software.
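A common way this shows up in practice is the header idiom below, which compiles as plain C and also works when included from C++. The function `mylib_clamp` is a made-up example. Under the Itanium C++ ABI used by GCC and Clang, a C++ function with this signature and no `extern "C"` would be exported under a mangled name along the lines of `_Z11mylib_clampiii`, whereas with `extern "C"` the symbol is based on the plain name `mylib_clamp`.

```c
#include <assert.h>

/* Declarations as they might appear in a hypothetical mylib.h.
   The guard makes a C++ compiler use C linkage (no name mangling),
   while a C compiler simply skips the extern "C" lines. */
#ifdef __cplusplus
extern "C" {
#endif

int mylib_clamp(int value, int lo, int hi);

#ifdef __cplusplus
}
#endif

/* Definition: limits value to the range [lo, hi]. */
int mylib_clamp(int value, int lo, int hi) {
    assert(lo <= hi);
    if (value < lo) {
        return lo;
    }
    if (value > hi) {
        return hi;
    }
    return value;
}
```

The price of C linkage is that the name no longer encodes the argument types, so the function cannot be overloaded.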

伤痕我心 2024-08-26 17:52:41

如果您了解汇编以及操作系统级别的工作原理,那么您就符合特定的 ABI。 ABI 管理诸如参数如何传递、返回值放置在何处之类的事情。对于许多平台来说,只有一种 ABI 可供选择,在这些情况下,ABI 只是“事情如何运作”。

然而,ABI 还管理诸如如何在 C++ 中布局类/对象之类的事情。如果您希望能够跨模块边界传递对象引用,或者如果您想混合使用不同编译器编译的代码,则这是必要的。

此外,如果您有一个可以执行 32 位二进制文​​件的 64 位操作系统,那么您将拥有针对 32 位和 64 位代码的不同 ABI。

一般来说,链接到同一可执行文件的任何代码都必须符合相同的 ABI。如果要在使用不同 ABI 的代码之间进行通信,则必须使用某种形式的 RPC 或序列化协议。

我认为你太努力地将不同类型的接口压缩成一组固定的特征。例如,接口不一定必须分为消费者和生产者。接口只是两个实体交互的约定。

ABI 可以(部分)与 ISA 无关。某些方面(例如调用约定)取决于 ISA,而其他方面(例如 C++ 类布局)则不依赖于 ISA。

定义良好的 ABI 对于编写编译器的人来说非常重要。如果没有明确定义的 ABI,就不可能生成可互操作的代码。

编辑:一些需要澄清的注释:

  • ABI 中的“二进制”不排除使用字符串或文本。如果要链接导出 C++ 类的 DLL,则必须对其中的方法和类型签名进行编码。这就是 C++ 名称修改的用武之地。
  • 您从未提供 ABI 的原因是绝大多数程序员永远不会这样做。 ABI 是由设计平台(即操作系统)的同一个人提供的,很少有程序员有幸设计出广泛使用的 ABI。

If you know assembly and how things work at the OS-level, you are conforming to a certain ABI. The ABI governs things like how parameters are passed and where return values are placed. For many platforms there is only one ABI to choose from, and in those cases the ABI is just "how things work".

However, the ABI also governs things like how classes/objects are laid out in C++. This is necessary if you want to be able to pass object references across module boundaries or if you want to mix code compiled with different compilers.

Also, if you have a 64-bit OS which can execute 32-bit binaries, you will have different ABIs for 32- and 64-bit code.
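One visible difference between 32- and 64-bit ABIs is the "data model", i.e. the sizes of `int`, `long`, and pointers. The sketch below classifies the three most common models; the enum and function names are invented for this example. ILP32 is typical of 32-bit targets, LP64 of 64-bit Linux and macOS, and LLP64 of 64-bit Windows.

```c
#include <assert.h>
#include <stddef.h>

enum data_model { MODEL_ILP32, MODEL_LP64, MODEL_LLP64, MODEL_OTHER };

/* Classifies a data model from the byte sizes of int, long and pointers. */
enum data_model classify_model(size_t int_size, size_t long_size, size_t ptr_size) {
    assert(int_size > 0 && long_size > 0 && ptr_size > 0);
    if (int_size == 4 && long_size == 4 && ptr_size == 4) {
        return MODEL_ILP32;      /* int, long, pointer all 32-bit */
    }
    if (int_size == 4 && long_size == 8 && ptr_size == 8) {
        return MODEL_LP64;       /* long and pointer 64-bit */
    }
    if (int_size == 4 && long_size == 4 && ptr_size == 8) {
        return MODEL_LLP64;      /* only pointer (and long long) 64-bit */
    }
    return MODEL_OTHER;
}

/* The model this translation unit was compiled for. */
enum data_model current_model(void) {
    return classify_model(sizeof(int), sizeof(long), sizeof(void *));
}
```

Compiling the same file with `-m32` and `-m64` on a compiler that supports both would be expected to give different results from `current_model()`, which is one reason the two kinds of object files cannot be linked together.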

In general, any code you link into the same executable must conform to the same ABI. If you want to communicate between code using different ABIs, you must use some form of RPC or serialization protocols.
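A serialization protocol sidesteps ABI differences by fixing the byte layout explicitly instead of relying on how each side's compiler lays out data. A minimal sketch, assuming a made-up wire format in which a 32-bit value is always sent as four big-endian bytes:

```c
#include <assert.h>
#include <stdint.h>

/* Writes v as 4 bytes, most significant byte first, independent of host layout. */
void put_u32_be(unsigned char out[4], uint32_t v) {
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Reads 4 big-endian bytes back into a host integer. */
uint32_t get_u32_be(const unsigned char in[4]) {
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Returns byte i (0..3) of the wire encoding of v. */
unsigned char wire_byte(uint32_t v, int i) {
    unsigned char wire[4];
    assert(i >= 0 && i < 4);
    put_u32_be(wire, v);
    return wire[i];
}

/* Encodes and decodes v; the result equals v on any host. */
uint32_t roundtrip_u32(uint32_t v) {
    unsigned char wire[4];
    put_u32_be(wire, v);
    return get_u32_be(wire);
}
```

Neither side needs to agree on struct padding, endianness, or the size of `long`; they only need to agree on the byte sequence.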

I think you are trying too hard to squeeze in different types of interfaces into a fixed set of characteristics. For example, an interface doesn't necessarily have to be split into consumers and producers. An interface is just a convention by which two entities interact.

ABIs can be (partially) ISA-agnostic. Some aspects (such as calling conventions) depend on the ISA, while other aspects (such as C++ class layout) do not.

A well defined ABI is very important for people writing compilers. Without a well defined ABI, it would be impossible to generate interoperable code.

EDIT: Some notes to clarify:

  • "Binary" in ABI does not exclude the use of strings or text. If you want to link a DLL exporting a C++ class, somewhere in it the methods and type signatures must be encoded. That's where C++ name-mangling comes in.
  • The reason why you never provided an ABI is that the vast majority of programmers will never do it. ABIs are provided by the same people designing the platform (i.e. operating system), and very few programmers will ever have the privilege to design a widely-used ABI.
淡看悲欢离合 2024-08-26 17:52:41

实际上,您根本不需要 ABI,如果 -

  • 您的程序没有功能,并且 -
  • 您的程序是单独运行的单个可执行文件(即嵌入式系统),它实际上是唯一正在运行的东西,它不需要与其他任何东西交谈。

过于简单的总结:

API: “这是您可以调用的所有函数。”

ABI: “这是如何调用函数。”

ABI 是编译器和链接器遵守的一组规则,以便编译您的程序,使其正常工作。 ABI 涵盖多个主题:

  • 可以说,ABI 中最大和最重要的部分有时是过程调用标准称为“调用约定”。调用约定标准化了“函数”如何转换为汇编代码。
  • ABI 还规定了库中公开函数的名称应如何表示,以便其他代码可以调用这些库并知道应传递哪些参数。这称为“名称修改”。
  • ABI 还规定可以使用什么类型的数据类型、它们必须如何对齐以及其他低级细节。

更深入地研究调用约定,我认为这是 ABI 的核心:

机器本身没有“函数”的概念。当您用 c 等高级语言编写函数时,编译器会生成一行汇编代码,如 _MyFunction1:。这是一个标签,最终会被汇编器解析为地址。该标签标记了汇编代码中“函数”的“开始”。在高级代码中,当您“调用”该函数时,您实际上所做的就是使 CPU跳转到该标签的地址并在那里继续执行。

为了准备跳转,编译器必须做一些重要的事情。调用约定就像编译器执行所有这些操作时遵循的清单:

  • 首先,编译器插入一点汇编代码来保存当前地址,以便当您的“函数”完成时,CPU 可以跳回到到正确的位置并继续执行。
  • 接下来,编译器生成汇编代码来传递参数。
    • 一些调用约定规定参数应放入堆栈(当然是按特定顺序)。
    • 其他约定规定参数应放置在特定的寄存器中(当然取决于它们的数据类型)。
    • 还有其他约定规定应使用堆栈和寄存器的特定组合。
  • 当然,如果这些寄存器之前有任何重要的内容,那么这些值现在会被覆盖并永远丢失,因此某些调用约定可能规定编译器应在将参数放入其中之前保存其中一些寄存器。
  • 现在,编译器插入一条跳转指令,告诉 CPU 转到之前创建的标签 (_MyFunction1:)。此时,您可以认为CPU“在”您的“功能”中。
  • 在函数的末尾,编译器放置一些汇编代码,使 CPU 将返回值写入正确的位置。调用约定将决定返回值是否应放入特定寄存器(取决于其类型)或堆栈中。
  • 现在是清理的时候了。调用约定将指示编译器放置清理汇编代码的位置。
    • 一些约定规定调用者必须清理堆栈。这意味着在“功能”完成并且CPU跳回之前的位置之后,下一个要执行的代码应该是一些非常具体的清理代码。
    • 其他约定规定清理代码的某些特定部分应位于“函数”的末尾跳回之前。

有许多不同的 ABI/调用约定。一些主要的有:

  • 对于 x86 或 x86-64 CPU(32 位环境):
    • CDECL
    • STDCALL
    • FASTCALL
    • VECTORCALL
    • THISCALL
  • 对于 x86-64 CPU(64 位环境):
    • SYSTEMV
    • MSNATIVE
    • VECTORCALL
  • 用于 ARM CPU(32 位)
    • AAPCS
  • 用于 ARM CPU(64 位)
    • AAPCS64

这里是一个很棒的页面,实际上显示了生成的程序集的差异针对不同的 ABI 进行编译时。

另一件需要提到的事情是,ABI 不仅与程序的可执行模块内部相关。链接器使用它来确保您的程序正确调用库函数。您的计算机上运行着多个共享库,只要您的编译器知道它们各自使用的 ABI,它就可以正确地调用它们的函数,而不会破坏堆栈。

编译器了解如何调用库函数极其非常重要。在托管平台(即操作系统加载程序的平台)上,如果不进行内核调用,您的程序甚至无法闪烁。

You actually don't need an ABI at all if--

  • Your program doesn't have functions, and--
  • Your program is a single executable that is running alone (i.e. an embedded system) where it's literally the only thing running and it doesn't need to talk to anything else.

An oversimplified summary:

API: "Here are all the functions you may call."

ABI: "This is how to call a function."

The ABI is a set of rules that compilers and linkers adhere to when building your program so that it will work properly. ABIs cover multiple topics:

  • Arguably the biggest and most important part of an ABI is the procedure call standard, sometimes known as the "calling convention". Calling conventions standardize how "functions" are translated to assembly code.
  • ABIs also dictate how the names of exposed functions in libraries should be represented so that other code can call those libraries and know what arguments should be passed. This is called "name mangling".
  • ABIs also dictate what data types can be used, how they must be aligned, and other low-level details.

Taking a deeper look at calling convention, which I consider to be the core of an ABI:

The machine itself has no concept of "functions". When you write a function in a high-level language like C, the compiler generates a line of assembly code like _MyFunction1:. This is a label, which will eventually get resolved into an address by the assembler and linker. This label marks the "start" of your "function" in the assembly code. In high-level code, when you "call" that function, what you're really doing is causing the CPU to jump to the address of that label and continue executing there.

In preparation for the jump, the compiler must do a bunch of important stuff. The calling convention is like a checklist that the compiler follows to do all this stuff:

  • First, the compiler inserts a little bit of assembly code to save the current address, so that when your "function" is done, the CPU can jump back to the right place and continue executing.
  • Next, the compiler generates assembly code to pass the arguments.
    • Some calling conventions dictate that arguments should be put on the stack (in a particular order of course).
    • Other conventions dictate that the arguments should be put in particular registers (depending on their data types of course).
    • Still other conventions dictate that a specific combination of stack and registers should be used.
  • Of course, if there was anything important in those registers before, those values are now overwritten and lost forever, so some calling conventions may dictate that the compiler should save some of those registers prior to putting the arguments in them.
  • Now the compiler inserts a jump instruction telling the CPU to go to that label it made previously (_MyFunction1:). At this point, you can consider the CPU to be "in" your "function".
  • At the end of the function, the compiler puts some assembly code that will make the CPU write the return value in the correct place. The calling convention will dictate whether the return value should be put into a particular register (depending on its type), or on the stack.
  • Now it's time for clean-up. The calling convention will dictate where the compiler places the cleanup assembly code.
    • Some conventions say that the caller must clean up the stack. This means that after the "function" is done and the CPU jumps back to where it was before, the very next code to be executed should be some very specific cleanup code.
    • Other conventions say that some particular parts of the cleanup code should be at the end of the "function", before the jump back.
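As a small worked example of the checklist above, take a three-argument C function. The function names are made up, and the comments describe what the two common x86-64 conventions specify for a call that is not inlined; the exact instructions a given compiler emits will vary.

```c
#include <assert.h>

/* Callee.
   System V x86-64 (Linux, macOS): a, b, c arrive in edi, esi, edx.
   Microsoft x64 (Windows):        a, b, c arrive in ecx, edx, r8d.
   In both conventions the int result is returned in eax. */
int add3(int a, int b, int c) {
    return a + b + c;
}

/* Caller.
   For the call below, the compiler typically:
     1. loads 1, 2, 3 into the argument registers listed above;
     2. emits a `call`, which pushes the return address and jumps to add3;
     3. reads the result from eax after add3 executes `ret`.
   Under Microsoft x64 the caller also reserves 32 bytes of stack
   ("shadow space") before the call, even for register arguments. */
int call_add3(void) {
    int r = add3(1, 2, 3);
    assert(r == 6);
    return r;
}
```

If the caller were compiled for one convention and the callee for the other, the callee would read its arguments from registers the caller never set, which is exactly the kind of mismatch an agreed ABI prevents.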

There are many different ABIs / calling conventions. Some main ones are:

  • For the x86 or x86-64 CPU (32-bit environment):
    • CDECL
    • STDCALL
    • FASTCALL
    • VECTORCALL
    • THISCALL
  • For the x86-64 CPU (64-bit environment):
    • SYSTEMV
    • MSNATIVE
    • VECTORCALL
  • For the ARM CPU (32-bit)
    • AAPCS
  • For the ARM CPU (64-bit)
    • AAPCS64

Here is a great page that actually shows the differences in the assembly generated when compiling for different ABIs.

Another thing to mention is that an ABI isn't only relevant inside your program's executable module. It's also used by the linker to make sure your program calls library functions correctly. You have multiple shared libraries running on your computer, and as long as your compiler knows what ABI they each use, it can call functions from them properly without blowing up the stack.

Your compiler understanding how to call library functions is extremely important. On a hosted platform (that is, one where an OS loads programs), your program can't even blink without making a kernel call.
