What is a bus error? Is it different from a segmentation fault?

格子衫的從容 2024-07-13 16:27:41

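Bus errors are rare nowadays on x86. They occur when your processor cannot even attempt the memory access requested, typically:

using a processor instruction with an address that does not satisfy its alignment requirements.

Segmentation faults occur when accessing memory which does not belong to your process. They are very common and are typically the result of:

using a pointer to something that was deallocated.
using an uninitialized, hence bogus, pointer.
using a null pointer.
overflowing a buffer.

PS: To be more precise, it is not manipulating the pointer itself that causes issues; it is accessing the memory it points to (dereferencing).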
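
To see the "dereferencing, not the pointer itself" point in practice, here is a minimal POSIX sketch that runs a function in a child process and reports which signal killed it. The helper names are hypothetical, and making the pointer object itself volatile is an assumption meant to stop the compiler from proving the pointer is NULL and optimizing the access away.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run fn() in a child process and return the number of
 * the signal that terminated it, 0 if it exited normally, -1 on error. */
int signal_from_child(void (*fn)(void))
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        fn();
        _exit(0); /* reached only if fn() did not fault */
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* The pointer object is volatile, so the compiler must reload it and
 * cannot assume it is NULL. */
static int *volatile null_ptr = NULL;

/* Writing through NULL: expected to die with SIGSEGV on Linux. */
void deref_null(void)
{
    *null_ptr = 42;
}

/* Copying a NULL pointer touches no memory, so nothing happens. */
void copy_null(void)
{
    int *p = null_ptr;
    (void)p;
}
```

Usage: `signal_from_child(deref_null)` should return `SIGSEGV`, while `signal_from_child(copy_null)` should return 0.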

盗心人 2024-07-13 16:27:41

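A segfault is accessing memory that you're not allowed to access. It's read-only, you don't have permission, etc.

A bus error is trying to access memory that can't possibly be there. You've used an address that's meaningless to the system, or the wrong kind of address for that operation.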

独﹏钓一江月 2024-07-13 16:27:41

mmap minimal POSIX 7 example

"Bus error" happens when the kernel sends SIGBUS to a process.

A minimal example that produces it because ftruncate was forgotten:

#define _POSIX_C_SOURCE 200809L /* expose shm_open, ftruncate, mmap under -std=c99 */
#include <fcntl.h> /* O_ constants */
#include <unistd.h> /* ftruncate */
#include <sys/mman.h> /* mmap, shm_open, shm_unlink */

int main(void) {
    int fd;
    int *map;
    int size = sizeof(int);
    const char *name = "/a";

    shm_unlink(name);
    fd = shm_open(name, O_RDWR | O_CREAT, (mode_t)0600);
    if (fd == -1)
        return 1;
    /* THIS is the cause of the problem. */
    /*ftruncate(fd, size);*/
    map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return 1;
    /* This is what generates the SIGBUS. */
    *map = 0;
    return 0;
}

Run with:

gcc -std=c99 main.c -lrt
./a.out

Tested in Ubuntu 14.04.

POSIX describes SIGBUS as:

Access to an undefined portion of a memory object.

The mmap spec says that:

References within the address range starting at pa and continuing for len bytes to whole pages following the end of an object shall result in delivery of a SIGBUS signal.

And shm_open says that it generates objects of size 0:

The shared memory object has a size of zero.

So at *map = 0 we are touching past the end of the allocated object.
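
The fix is the commented-out ftruncate call: the object must be at least size bytes long before the mapping is touched. The sketch below checks both cases side by side. It swaps shm_open for an unlinked temporary file from mkstemp (an assumption, made so that no -lrt is needed), relying on the same mmap rule for mapped files, and performs the write in a child process so the SIGBUS case can be observed without killing the caller. The helper name is hypothetical.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map a fresh, empty temporary file, optionally grow it with ftruncate first,
 * then write one int through the mapping in a child process.
 * Returns the signal that killed the child, 0 on success, -1 on setup error. */
int write_through_mapping(int grow_first)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";
    int fd = mkstemp(path); /* stands in for shm_open in this sketch */
    if (fd == -1)
        return -1;
    unlink(path); /* the open descriptor keeps the file alive */

    if (grow_first && ftruncate(fd, sizeof(int)) == -1) {
        close(fd);
        return -1;
    }
    int *map = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after close */
    if (map == MAP_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        munmap(map, sizeof(int));
        return -1;
    }
    if (pid == 0) {
        *map = 0; /* SIGBUS here if the file is still 0 bytes long */
        _exit(0);
    }
    int status;
    waitpid(pid, &status, 0);
    munmap(map, sizeof(int));
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Without the ftruncate, `write_through_mapping(0)` should report `SIGBUS`; with it, `write_through_mapping(1)` should return 0.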

Unaligned stack memory accesses in ARMv8 aarch64

This was mentioned for SPARC in another answer to What is a bus error?, but here I will provide a more reproducible example.

All you need is a freestanding aarch64 program:

.global _start
_start:
asm_main_after_prologue:
    /* misalign the stack out of its 16-byte boundary */
    add sp, sp, #-4
    /* access the stack */
    ldr w0, [sp]

    /* exit syscall in case SIGBUS does not happen */
    mov x0, 0
    mov x8, 93
    svc 0

That program then raises SIGBUS on Ubuntu 18.04 aarch64, Linux kernel 4.15.0, on a ThunderX2 server machine.

Unfortunately, I can't reproduce it on QEMU v4.0.0 user mode; I'm not sure why.

The fault appears to be optional and controlled by the SCTLR_ELx.SA and SCTLR_EL1.SA0 fields; I have summarized the related docs in a bit more detail here.

葬花如无物 2024-07-13 16:27:41

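I agree with all the answers above. Here are my two cents regarding bus errors:

A bus error need not arise from the instructions in the program's code. It can happen when you are running a binary and, during execution, the binary is modified (overwritten by a build, deleted, etc.).

Verifying if this is the case

A simple way to check whether this is the cause is to launch a couple of instances of the same binary from a build output directory, and run a build after they have started. Shortly after the build finishes and replaces the binary (the one both instances are currently running), both running instances will crash with a SIGBUS error.

Underlying reason

This is because the OS pages memory in and out, and in some cases the binary might not be entirely loaded into memory. These crashes occur when the OS tries to fetch the next page from the same binary, but the binary has changed since it was last read.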
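
A minimal Linux sketch of the mechanism, assuming an ordinary data file is a fair stand-in for the executable (both are mapped from the file and paged in on demand): once the file is shrunk in place, the next access to a page that is no longer backed by the file raises SIGBUS. Merely unlinking the old file keeps its inode alive for as long as it is mapped, so that case keeps working; this is why builds that write a new file and rename it over the old one do not trigger the crash. The helper name is hypothetical.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map one page of a temporary file, "rebuild" the file, then read the mapped
 * page in a child. Returns the signal that killed the child, 0 if the read
 * worked, -1 on setup error.
 *   truncate_in_place != 0: shrink the same file to 0 bytes, then unlink it
 *   truncate_in_place == 0: only unlink it, leaving the old inode intact */
int read_after_rebuild(int truncate_in_place)
{
    char path[] = "/tmp/sigbus-rebuild-XXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    char *map = MAP_FAILED;
    if (ftruncate(fd, page) == 0)
        map = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    int ok = map != MAP_FAILED;
    if (ok && truncate_in_place && ftruncate(fd, 0) == -1)
        ok = 0; /* could not shrink the file */
    unlink(path);
    close(fd);
    if (!ok) {
        if (map != MAP_FAILED)
            munmap(map, (size_t)page);
        return -1;
    }

    pid_t pid = fork();
    if (pid == 0) {
        volatile char c = map[0]; /* faults if the page lost its backing */
        (void)c;
        _exit(0);
    }
    int status = 0;
    if (pid == -1 || waitpid(pid, &status, 0) == -1)
        status = -1;
    munmap(map, (size_t)page);
    if (status == -1)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Truncating in place is expected to give `SIGBUS`; unlinking alone is expected to return 0.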

橪书 2024-07-13 16:27:41

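I believe the kernel raises SIGBUS when an application exhibits data misalignment on the data bus. I think that since most[?] modern compilers for most processors pad/align the data for the programmers, the alignment troubles of yore (at least) are mitigated, and hence one does not see SIGBUS too often these days (AFAIK).

From: here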
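
To make the point about compilers padding data concrete, here is a small C11 sketch; the struct and function names are hypothetical. The amount of padding depends on the target ABI; on a platform where int is 4-byte aligned you would typically see 3 padding bytes after the char.

```c
#include <stdalign.h>
#include <stddef.h>

/* A char followed by an int: the compiler inserts padding after `tag`
 * so that `value` starts at an offset that is a multiple of alignof(int). */
struct record {
    char tag;
    int value;
};

/* Number of padding bytes the compiler placed between the two members. */
size_t padding_after_tag(void)
{
    return offsetof(struct record, value) - sizeof(char);
}

/* Non-zero if `value` sits at an offset suitable for an int. */
int value_is_aligned(void)
{
    return offsetof(struct record, value) % alignof(int) == 0;
}
```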

浮世清欢 2024-07-13 16:27:41

On POSIX systems, you can also get the SIGBUS signal when a code page cannot be paged in for some reason.

◇流星雨 2024-07-13 16:27:41

One classic instance of a bus error occurs on certain architectures, such as SPARC (at least some SPARCs; maybe this has changed), when you do a misaligned access. For instance:

unsigned char data[6];
*(unsigned int *) (data + 2) = 0xdeadf00d;

This snippet tries to write the 32-bit integer value 0xdeadf00d to an address that is (most likely) not properly aligned, and will generate a bus error on architectures that are "picky" in this regard. The Intel x86 is, by the way, not such an architecture. It would allow the access (albeit executing it more slowly).
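
If the goal really is to put a 32-bit value at data + 2, a common portable approach is to copy the bytes with memcpy instead of casting the pointer (the cast is undefined behavior in C even on x86). A minimal sketch; store_u32 and load_u32 are hypothetical names, and the value is stored in the host's byte order.

```c
#include <stdint.h>
#include <string.h>

/* memcpy has no alignment requirement; compilers typically turn this into a
 * single store where the CPU allows unaligned access, or byte stores where
 * it does not. */
void store_u32(unsigned char *dst, uint32_t value)
{
    memcpy(dst, &value, sizeof value);
}

/* The matching load, also safe for any byte address. */
uint32_t load_u32(const unsigned char *src)
{
    uint32_t value;
    memcpy(&value, src, sizeof value);
    return value;
}
```

Usage: `unsigned char data[6]; store_u32(data + 2, 0xdeadf00d);` writes the same bytes as the snippet above without a misaligned access.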

少女的英雄梦 2024-07-13 16:27:41

A specific example of a bus error I just encountered while programming C on OS X:

#include <string.h>
#include <stdio.h>

int main(void)
{
    char buffer[120];
    fgets(buffer, sizeof buffer, stdin);
    strcat("foo", buffer);
    return 0;
}

In case you don't remember the docs: strcat appends the second argument to the first by modifying the first argument (flip the arguments and it works, as long as the combined string fits in buffer). Here the first argument is the string literal "foo", which lives in read-only memory. On Linux this gives a segmentation fault (as expected), but on OS X it gives a bus error. Why? I really don't know.
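
Flipping the arguments makes the destination the writable buffer, but strcat(buffer, "foo") can still overflow if fgets filled most of the 120 bytes. A minimal sketch of a bounds-checked append, assuming you prefer to reject an oversized result rather than truncate it; append_checked is a hypothetical name.

```c
#include <string.h>

/* Append src to the NUL-terminated string in dst, whose total capacity is
 * dst_size bytes. Returns 0 on success, or -1 (leaving dst unchanged) if the
 * result would not fit. dst must be a writable array, not a string literal. */
int append_checked(char *dst, size_t dst_size, const char *src)
{
    size_t dst_len = strlen(dst);
    size_t src_len = strlen(src);
    if (dst_len + src_len + 1 > dst_size)
        return -1;
    memcpy(dst + dst_len, src, src_len + 1); /* also copies the terminating NUL */
    return 0;
}
```

In the program above this would be called as `append_checked(buffer, sizeof buffer, "foo")`.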

寒尘 2024-07-13 16:27:41

Firstly, SIGBUS and SIGSEGV are not specific types of error but groups, or families, of errors. This is why you typically see a signal number (si_signo) and a signal code (si_code).

What exactly can cause them also depends on the OS and architecture.

Generally we can say that:
A SIGSEGV is related to memory mappings (permissions, no mapping), i.e. an MMU error.

A SIGBUS is when the memory mapping succeeds and you hit an issue with the underlying memory system (out of memory, no memory at that location, alignment, the SMMU prevents access, etc.), i.e. a bus error.

A SIGBUS can also occur with mmapped files if the file vanishes from the system, e.g. you mmap a file on removable media and it gets unplugged.

A good place to look on a platform is the siginfo.h header, to get an idea of the signal subtypes.
For example, for Linux this page provides an overview:
https://elixir.bootlin.com/linux/latest/source/include/uapi/asm-generic/siginfo.h#L245

/*
 * SIGSEGV si_codes
 */
#define SEGV_MAPERR 1   /* address not mapped to object */
#define SEGV_ACCERR 2   /* invalid permissions for mapped object */
#define SEGV_BNDERR 3   /* failed address bound checks */
#ifdef __ia64__
# define __SEGV_PSTKOVF 4   /* paragraph stack overflow */
#else
# define SEGV_PKUERR    4   /* failed protection key checks */
#endif
#define SEGV_ACCADI 5   /* ADI not enabled for mapped object */
#define SEGV_ADIDERR    6   /* Disrupting MCD error */
#define SEGV_ADIPERR    7   /* Precise MCD exception */
#define SEGV_MTEAERR    8   /* Asynchronous ARM MTE error */
#define SEGV_MTESERR    9   /* Synchronous ARM MTE exception */
#define NSIGSEGV    9

/*
 * SIGBUS si_codes
 */
#define BUS_ADRALN  1   /* invalid address alignment */
#define BUS_ADRERR  2   /* non-existent physical address */
#define BUS_OBJERR  3   /* object specific hardware error */
/* hardware memory error consumed on a machine check: action required */
#define BUS_MCEERR_AR   4
/* hardware memory error detected in process but not consumed: action optional*/
#define BUS_MCEERR_AO   5
#define NSIGBUS     5

A final note: all signals can also be user-generated, e.g. with kill.
If a signal is user-generated, its si_code is SI_USER (0), and other special sources get negative si_codes.

/*
 * si_code values
 * Digital reserves positive values for kernel-generated signals.
 */
#define SI_USER     0       /* sent by kill, sigsend, raise */
#define SI_KERNEL   0x80        /* sent by the kernel from somewhere */
#define SI_QUEUE    -1      /* sent by sigqueue */
#define SI_TIMER    -2      /* sent by timer expiration */
#define SI_MESGQ    -3      /* sent by real time mesq state change */
#define SI_ASYNCIO  -4      /* sent by AIO completion */
#define SI_SIGIO    -5      /* sent by queued SIGIO */
#define SI_TKILL    -6      /* sent by tkill system call */
#define SI_DETHREAD -7      /* sent by execve() killing subsidiary threads */
#define SI_ASYNCNL  -60     /* sent by glibc async name lookup completion */

#define SI_FROMUSER(siptr)  ((siptr)->si_code <= 0)
#define SI_FROMKERNEL(siptr)    ((siptr)->si_code > 0)

|煩躁 2024-07-13 16:27:41

I got a bus error when the root directory was 100% full.

挽清梦 2024-07-13 16:27:41

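It normally means an unaligned access.

An attempt to access memory that isn't physically present would also give a bus error, but you won't see this if you're using a processor with an MMU and an OS that's not buggy, because you won't have any non-existent memory mapped into your process's address space.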

温柔戏命师 2024-07-13 16:27:41

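It depends on your OS, CPU, compiler, and possibly other factors.

In general, it means the CPU bus could not complete a command, or suffered a conflict, but that could mean a whole range of things depending on the environment and code being run.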

固执像三岁 2024-07-13 16:27:41

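The reason for my bus error on Mac OS X was that I tried to allocate about 1 MB on the stack. This worked fine in one thread, but when using OpenMP it caused a bus error, because Mac OS X has a very limited stack size for non-main threads.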
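
A minimal POSIX threads sketch of the usual workaround, assuming the large buffer has to stay on the stack: request a bigger stack explicitly with pthread_attr_setstacksize instead of relying on the platform default for secondary threads. With OpenMP, the worker-thread stack size can instead be raised through the OMP_STACKSIZE environment variable. The function names and the 8 MB figure are illustrative; compile with -pthread.

```c
#include <pthread.h>
#include <stddef.h>

enum { BIG = 1 << 20 }; /* roughly the 1 MB stack array from the answer */

/* Thread body that keeps about 1 MB on its own stack. */
static void *use_big_stack(void *arg)
{
    (void)arg;
    volatile char buf[BIG];
    for (size_t i = 0; i < sizeof buf; i += 4096)
        buf[i] = 1; /* touch every page so the stack is really used */
    return NULL;
}

/* Run use_big_stack on a thread with an explicitly sized stack.
 * Returns 0 on success or a pthread error number. */
int run_with_stack(size_t stack_bytes)
{
    pthread_attr_t attr;
    pthread_t tid;
    int err = pthread_attr_init(&attr);
    if (err != 0)
        return err;
    err = pthread_attr_setstacksize(&attr, stack_bytes);
    if (err == 0)
        err = pthread_create(&tid, &attr, use_big_stack, NULL);
    pthread_attr_destroy(&attr);
    if (err == 0)
        err = pthread_join(tid, NULL);
    return err;
}
```

Usage: `run_with_stack((size_t)8 << 20)` creates the thread with an 8 MB stack, comfortably above the 1 MB it uses.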

乖乖哒 2024-07-13 16:27:41

For me, I accidentally triggered a "Bus Error" by not declaring that my assembly was heading back into the .text section. It might seem obvious but it had me stumped for a while.

E.g.:

.globl _myGlobal # Allocate a 64-bit global with the value 2
.data
.align 3
_myGlobal:
.quad 2
.globl _main # Main function code
_main:
push %rbp

I was missing a .text directive when switching back from data to code:

_myGlobal:
.quad 2
.text # <- This
.globl _main
_main:

Hope this ends up helpful to someone.

∞梦里开花 2024-07-13 16:27:41

One notable cause is that SIGBUS is raised if you try to mmap a region of /dev/mem that userspace is not allowed to access.

鲜肉鲜肉永远不皱 2024-07-13 16:27:41

I was trying to free a string literal, which was never allocated with malloc (it lives in static storage, not on the heap):

#include <stdlib.h>

int main(void)
{
    char *str = "foo";
    free(str);
    return (EXIT_SUCCESS);
}

My fix was to strdup() the literal, so that free() receives a heap-allocated copy:

#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *str = strdup("foo");
    free(str);
    return (EXIT_SUCCESS);
}

一身软味 2024-07-13 16:27:41

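A typical buffer overflow that results in a bus error:

#include <stdio.h>

void log_message(const char *ifname, const char *message)
{
    char buf[255];
    sprintf(buf, "%s:%s\n", ifname, message);
}

Here, if the formatted string (ifname, a colon, message and a newline, plus the terminating NUL) needs more than the 255 bytes of buf, sprintf writes past the end of the array and you get a bus error (or, depending on the platform and what gets overwritten, a segmentation fault).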
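
A minimal sketch of a bounded version, assuming truncation should be reported to the caller rather than silently accepted; format_line is a hypothetical name. snprintf never writes more than buf_size bytes and returns the length the full result would have had, which is what the check relies on.

```c
#include <stdio.h>

/* Format "ifname:message\n" into buf without writing past buf_size bytes.
 * Returns 0 if the whole line fit, or -1 if it was truncated or failed. */
int format_line(char *buf, size_t buf_size, const char *ifname, const char *message)
{
    int needed = snprintf(buf, buf_size, "%s:%s\n", ifname, message);
    if (needed < 0 || (size_t)needed >= buf_size)
        return -1;
    return 0;
}
```

Usage: `char buf[255]; if (format_line(buf, sizeof buf, ifname, message) != 0) { /* handle the too-long line */ }`.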
