当前位置：文江博客话题详情

在 Apple 芯片上捕获浮点异常和信号处理

发布于 2025-01-20 17:37:42 字数 6213 浏览 4 评论 0 原文

为了捕获 MacOS 上的浮点异常，我使用了提供 feenableexcept 功能的扩展。原始扩展（写于 2009 年）位于此处

http ://www-personal.umich.edu/~williams/archive/computation/fe-handling-example.c

注意：如果您遇到此问题帖子以了解如何在 MacOS（使用 Intel 或 Apple 芯片）上捕获浮点异常，您可能想跳过汇编讨论到下面的详细信息。

发布希望更新 Apple 芯片的此扩展，并可能删除一些过时的代码。深入研究fenv.h，可以清楚地了解如何更新Apple芯片的例程feenableexcept、fegetexcept和fedisableexcept 。然而，目前尚不清楚如何处理 2009 扩展中提供的汇编代码，或者为什么要包含此代码。

上面链接中提供的扩展相当长，因此我将只提取涉及程序集的片段：

#if DEFINED_INTEL

// x87 fpu
#define getx87cr(x)    __asm ("fnstcw %0" : "=m" (x));
#define setx87cr(x)    __asm ("fldcw %0"  : "=m" (x));
#define getx87sr(x)    __asm ("fnstsw %0" : "=m" (x));

// SIMD, gcc with Intel Core 2 Duo uses SSE2(4)
#define getmxcsr(x)    __asm ("stmxcsr %0" : "=m" (x));
#define setmxcsr(x)    __asm ("ldmxcsr %0" : "=m" (x));

#endif  // DEFINED_INTEL

此代码在 sigaction 机制的处理程序中使用，该机制用于报告捕获浮点异常。

fhdl ( int sig, siginfo_t *sip, ucontext_t *scp )
{
  int fe_code = sip->si_code;
  unsigned int excepts = fetestexcept (FE_ALL_EXCEPT);

  /* ... see complete code in link above ... */ 
     
    if ( sig == SIGFPE )
    {
#if DEFINED_INTEL
        unsigned short x87cr,x87sr;
        unsigned int mxcsr;

        getx87cr (x87cr);
        getx87sr (x87sr);
        getmxcsr (mxcsr);
        printf ("X87CR:   0x%04X\n", x87cr);
        printf ("X87SR:   0x%04X\n", x87sr);
        printf ("MXCSR:   0x%08X\n", mxcsr);
#endif

        // ....
    }
    printf ("signal:  SIGFPE with code %s\n", fe_code_name[fe_code]);
    printf ("invalid flag:    0x%04X\n", excepts & FE_INVALID);
    printf ("divByZero flag:  0x%04X\n", excepts & FE_DIVBYZERO);
  }
  else printf ("Signal is not SIGFPE, it's %i.\n", sig);

  abort();
}

提供了一个捕获异常并通过 sigaction 处理异常的示例。对 feenableexcept 的调用要么是定义了 feenableexcept 的系统的本机实现（例如非 Apple 硬件），要么是上面链接的扩展中提供的实现。

int main (int argc, char **argv)
{
    double s;
    struct sigaction act;

    act.sa_sigaction = (void(*))fhdl;
    sigemptyset (&act.sa_mask);
    act.sa_flags = SA_SIGINFO;
    

//  printf ("Old divByZero exception: 0x%08X\n", feenableexcept (FE_DIVBYZERO));
    printf ("Old invalid exception:   0x%08X\n", feenableexcept (FE_INVALID));
    printf ("New fp exception:        0x%08X\n", fegetexcept ());

    // set handler
    if (sigaction(SIGFPE, &act, (struct sigaction *)0) != 0)
    {
        perror("Yikes");
        exit(-1);
    }

//  s = 1.0 / 0.0;  // FE_DIVBYZERO
    s = 0.0 / 0.0;  // FE_INVALID
    return 0;
}

当我在基于 Intel 的 Mac 上运行这个程序时，我得到：

Old invalid exception:   0x0000003F
New fp exception:        0x0000003E
X87CR:   0x037F
X87SR:   0x0000
MXCSR:   0x00001F80
signal:  SIGFPE with code FPE_FLTINV
invalid flag:    0x0000
divByZero flag:  0x0000
Abort trap: 6

我的问题是：

为什么汇编代码和对 fetestexcept 的调用都包含在处理程序中？是否都需要报告捕获的异常类型？
处理程序捕获了 FE_INVALID 异常。为什么，那么就是 excepts & FE_INVALID 为零？
sigaction 处理程序在 Apple 芯片上被完全忽略。应该有效吗？或者我是否不理解使用 sigaction 进行信号处理的一些更基本的内容，以及引发 FP 异常时会发生什么？

我正在使用 gcc 和 clang 进行编译。

详细信息：这是从原始代码中提取的一个最小示例，它提炼了我上面的问题。在此示例中，我为 Intel 或 Apple 芯片上的 MacOS 提供了缺少的 feeable except 功能。然后我在使用和不使用 sigaction 的情况下进行测试。

#include <fenv.h>    
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

#if defined(__APPLE__)
#if defined(__arm) || defined(__arm64) || defined(__aarch64__)
#define DEFINED_ARM 1
#define FE_EXCEPT_SHIFT 8
#endif

void feenableexcept(unsigned int excepts)
{
    fenv_t env;
    fegetenv(&env);

#if (DEFINED_ARM==1)
    env.__fpcr = env.__fpcr | (excepts << FE_EXCEPT_SHIFT);
#else
    /* assume Intel */
    env.__control = env.__control & ~excepts;
    env.__mxcsr = env.__mxcsr & ~(excepts << 7);
#endif
    fesetenv(&env);
}
#else
/* Linux may or may not have feenableexcept. */
#endif


static void
fhdl ( int sig, siginfo_t *sip, ucontext_t *scp )
{
    int fe_code = sip->si_code;
    unsigned int excepts = fetestexcept (FE_ALL_EXCEPT);

    if (fe_code == FPE_FLTDIV)
        printf("In signal handler : Division by zero.  Flag is : 0x%04X\n", excepts & FE_DIVBYZERO);

    abort();
}


void main()
{
#ifdef HANDLE_SIGNAL
    struct sigaction act;
    act.sa_sigaction = (void(*))fhdl;
    sigemptyset (&act.sa_mask);
    act.sa_flags = SA_SIGINFO;
    sigaction(SIGFPE, &act, NULL);
#endif    
    
    feenableexcept(FE_DIVBYZERO);

    double x  = 0; 
    double y = 1/x;
}

没有 sigaction 的结果

在 Intel 上：

% gcc -o stack_except stack_except.c
% stack_except
Floating point exception: 8

在 Apple 芯片上：

% gcc -o stack_except stack_except.c
% stack_except
Illegal instruction: 4

以上按预期工作，并且当遇到被零除时代码终止。

sigaction 的结果

Intel 上的结果：

% gcc -o stack_signal stack_signal.c -DHANDLE_SIGNAL
% stack_signal
In signal handler : Division by zero.  Flag is : 0x0000
Abort trap: 6

代码在 Intel 上按预期工作。但是，

fetestexcept（从信号处理程序调用）的返回为零。这是为什么呢？之前是否清除了异常正在由处理程序处理？

Apple 芯片上的结果：

% gcc -o stack_signal stack_signal.c -DHANDLE_SIGNAL
% stack_signal
Illegal instruction: 4

信号处理程序被完全忽略。这是为什么呢？我是否遗漏了有关信号处理方式的一些基本知识？

在原始代码中使用汇编（请参阅帖子顶部的链接）

我的最后一个问题是关于帖子顶部发布的原始示例中汇编的使用。为什么使用汇编来查询信号处理程序中的标志？使用fetestexcept还不够吗？或者检查siginfo.si_code？ 可能的答案：fetestexcept，在处理程序内部使用时不会检测到异常（？）。（这就是为什么从处理程序内部只打印 0x0000 的原因吗？。）

这是包含类似问题的相关帖子。如何在 M1 Mac 上捕获浮点异常？

原文

To trap floating point exceptions on MacOS, I use an extension that provides feenableexcept functionality. The original extension (written in 2009) is here

http://www-personal.umich.edu/~williams/archive/computation/fe-handling-example.c

NOTE: If you came across this post to see how you can trap floating point exceptions on MacOS (either with Intel or Apple silicon), you might want to skip over the assembly discussion to the DETAILS below.

I'd now like to update this extension for Apple silicon and possibly remove some outdated code. Digging through fenv.h, it is clear how to update the routines feenableexcept, fegetexcept and fedisableexcept for Apple silicon. However, it is less clear what to do with the assembly code provided in the 2009 extension, or why this code is even included.

The extension provided in the link above is quite long, so I'll just extract the fragments involving the assembly :

#if DEFINED_INTEL

// x87 fpu
#define getx87cr(x)    __asm ("fnstcw %0" : "=m" (x));
#define setx87cr(x)    __asm ("fldcw %0"  : "=m" (x));
#define getx87sr(x)    __asm ("fnstsw %0" : "=m" (x));

// SIMD, gcc with Intel Core 2 Duo uses SSE2(4)
#define getmxcsr(x)    __asm ("stmxcsr %0" : "=m" (x));
#define setmxcsr(x)    __asm ("ldmxcsr %0" : "=m" (x));

#endif  // DEFINED_INTEL

This code is used in a handler for a sigaction mechanism that is provided to report on the type of floating point exception trapped.

fhdl ( int sig, siginfo_t *sip, ucontext_t *scp )
{
  int fe_code = sip->si_code;
  unsigned int excepts = fetestexcept (FE_ALL_EXCEPT);

  /* ... see complete code in link above ... */ 
     
    if ( sig == SIGFPE )
    {
#if DEFINED_INTEL
        unsigned short x87cr,x87sr;
        unsigned int mxcsr;

        getx87cr (x87cr);
        getx87sr (x87sr);
        getmxcsr (mxcsr);
        printf ("X87CR:   0x%04X\n", x87cr);
        printf ("X87SR:   0x%04X\n", x87sr);
        printf ("MXCSR:   0x%08X\n", mxcsr);
#endif

        // ....
    }
    printf ("signal:  SIGFPE with code %s\n", fe_code_name[fe_code]);
    printf ("invalid flag:    0x%04X\n", excepts & FE_INVALID);
    printf ("divByZero flag:  0x%04X\n", excepts & FE_DIVBYZERO);
  }
  else printf ("Signal is not SIGFPE, it's %i.\n", sig);

  abort();
}

An example is provided that traps exceptions and handles them through sigaction. The call to feenableexcept will either be a native implementation for systems that have feenableexcept defined (e.g. non Apple hardware) or the implementation provided in the extension linked above.

int main (int argc, char **argv)
{
    double s;
    struct sigaction act;

    act.sa_sigaction = (void(*))fhdl;
    sigemptyset (&act.sa_mask);
    act.sa_flags = SA_SIGINFO;
    

//  printf ("Old divByZero exception: 0x%08X\n", feenableexcept (FE_DIVBYZERO));
    printf ("Old invalid exception:   0x%08X\n", feenableexcept (FE_INVALID));
    printf ("New fp exception:        0x%08X\n", fegetexcept ());

    // set handler
    if (sigaction(SIGFPE, &act, (struct sigaction *)0) != 0)
    {
        perror("Yikes");
        exit(-1);
    }

//  s = 1.0 / 0.0;  // FE_DIVBYZERO
    s = 0.0 / 0.0;  // FE_INVALID
    return 0;
}

When I run this on an Intel-based Mac, I get;

Old invalid exception:   0x0000003F
New fp exception:        0x0000003E
X87CR:   0x037F
X87SR:   0x0000
MXCSR:   0x00001F80
signal:  SIGFPE with code FPE_FLTINV
invalid flag:    0x0000
divByZero flag:  0x0000
Abort trap: 6

My questions are:

Why is the assembly code and a call to fetestexcept both included in the handler? Are both necessary to report on the type of exception that was trapped?
An FE_INVALID exception was trapped by the handler. Why, then is excepts & FE_INVALID zero?
The sigaction handler is completely ignored on Apple silicon. Should it work? Or am I not understanding something more fundamental about the signal handing works using sigaction, vs. what happens when a FP exception is raised?

I am compiling with gcc and clang.

DETAILS : Here is a minimal example extracted from the original code that distills my questions above. In this example, I provide the missing feeableexcept functionality for MacOS on Intel or Apple silicon. Then I test with and without sigaction.

#include <fenv.h>    
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

#if defined(__APPLE__)
#if defined(__arm) || defined(__arm64) || defined(__aarch64__)
#define DEFINED_ARM 1
#define FE_EXCEPT_SHIFT 8
#endif

void feenableexcept(unsigned int excepts)
{
    fenv_t env;
    fegetenv(&env);

#if (DEFINED_ARM==1)
    env.__fpcr = env.__fpcr | (excepts << FE_EXCEPT_SHIFT);
#else
    /* assume Intel */
    env.__control = env.__control & ~excepts;
    env.__mxcsr = env.__mxcsr & ~(excepts << 7);
#endif
    fesetenv(&env);
}
#else
/* Linux may or may not have feenableexcept. */
#endif


static void
fhdl ( int sig, siginfo_t *sip, ucontext_t *scp )
{
    int fe_code = sip->si_code;
    unsigned int excepts = fetestexcept (FE_ALL_EXCEPT);

    if (fe_code == FPE_FLTDIV)
        printf("In signal handler : Division by zero.  Flag is : 0x%04X\n", excepts & FE_DIVBYZERO);

    abort();
}


void main()
{
#ifdef HANDLE_SIGNAL
    struct sigaction act;
    act.sa_sigaction = (void(*))fhdl;
    sigemptyset (&act.sa_mask);
    act.sa_flags = SA_SIGINFO;
    sigaction(SIGFPE, &act, NULL);
#endif    
    
    feenableexcept(FE_DIVBYZERO);

    double x  = 0; 
    double y = 1/x;
}

Results without sigaction

On Intel:

% gcc -o stack_except stack_except.c
% stack_except
Floating point exception: 8

And on Apple silicon :

% gcc -o stack_except stack_except.c
% stack_except
Illegal instruction: 4

The above works as expected and code terminates when the division by zero is encountered.

Results with sigaction

Results on Intel:

% gcc -o stack_signal stack_signal.c -DHANDLE_SIGNAL
% stack_signal
In signal handler : Division by zero.  Flag is : 0x0000
Abort trap: 6

The code works as expected on Intel. However,

The return from fetestexcept (called from the signal handler) is zero. Why is this? Was the exception cleared before
being processed by the handler?

Results on Apple silicon :

% gcc -o stack_signal stack_signal.c -DHANDLE_SIGNAL
% stack_signal
Illegal instruction: 4

The signal handler is ignored completely. Why is this? Am I missing something fundamental about how signals are processed?

Use of assembly in original code (see link at top of post)

My final question was concerning the use of assembly in the original example posted at the top of the post. Why was assembly used to query the flags in the signal handler? Is it not enough to use fetestexcept? Or to check siginfo.si_code? Possible answer: fetestexcept, when used inside the handler doesn't detect the exception (?). (Is this why only 0x0000 is printed from inside the handler?.)

Here is related post with a similar questions. How to trap floating-point exceptions on M1 Macs?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

ペ泪落弦音 2025-01-27 17:37:42

事实证明，对于未掩盖的FP例外，Aarch64上的MacOS将提供Sigill而非SIGFPE。如何捕获M1 Mac上的浮点异常？显示一个示例，包括如何揭示特定的FP异常，并且是AARCH64上实际目标的重复。（我不知道为什么MacOS会忽略Posix标准并为算术异常发出不同的信号。
其余的答案仅涵盖X86 ASM零件。

我怀疑您还需要学习POSIX信号之类的区别，例如 sigsegv 或 sigfpe ，一个硬件异常，例如页面故障或x86 #de 整数划分异常，与“ FP异常”（在FPU状态寄存器中设置标志的事件，或者如果将未掩盖的事件视为CPU例外，则捕获以运行内核代码

。可以陷阱（将执行发送到内核，而不是继续进入下一个用户空间指令）。操作系统的陷阱处理程序决定发出POSIX信号（例如，在PageFault上解决问题本身，然后返回用户空间以重新运行错误的又称又名错误的指令。）

如果FP异常被掩盖，它们不会导致它们导致CPU异常（陷阱），因此您只能使用 fetestexcept 从同一线程检查它们。 feenableExcept 的点是揭露一些例外。

除非Aarch64还具有X86（X87和SSE）（X87和SSE）的两个单独的FP异常掩码 /状态，否则我看不出您需要内联ASM的任何原因。 fenv.h 函数应起作用。

不幸的是，ISO C没有提供实际 unmask 例外的方法如果任何行动都会提出，因为它们上次被清除）。

但是MacOS Fenv.h 确实有一些常数用于在FP环境中使用 fegetenv / fesetenv 。这是GNU C FeenableExcept 的替代方法。

X86上的ASM / Interins可以很有用，因为它具有两个独立的FP系统，即Legacy X87和Modern SSE / AVX。

fetestexcept 将测试X87或SSE异常，具体取决于默认情况下用于FP数学的编译器。（X86-64的SSE2，除了使用X87的长double外，除了X87 ...）因此，有理由检查两者以确保与 fetestexectect 匹配。

此外，x87状态词具有精度控制位（使其始终与double或float相同，而不是float，而不是全80位），而MXCSR具有DAZ / FTZ（denormals零 / flush至Zero ）禁用逐渐的下流，因为如果发生的话，它会很慢。 FENV 不会正式公开。

X86内联ASM非常幼稚和破碎

如果您确实想要这些X87操作的包装器，则

，请仔细查找其他地方。 #define setx87cr（x）__asm（“ fldcw％0”：“ = m”（x））; 超级损坏。它告诉编译器X是纯输出（由ASM模板编写），但实际上运行了从中读取的ASM指令。我希望除了调试构建以外的任何事物（由于死亡商店的淘汰）会破裂。对于 ldmxcsr 包装器也是如此，这甚至是无用的，因为 #include＆lt; inmintrin.h＆gt; 具有 _mm_mm_setcsr < /code>

它们都需要是 asm volatile ，否则它们被认为是输入的纯函数，因此如果没有输入和一个输出，编译器可以假设它总是写入相同的输出并相应地优化。因此，如果您想多次阅读状态以检查一系列计算后的新例外，则编译器可能会重复使用第一个结果。

（只有输入而不是输出操作数， fldcw 的正确包装器会隐含地挥发。）

另一个并发症是编译器可以选择比您预期的更早或更晚。您可以通过将fp值用作输入的一种方法，例如 asm volatile（“ fnstsw％0”：“ = am”（sw）：“ g”（fpval））。（我还使用“ a” 作为可能的输出之一，因为有一种指令将其写入AX而不是内存的形式。当然，您需要它是 uint16_t 或简短。）

或使用“+g”（fpval）读取+写入+写入“输出”操作数以告诉编译器它读取/写入fpval，所以这有在使用它的计算之前发生。

我不会在这个答案中自己尝试完全更正版本，但这就是要寻找的。

我最初猜测 s = 0.0 / 0.0; < / code>可能不会将AARCH64的Clang汇编为clang64的划分指令。您可能只会获得编译时间代表NAN，并优化未使用的结果，如果您不使用类似的内容，

    volatile double s = 0.0;
    s = 0.0 / s;             // s is now unknown to the compiler

则可以检查编译器的ASM输出以确保有实际的FP Divide指令。

顺便说一句，ARM和AARCH64不要以0（不同于X86不同）捕获整数部门，但是FP例外情况下，希望FP OPS会遇到。但是，如果这仍然不起作用，那么是时候阅读ASM手册并查看编译器ASM输出了。

Turns out MacOS on AArch64 will deliver SIGILL, not SIGFPE, for unmasked FP exceptions. How to trap floating-point exceptions on M1 Macs? shows an example including how to unmask specific FP exceptions, and is a duplicate for the actual goal on AArch64. (Linux on AArch64 apparently delivers SIGFPE; I don't know why MacOS would ignore the POSIX standard and deliver a different signal for arithmetic exceptions).
The rest of this answer just covers the x86 asm parts.

I suspect you also need to learn the difference between a POSIX signal like SIGSEGV or SIGFPE, a hardware exception like a page fault or x86 #DE integer divide exception, vs. an "fp exception" (event that either sets a flag in an FPU status register, or if unmasked is treated as a CPU exception, trapping to run kernel code.)

Having FP exceptions unmasked means an FP math instruction can trap (send execution into the kernel, instead of continuing to the next user-space instruction). The OS's trap handler decides to deliver a POSIX signal (or fix the problem itself on pagefault, for example, and return to user-space to rerun the instruction that faulted aka trapped.)

If FP exceptions are masked, they don't result in CPU exceptions (traps), so you can only check them from the same thread with fetestexcept. The point of feenableexcept is to unmask some exceptions.

Unless AArch64 also has two separate FP exception-masks / statuses like x86 does (x87 and SSE), I don't see any reason you'd need inline asm. fenv.h functions should work.

Unfortunately ISO C doesn't provide a way to actually unmask exceptions, just fetestexcept(FE_DIVBYZERO) etc. to check the status flags in the FP-exception state (which stay set if any operation ever raised them, since they were last cleared). https://en.cppreference.com/w/c/numeric/fenv/fetestexcept

But MacOS fenv.h does have some constants for setting the FP exception-mask bits in the FP environment with fegetenv / fesetenv. This is an alternative to GNU C feenableexcept.

Asm / intrinsics on x86 can be useful because it has two independent FP systems, legacy x87 and modern SSE/AVX.

fetestexcept would test either x87 or SSE exceptions, depending on which the compiler used by default for FP math. (SSE2 for x86-64, except for long double using x87...) So there's reason to want to check both to make sure it matches up with fetestexcept.

Also, the x87 status word has precision-control bits (to make it always round to the same mantissa precision as double or float, instead of to full 80-bit), and MXCSR has DAZ / FTZ (denormals are zero / flush to zero) to disable gradual underflow because it's slow if it happens. fenv doesn't portably expose that.

The x86 inline asm is very naive and broken

If you do actually want wrappers for these x87 operations, look elsewhere for ones written carefully.

#define setx87cr(x) __asm ("fldcw %0" : "=m" (x)); is super broken. It tells the compiler that x is a pure output (written by the asm template), but actually runs an asm instruction that reads from it. I expect that to break (because of dead store elimination) in anything except a debug build. Same for the ldmxcsr wrapper, which is even more useless because #include <immintrin.h> has _mm_setcsr

They all need to be asm volatile, otherwise they're considered a pure function of the inputs, so with no inputs and one output, the compiler can assume that it always writes the same output and optimize accordingly. So if you wanted to read status multiple times to check for new exceptions after each of a series of calculations, the compiler would likely just reuse the first result.

(With only an input instead of an output operand, a correct wrapper for fldcw would be volatile implicitly.)

Another complication is that a compiler could choose to do an FP op earlier or later than you expected. One way you can fix that is by using the FP value as an input, like asm volatile("fnstsw %0" : "=am"(sw) : "g"(fpval) ). (I also used "a" as one of the possible outputs, since there's a form of that instruction which writes to AX instead of memory. Of course you need it to be a uint16_t or short.)

Or use a "+g"(fpval) read+write "output" operand to tell the compiler it reads/writes fpval, so this has to happen before some calculation that uses it.

I'm not going to attempt fully correct versions myself in this answer, but that's what to look for.

I had originally guessed that s = 0.0 / 0.0; might not be compiling to a divide instruction with clang for AArch64. You might just get a compile-time-constant NaN, and optimize away an unused result, if you don't use something like

    volatile double s = 0.0;
    s = 0.0 / s;             // s is now unknown to the compiler

You can check the compiler's asm output to make sure there is an actual FP divide instruction.

BTW, ARM and AArch64 don't trap on integer division by 0 (unlike x86), but with the FP exception unmasked hopefully FP ops do. But if this still doesn't work, then it's time to read the asm manuals and look at compiler asm output.

回复收藏 0 原文