为什么这些简单的方法编译结果不同？

发布于 2025-01-02 15:18:18 字数 6622 浏览 4 评论 0原文

我有点困惑为什么 clang 为以下两种方法发出不同的代码：

@interface ClassA : NSObject
@end

@implementation ClassA
+ (ClassA*)giveMeAnObject1 {
    return [[ClassA alloc] init];
}
+ (id)giveMeAnObject2 {
    return [[ClassA alloc] init];
}
@end

如果我们查看发出的 ARMv7，那么我们会在 O3 处看到启用 ARC 的情况：

        .align  2
        .code   16
        .thumb_func     "+[ClassA giveMeAnObject1]"
"+[ClassA giveMeAnObject1]":
        push    {r7, lr}
        movw    r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_-(LPC0_0+4))
        mov     r7, sp
        movt    r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_-(LPC0_0+4))
        movw    r0, :lower16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC0_1+4))
        movt    r0, :upper16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC0_1+4))
LPC0_0:
        add     r1, pc
LPC0_1:
        add     r0, pc
        ldr     r1, [r1]
        ldr     r0, [r0]
        blx     _objc_msgSend
        movw    r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_2-(LPC0_2+4))
        movt    r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_2-(LPC0_2+4))
LPC0_2:
        add     r1, pc
        ldr     r1, [r1]
        blx     _objc_msgSend
        pop.w   {r7, lr}
        b.w     _objc_autorelease

        .align  2
        .code   16
        .thumb_func     "+[ClassA giveMeAnObject2]"
"+[ClassA giveMeAnObject2]":
        push    {r7, lr}
        movw    r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_-(LPC2_0+4))
        mov     r7, sp
        movt    r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_-(LPC2_0+4))
        movw    r0, :lower16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC2_1+4))
        movt    r0, :upper16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC2_1+4))
LPC2_0:
        add     r1, pc
LPC2_1:
        add     r0, pc
        ldr     r1, [r1]
        ldr     r0, [r0]
        blx     _objc_msgSend
        movw    r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_2-(LPC2_2+4))
        movt    r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_2-(LPC2_2+4))
LPC2_2:
        add     r1, pc
        ldr     r1, [r1]
        blx     _objc_msgSend
        pop.w   {r7, lr}
        b.w     _objc_autoreleaseReturnValue

区别在于对 objc_autoreleaseReturnValue 和 objc_autorelease 的尾部调用。老实说，我希望两者都能调用 objc_autoreleaseReturnValue 。事实上，第一个方法不使用 objc_autoreleaseReturnValue 意味着它可能会比第二个方法慢，因为肯定会有一个自动释放，然后调用者保留，而不是更快地绕过这个冗余调用，如果运行时支持的话 ARC 就可以做到。

发出的 LLVM 给出了为什么会这样的某种原因：

define internal %1* @"\01+[ClassA giveMeAnObject1]"(i8* nocapture %self, i8* nocapture %_cmd) {
  %1 = load %struct._class_t** @"\01L_OBJC_CLASSLIST_REFERENCES_$_", align 4
  %2 = load i8** @"\01L_OBJC_SELECTOR_REFERENCES_", align 4
  %3 = bitcast %struct._class_t* %1 to i8*
  %4 = tail call i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i8* (i8*, i8*)*)(i8* %3, i8* %2)
  %5 = load i8** @"\01L_OBJC_SELECTOR_REFERENCES_2", align 4
  %6 = tail call i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i8* (i8*, i8*)*)(i8* %4, i8* %5)
  %7 = tail call i8* @objc_autorelease(i8* %6) nounwind
  %8 = bitcast i8* %6 to %1*
  ret %1* %8
}

define internal i8* @"\01+[ClassA giveMeAnObject2]"(i8* nocapture %self, i8* nocapture %_cmd) {
  %1 = load %struct._class_t** @"\01L_OBJC_CLASSLIST_REFERENCES_$_", align 4
  %2 = load i8** @"\01L_OBJC_SELECTOR_REFERENCES_", align 4
  %3 = bitcast %struct._class_t* %1 to i8*
  %4 = tail call i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i8* (i8*, i8*)*)(i8* %3, i8* %2)
  %5 = load i8** @"\01L_OBJC_SELECTOR_REFERENCES_2", align 4
  %6 = tail call i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i8* (i8*, i8*)*)(i8* %4, i8* %5)
  %7 = tail call i8* @objc_autoreleaseReturnValue(i8* %6) nounwind
  ret i8* %6
}

但我很难理解为什么它决定以不同的方式编译这两种方法。有人能解释一下吗？

更新：

更奇怪的是这些其他方法：

+ (ClassA*)giveMeAnObject3 {
    ClassA *a = [[ClassA alloc] init];
    return a;
}

+ (id)giveMeAnObject4 {
    ClassA *a = [[ClassA alloc] init];
    return a;
}

这些方法编译为：

        .align  2
        .code   16
        .thumb_func     "+[ClassA giveMeAnObject3]"
"+[ClassA giveMeAnObject3]":
        push    {r4, r7, lr}
        movw    r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_-(LPC2_0+4))
        add     r7, sp, #4
        movt    r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_-(LPC2_0+4))
        movw    r0, :lower16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC2_1+4))
        movt    r0, :upper16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC2_1+4))
LPC2_0:
        add     r1, pc
LPC2_1:
        add     r0, pc
        ldr     r1, [r1]
        ldr     r0, [r0]
        blx     _objc_msgSend
        movw    r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_2-(LPC2_2+4))
        movt    r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_2-(LPC2_2+4))
LPC2_2:
        add     r1, pc
        ldr     r1, [r1]
        blx     _objc_msgSend
        blx     _objc_retainAutoreleasedReturnValue
        mov     r4, r0
        mov     r0, r4
        blx     _objc_release
        mov     r0, r4
        pop.w   {r4, r7, lr}
        b.w     _objc_autoreleaseReturnValue

        .align  2
        .code   16
        .thumb_func     "+[ClassA giveMeAnObject4]"
"+[ClassA giveMeAnObject4]":
        push    {r4, r7, lr}
        movw    r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_-(LPC3_0+4))
        add     r7, sp, #4
        movt    r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_-(LPC3_0+4))
        movw    r0, :lower16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC3_1+4))
        movt    r0, :upper16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC3_1+4))
LPC3_0:
        add     r1, pc
LPC3_1:
        add     r0, pc
        ldr     r1, [r1]
        ldr     r0, [r0]
        blx     _objc_msgSend
        movw    r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_2-(LPC3_2+4))
        movt    r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_2-(LPC3_2+4))
LPC3_2:
        add     r1, pc
        ldr     r1, [r1]
        blx     _objc_msgSend
        blx     _objc_retainAutoreleasedReturnValue
        mov     r4, r0
        mov     r0, r4
        blx     _objc_release
        mov     r0, r4
        pop.w   {r4, r7, lr}
        b.w     _objc_autoreleaseReturnValue

这一次，它们是相同的，但是这里有一些东西可以进一步优化：

有一个冗余的 mov r4 , r0 后跟 mov r0, r4。
先是保留，然后是释放。

当然，这两种方法的底部都可以变成：

LPC3_2:
        add     r1, pc
        ldr     r1, [r1]
        blx     _objc_msgSend
        pop.w   {r4, r7, lr}
        b.w     _objc_autoreleaseReturnValue

显然我们也可以省略弹出 r4 因为我们实际上不再破坏它了。然后该方法将变得与 giveMeAnObject2 完全相同，这正是我们所期望的。

clang 为何不聪明，做出这样的事？！

原文

I'm slightly confused as to why clang is emitting different code for the following two method:

@interface ClassA : NSObject
@end

@implementation ClassA
+ (ClassA*)giveMeAnObject1 {
    return [[ClassA alloc] init];
}
+ (id)giveMeAnObject2 {
    return [[ClassA alloc] init];
}
@end

If we look at the ARMv7 emitted then we see this, at O3, with ARC enabled:

        .align  2
        .code   16
        .thumb_func     "+[ClassA giveMeAnObject1]"
"+[ClassA giveMeAnObject1]":
        push    {r7, lr}
        movw    r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_-(LPC0_0+4))
        mov     r7, sp
        movt    r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_-(LPC0_0+4))
        movw    r0, :lower16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC0_1+4))
        movt    r0, :upper16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC0_1+4))
LPC0_0:
        add     r1, pc
LPC0_1:
        add     r0, pc
        ldr     r1, [r1]
        ldr     r0, [r0]
        blx     _objc_msgSend
        movw    r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_2-(LPC0_2+4))
        movt    r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_2-(LPC0_2+4))
LPC0_2:
        add     r1, pc
        ldr     r1, [r1]
        blx     _objc_msgSend
        pop.w   {r7, lr}
        b.w     _objc_autorelease

        .align  2
        .code   16
        .thumb_func     "+[ClassA giveMeAnObject2]"
"+[ClassA giveMeAnObject2]":
        push    {r7, lr}
        movw    r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_-(LPC2_0+4))
        mov     r7, sp
        movt    r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_-(LPC2_0+4))
        movw    r0, :lower16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC2_1+4))
        movt    r0, :upper16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC2_1+4))
LPC2_0:
        add     r1, pc
LPC2_1:
        add     r0, pc
        ldr     r1, [r1]
        ldr     r0, [r0]
        blx     _objc_msgSend
        movw    r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_2-(LPC2_2+4))
        movt    r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_2-(LPC2_2+4))
LPC2_2:
        add     r1, pc
        ldr     r1, [r1]
        blx     _objc_msgSend
        pop.w   {r7, lr}
        b.w     _objc_autoreleaseReturnValue

The only difference is the tail call to objc_autoreleaseReturnValue vs objc_autorelease. I would expect both to call objc_autoreleaseReturnValue to be honest. In-fact the first method not using objc_autoreleaseReturnValue means that it will potentially be slower than the second because there will definitely be an autorelease then a retain by the caller, rather than the faster bypass of this redundant call that ARC can do if it's supported in the runtime.

The LLVM which is emitted gives some kind of reason why it's like that:

define internal %1* @"\01+[ClassA giveMeAnObject1]"(i8* nocapture %self, i8* nocapture %_cmd) {
  %1 = load %struct._class_t** @"\01L_OBJC_CLASSLIST_REFERENCES_$_", align 4
  %2 = load i8** @"\01L_OBJC_SELECTOR_REFERENCES_", align 4
  %3 = bitcast %struct._class_t* %1 to i8*
  %4 = tail call i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i8* (i8*, i8*)*)(i8* %3, i8* %2)
  %5 = load i8** @"\01L_OBJC_SELECTOR_REFERENCES_2", align 4
  %6 = tail call i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i8* (i8*, i8*)*)(i8* %4, i8* %5)
  %7 = tail call i8* @objc_autorelease(i8* %6) nounwind
  %8 = bitcast i8* %6 to %1*
  ret %1* %8
}

define internal i8* @"\01+[ClassA giveMeAnObject2]"(i8* nocapture %self, i8* nocapture %_cmd) {
  %1 = load %struct._class_t** @"\01L_OBJC_CLASSLIST_REFERENCES_$_", align 4
  %2 = load i8** @"\01L_OBJC_SELECTOR_REFERENCES_", align 4
  %3 = bitcast %struct._class_t* %1 to i8*
  %4 = tail call i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i8* (i8*, i8*)*)(i8* %3, i8* %2)
  %5 = load i8** @"\01L_OBJC_SELECTOR_REFERENCES_2", align 4
  %6 = tail call i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i8* (i8*, i8*)*)(i8* %4, i8* %5)
  %7 = tail call i8* @objc_autoreleaseReturnValue(i8* %6) nounwind
  ret i8* %6
}

But I'm struggling to see why it's decided to compile these two method differently. Can anyone shed some light onto it?

Update:

Even weirder is these other methods:

+ (ClassA*)giveMeAnObject3 {
    ClassA *a = [[ClassA alloc] init];
    return a;
}

+ (id)giveMeAnObject4 {
    ClassA *a = [[ClassA alloc] init];
    return a;
}

These compile to:

        .align  2
        .code   16
        .thumb_func     "+[ClassA giveMeAnObject3]"
"+[ClassA giveMeAnObject3]":
        push    {r4, r7, lr}
        movw    r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_-(LPC2_0+4))
        add     r7, sp, #4
        movt    r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_-(LPC2_0+4))
        movw    r0, :lower16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC2_1+4))
        movt    r0, :upper16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC2_1+4))
LPC2_0:
        add     r1, pc
LPC2_1:
        add     r0, pc
        ldr     r1, [r1]
        ldr     r0, [r0]
        blx     _objc_msgSend
        movw    r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_2-(LPC2_2+4))
        movt    r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_2-(LPC2_2+4))
LPC2_2:
        add     r1, pc
        ldr     r1, [r1]
        blx     _objc_msgSend
        blx     _objc_retainAutoreleasedReturnValue
        mov     r4, r0
        mov     r0, r4
        blx     _objc_release
        mov     r0, r4
        pop.w   {r4, r7, lr}
        b.w     _objc_autoreleaseReturnValue

        .align  2
        .code   16
        .thumb_func     "+[ClassA giveMeAnObject4]"
"+[ClassA giveMeAnObject4]":
        push    {r4, r7, lr}
        movw    r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_-(LPC3_0+4))
        add     r7, sp, #4
        movt    r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_-(LPC3_0+4))
        movw    r0, :lower16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC3_1+4))
        movt    r0, :upper16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC3_1+4))
LPC3_0:
        add     r1, pc
LPC3_1:
        add     r0, pc
        ldr     r1, [r1]
        ldr     r0, [r0]
        blx     _objc_msgSend
        movw    r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_2-(LPC3_2+4))
        movt    r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_2-(LPC3_2+4))
LPC3_2:
        add     r1, pc
        ldr     r1, [r1]
        blx     _objc_msgSend
        blx     _objc_retainAutoreleasedReturnValue
        mov     r4, r0
        mov     r0, r4
        blx     _objc_release
        mov     r0, r4
        pop.w   {r4, r7, lr}
        b.w     _objc_autoreleaseReturnValue

This time, they are identical however there's a few things which could be optimised even more here:

There's a redundant mov r4, r0 followed by mov r0, r4.
There's a retain followed by a release.

Surely, the bottom bit of both of those methods can turn into:

LPC3_2:
        add     r1, pc
        ldr     r1, [r1]
        blx     _objc_msgSend
        pop.w   {r4, r7, lr}
        b.w     _objc_autoreleaseReturnValue

Obviously we could then also omit popping r4 because we don't actually clobber it any more. Then the method would turn into the exact same as giveMeAnObject2 which is exactly what we'd expect.

Why is clang not being clever and doing this?!

分享到QQ

分享到微博