Improving memory leak tracking

Posted 2024-09-10 10:49:04

I just spent a whole week tracking down and whacking memory leaks over the head, and I arrived on the other end of that week a little dazed. There has to be a better way to do this, is all I can think, and so I figured it was time to ask about this rather heavy subject.

This post turned out to be rather huge. Apologies for that, though I think in this case, explaining the details as thoroughly as possible is warranted. Explicitly so, because it gives you the whole picture of all the things I did to find this bugger, which was a lot. This bug alone took me roughly three 10+ hour days to track down...

When I hunt leaks

When I hunt leaks I tend to do it in phases, where I escalate "deeper" into the problem if it's not solvable in an earlier phase. These phases begin with Leaks telling me there's an issue.

In this particular case (which is an example; the bug is solved; I'm not asking for answers to solving this bug, I'm asking for ways to improve the process in which I find the bug), I am finding a leak (two, even) in a multithreaded application which is fairly large, especially including the 3 or so external libraries I'm using in it (unzip feature and http server). So let's see the process where I fix this leak.

Phase 1: Leaks tells me there's a leak

[Screenshot of the Leaks instrument not available; source: enrogue.com]

Well, that's interesting. Since my app is multithreaded, my first thought is that I forgot to put an NSAutoreleasePool in somewhere, but after checking in all the right places, this is not the case. I take a look at the stack trace.

Phase 2: The stack trace

[Screenshot of the stack trace not available; source: enrogue.com]

Both of the GeneralBlock-160 leaks have identical stack traces (which is odd since I have it grouped by "identical backtraces", but anyway), which start at thread_assign_default and end at malloc under _NSAPDataCreate. In between, there is absolutely nothing that correlates to my app. Not a single one of those calls is "mine". So I do some Googling around to figure out what these might be used for.

First we have a number of methods which obviously have to do with a thread callback, such as POSIX thread calls going into NSThread calls.

At #8-6 in this (inverted) stack trace, we have +[NSThread exit] followed by pthread_exit and _pthread_exit which is interesting, but in my experience I can't really tell if it's indicative of some specific case or if it's simply "how things go".

After that we have a thread cleanup method called _pthread_tsd_cleanup -- whatever "tsd" stands for I'm not sure, but regardless, I move on.
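
A side note on the name: "tsd" here is thread-specific data, the pthread mechanism (pthread_key_create plus pthread_setspecific) that lets a thread keep per-thread values together with a destructor for each. When a thread exits, the library calls those destructors on the exiting thread itself, which is why cleanup frames show up underneath _pthread_exit. The sketch below is plain C (so it also compiles as Objective-C) and only illustrates that rule. The names demo_key, demo_destructor, demo_worker and tsd_destructor_runs_off_caller are made up for the illustration and have nothing to do with Foundation's internals.

```c
#include <assert.h>
#include <pthread.h>

/* All names here are hypothetical; this is not Foundation's implementation. */
static pthread_key_t demo_key;
static pthread_t caller_thread;
static int destructor_calls;
static int destructor_on_caller;

/* Invoked by the pthread library while the owning thread is exiting. */
static void demo_destructor(void *value)
{
    assert(value != NULL); /* destructors only run for non-NULL values */
    destructor_calls++;
    destructor_on_caller = pthread_equal(pthread_self(), caller_thread) != 0;
}

static void *demo_worker(void *arg)
{
    /* Storing a non-NULL value arms the destructor for this thread. */
    pthread_setspecific(demo_key, arg);
    return NULL;
}

/* Returns 1 if the destructor ran exactly once and not on the calling thread,
   0 otherwise, -1 if the key or the thread could not be created. */
int tsd_destructor_runs_off_caller(void)
{
    static int token = 42;
    pthread_t worker;

    caller_thread = pthread_self();
    destructor_calls = 0;
    destructor_on_caller = 0;

    if (pthread_key_create(&demo_key, demo_destructor) != 0)
        return -1;
    if (pthread_create(&worker, NULL, demo_worker, &token) != 0) {
        pthread_key_delete(demo_key);
        return -1;
    }
    pthread_join(worker, NULL); /* destructors have finished once join returns */
    pthread_key_delete(demo_key);

    return destructor_calls == 1 && !destructor_on_caller;
}
```

Calling tsd_destructor_runs_off_caller() from main should report that the cleanup ran on the worker. Anything released from such a destructor is therefore released on a background thread.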

At #4-#3 we have:

CA::Transaction::release_thread(void*)
CAPushAutoreleasePool

Interesting. We have Core Animation here. That, I've learned the very hard way, means that I'm probably doing UIKit calls from a background thread, which I must not. The big question is where, and how. While it may be easy to say "thou shalt not call UIKit from ye olde background thread", it's not as easy to know what exactly constitutes a UIKit call. As you'll see in this case, it's far from obvious.

Then #2-1 turn out to be way too low level to be of any real use. I think.

I still have no clue where to even begin looking for this memory leak. So I do the only thing I can think of.

Phase 3: return galore

Suppose we have a call tree that looks something like this:

App start
    |
Some init
  |      \
A init   B init - Other case - Fourth case
   \     /              \
 Some case            Third case
     |
  Fifth case
   ...

Rough outline of an app's lifecycle, that. In short, we have a number of paths the app can take depending on whatever happens, and each of these paths consists of a bunch of code being called in various places. So I pull out the scissors and start chopping. I start close to "App start" initially, and slowly move down the line towards crossroads, where I only allow one path.

So I have

// ...
[fooClass doSomethingAwesome:withThisCoolThing];
// ...

And I do

// ...
return;
[fooClass doSomethingAwesome:withThisCoolThing];
// ...

And then install the app on the device, close it down, alt-tab to Instruments, hit cmd-R, hammer on the app like a monkey, look for leaks, and after maybe 10 "cycles" if there's nothing, I conclude that the leak is further down the code. Possibly in fooClass's doSomethingAwesome: or below the call to fooClass.

So I move that return one step below the call to fooClass and test again. If the leak doesn't appear now, great, fooClass is innocent.

There are a few issues with this method.

1. Memory leaks tend to be a bit snobbish about when to reveal themselves. You need romantic music and candles, so to speak, and cutting one end off in one place sometimes results in the memory leak deciding not to appear at all. I often had to go back because the leak had appeared after I added, say, this line: UIImage *a; (which obviously isn't leaking by itself).
2. It's excruciatingly slow and tiring to do for a big program. Especially if you end up having to back up again.
3. It's hard to keep track of. I kept putting in comments like // 17 14.48.25: 3 leaks @ RSx10, which in English meant "July 17th, 14:48.25: 3 leaks occurred when I repeatedly selected the item 10 times", sprinkled throughout the entire app. Messy, but at least it let me see clearly where I'd tested things and what the results were.

This method eventually took me down to the very bottom of a class which handled thumbnails. The class had two methods, one which initialized things and then did a [NSThread detachNewThreadSelector:toTarget:withObject:] call to a separate method which processed the actual images and put them into the individual views after scaling them down to the right size.

It was sort of like this:

// no leaks if I return here
[NSThread detachNewThreadSelector:@selector(loadThumbnails) toTarget:self withObject:nil];
// leaks appear if I return here

But if I went into -loadThumbnails and stepped down through it, the leaks would disappear and appear in a very random fashion. In one extensive run, I would have leaks, and if I moved the return statement down below e.g. UIImage *small, *bloated; I would have leaks appearing. In short, it was very erratic.

After some more testing, I realized that leaks would tend to appear more often if I reloaded things quicker while in the app. After many hours of pain, I realized that if this external thread did not finish executing before I loaded another session (thus creating a second thumbnail class and discarding this one), the leak would appear.

That's a nice clue. So I added a BOOL called worldExists which was set to NO as soon as a new session was initiated, and then started sprinkling -loadThumbnails's for loop with

if (worldExists) [action]
if (worldExists) [action 2]
// ...

and also made sure to exit the loop as soon as I found out that !worldExists. But the leak remained.

And the return method was showing leaks in very erratic places. Randomly, it appeared.

So I tried adding this at the very top of -loadThumbnails:

for (int i = 0; i < 50 && worldExists; i++) {
    [NSThread sleepForTimeInterval:0.1f];
}
return;

And believe it or not, the leaks actually appeared if I loaded a new session within 5 seconds.

Finally, I put a breakpoint in -dealloc for the thumbnail class. The stack trace for this looked like this:

#0  -[Thumbs dealloc] (self=0x162ec0, _cmd=0x32299664) at /Users/me/Documents/myapp/Classes/Thumbs.m:28
#1  0x32c0571a in -[NSObject release] ()
#2  0x32b824d0 in __NSFinalizeThreadData ()
#3  0x30c3e598 in _pthread_tsd_cleanup ()
#4  0x30c3e2b2 in _pthread_exit ()
#5  0x30c3e216 in pthread_exit ()
#6  0x32b15ffe in +[NSThread exit] ()
#7  0x32b81d16 in __NSThread__main__ ()
#8  0x30c8f78c in _pthread_start ()
#9  0x30c85078 in thread_start ()

Well... that doesn't look too bad. If I wait until the -loadThumbnails method is finished, the trace looks different though:

#0  -[Thumbs dealloc] (self=0x194880, _cmd=0x32299664) at /Users/me/Documents/myapp/Classes/Thumbs.m:26
#1  0x32c0571a in -[NSObject release] ()
#2  0x00009556 in -[WorldLoader dealloc] (self=0x192ba0, _cmd=0x32299664) at /Users/me/Documents/myapp/Classes/WorldLoader.m:33
#3  0x32c0571a in -[NSObject release] ()
#4  0x000045b2 in -[WorldViewController setupWorldWithPath:] (self=0x11e9d0, _cmd=0x3fee0, path=0x4cb84) at /Users/me/Documents/myapp/Classes/WorldViewController.m:98
#5  0x32c29ffa in -[NSObject performSelector:withObject:] ()
#6  0x32b81ece in __NSThreadPerformPerform ()
#7  0x32c23c14 in CFRunLoopRunSpecific ()
#8  0x32c234e0 in CFRunLoopRunInMode ()
#9  0x30d620da in GSEventRunModal ()
#10 0x30d62186 in GSEventRun ()
#11 0x314d54c8 in -[UIApplication _run] ()
#12 0x314d39f2 in UIApplicationMain ()
#13 0x00002fd2 in main (argc=1, argv=0x2ffff5dc) at /Users/me/Documents/myapp/main.m:14

Quite different, in fact. At this point, I was still clueless, believe it or not, but I finally figured out what was going on.

The problem is the following: when I do [NSThread detachNewThreadSelector:toTarget:withObject:] in the thumbnail loader, NSThread retains the target object until the thread exits. In the case where the thumbnail loading doesn't finish before I load another session, all of my retains on the thumbnail loader are released, but since the thread is still running, NSThread keeps it alive.

As soon as the thread returns from -loadThumbnails, NSThread releases it, its retain count hits 0, and it goes straight into -dealloc... while still on the background thread.

And when I then call [super dealloc], UIView obediently tries to remove itself from its superview, which is a UIKit call on a background thread. Consequently a leak occurs.
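
The ordering issue can be modelled without any UIKit at all. What follows is a minimal sketch in plain C, assuming a hand-rolled reference count in place of retain/release and a mutex standing in for "the thumbnails are still loading". Thumbs, thumbs_retain, thumbs_release, load_thumbnails and dealloc_ran_on_main are hypothetical names for the illustration, not the app's code. The point it shows is that whichever thread drops the last reference is the thread that runs the teardown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the thumbnail view: just a reference count. */
typedef struct {
    atomic_int refcount;
} Thumbs;

static pthread_t main_thread;
static int dealloc_on_main; /* written by whichever thread frees the object */
static pthread_mutex_t gate = PTHREAD_MUTEX_INITIALIZER;

static void thumbs_retain(Thumbs *t)
{
    atomic_fetch_add(&t->refcount, 1);
}

static void thumbs_release(Thumbs *t)
{
    int previous = atomic_fetch_sub(&t->refcount, 1);
    assert(previous > 0); /* guard against over-release */
    if (previous == 1) {
        /* Last reference: "dealloc" runs on whichever thread got here. */
        dealloc_on_main = pthread_equal(pthread_self(), main_thread) != 0;
        free(t);
    }
}

static void *load_thumbnails(void *arg)
{
    Thumbs *t = arg;
    pthread_mutex_lock(&gate); /* stands in for "still loading" */
    pthread_mutex_unlock(&gate);
    thumbs_release(t); /* the thread's reference to its target */
    return NULL;
}

/* Returns 1 if teardown ran on the calling (main) thread, 0 if it ran on the
   worker, -1 on setup failure. */
int dealloc_ran_on_main(int owner_releases_first)
{
    Thumbs *t = malloc(sizeof *t);
    pthread_t worker;

    if (t == NULL)
        return -1;
    atomic_init(&t->refcount, 1); /* the owner's reference */
    main_thread = pthread_self();
    dealloc_on_main = -1;

    pthread_mutex_lock(&gate);
    thumbs_retain(t); /* what detaching a thread does for its target */
    if (pthread_create(&worker, NULL, load_thumbnails, t) != 0) {
        pthread_mutex_unlock(&gate);
        free(t);
        return -1;
    }

    if (owner_releases_first) {
        thumbs_release(t); /* new session loaded mid-load */
        pthread_mutex_unlock(&gate);
        pthread_join(worker, NULL); /* worker's release is the last one */
    } else {
        pthread_mutex_unlock(&gate);
        pthread_join(worker, NULL); /* loading finished first */
        thumbs_release(t); /* owner's release is the last one */
    }
    return dealloc_on_main;
}
```

dealloc_ran_on_main(1) corresponds to the first -dealloc trace (teardown under the thread-exit frames) and dealloc_ran_on_main(0) to the second (teardown under the main run loop).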

The solution I came up with to solve this was to wrap the loader in two other methods. I renamed it to -_loadThumbnails and then did the following:

[self retain]; // <-- added this before the detaching
[NSThread detachNewThreadSelector:@selector(loadThumbnails) toTarget:self withObject:nil];

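// added these two new methods

// Runs on the main thread: drops the extra retain taken before detaching.
- (void)doneLoadingThumbnails
{
    [self release];
}

// Thread entry point: does the work, then hands the balancing release to the main thread.
- (void)loadThumbnails
{
    [self _loadThumbnails];
    [self performSelectorOnMainThread:@selector(doneLoadingThumbnails) withObject:nil waitUntilDone:NO];
}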
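
The same idea can be sketched in plain C, again with a hand-rolled reference count and hypothetical names (Thumbs, thumbs_retain, thumbs_release, load_thumbnails, dealloc_ran_on_main_with_extra_retain). The extra reference taken before the thread starts is only given back on the main thread, so the main thread performs the final release. One loud assumption: the model forces the main-thread release to come last by joining the worker first. The Objective-C version with waitUntilDone:NO appears to rely on the worker finishing its exit before the main run loop reaches doneLoadingThumbnails.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the thumbnail view: just a reference count. */
typedef struct {
    atomic_int refcount;
} Thumbs;

static pthread_t main_thread;
static int dealloc_on_main; /* written by whichever thread frees the object */
static pthread_mutex_t gate = PTHREAD_MUTEX_INITIALIZER;

static void thumbs_retain(Thumbs *t)
{
    atomic_fetch_add(&t->refcount, 1);
}

static void thumbs_release(Thumbs *t)
{
    int previous = atomic_fetch_sub(&t->refcount, 1);
    assert(previous > 0); /* guard against over-release */
    if (previous == 1) {
        dealloc_on_main = pthread_equal(pthread_self(), main_thread) != 0;
        free(t);
    }
}

static void *load_thumbnails(void *arg)
{
    Thumbs *t = arg;
    pthread_mutex_lock(&gate); /* stands in for "still loading" */
    pthread_mutex_unlock(&gate);
    thumbs_release(t); /* the thread's reference to its target */
    return NULL;
}

/* Returns 1 if teardown ran on the calling (main) thread, 0 if it ran on the
   worker, -1 on setup failure. */
int dealloc_ran_on_main_with_extra_retain(void)
{
    Thumbs *t = malloc(sizeof *t);
    pthread_t worker;

    if (t == NULL)
        return -1;
    atomic_init(&t->refcount, 1); /* the owner's reference */
    main_thread = pthread_self();
    dealloc_on_main = -1;

    pthread_mutex_lock(&gate);
    thumbs_retain(t); /* models [self retain] before detaching */
    thumbs_retain(t); /* the thread's reference to its target */
    if (pthread_create(&worker, NULL, load_thumbnails, t) != 0) {
        pthread_mutex_unlock(&gate);
        free(t);
        return -1;
    }

    thumbs_release(t); /* new session loaded mid-load: owner lets go */
    pthread_mutex_unlock(&gate);
    pthread_join(worker, NULL); /* worker dropped its reference; one left */
    thumbs_release(t); /* models doneLoadingThumbnails on the main thread */
    return dealloc_on_main;
}
```

Even though the owner lets go while loading is still in progress, dealloc_ran_on_main_with_extra_retain() reports teardown on the main thread, because the last reference is the one handed back there.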

All that said (and I said a lot -- sorry about that), the big question is: how do you figure these odd-ball things out without going through all of the above?

What reasoning did I miss in the above process? At what point did you realize where the problem was? What were the redundant steps in my method? Can I skip phase 3 (return galore) somehow, or cut it down, or make it more efficient?

I know this question is, well, vague and huge, but this whole concept is vague and huge. I'm not asking you to teach me how to find leaks (I can do that... it's just very, very painful), I'm asking what people tend to do to cut down on the process time. Asking people "how do you find leaks?" is impossible, because there are so many different kinds. But the one type I tend to have issues with is the one that looks like the above, with no calls inside your actual app.

What process do you use to more efficiently track it down?
