Java profiling, performance tuning and memory profiling exercises

Posted on 2024-09-12 15:47:30

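I am about to use JProfiler and Eclipse TPTP. I need a set of exercises that I can give to participants, in which they can:

Use the tool to profile and discover the problem: bottlenecks, memory leaks, suboptimal code and so on. I am sure there is plenty of experience and real-life examples around.
Solve the problem and implement optimized code
Demonstrate the solution by running another profiling session
Ideally, write a unit test that demonstrates the performance gain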

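Neither the problems nor the solutions should be overly complicated; it should be possible to solve them in a few minutes at best and a few hours at worst. Some interesting areas to exercise:

Solving memory leaks
Optimizing loops
Object creation and management
Optimizing string operations
Problems caused or exacerbated by concurrency, and concurrency bottlenecks

Ideally, each exercise should include sample unoptimized code and the solution code.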

往事风中埋 2024-09-19 15:47:31

I tried to find real-life examples that I've seen in the wild (maybe slightly altered, but the basic problems were all very real). I've also tried to cluster them around the same scenario, so you can build up a session easily.

Scenario: you have a time-consuming function that you want to run many times for different values, but the same values may come up again (ideally not too long after they were first seen). A good and simple example is URL/web page pairs that you need to download and process (for the exercise the download should probably be simulated).

Loops:

You want to check whether any of a set of words appears in a page. Call your function inside the loop, but always with the same value. Pseudocode:
```
for (word : words) {
    checkWord(download(url), word)  // downloads the same page on every iteration
}
```
One solution is quite easy: download the page once, before the loop.
Another solution (caching) is described below.
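
The loop exercise above could be set up roughly as follows. This is only a sketch: `LoopHoisting`, the canned page text and the download counter are assumptions made for illustration, and `download` is a simulated stand-in for the expensive call.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class LoopHoisting {
    // Counts simulated downloads so the waste is visible even without a profiler.
    static final AtomicInteger downloads = new AtomicInteger();

    // Hypothetical stand-in for the expensive download; returns a canned page.
    static String download(String url) {
        downloads.incrementAndGet();
        return "<html>profiling java memory leak</html>";
    }

    static boolean checkWord(String page, String word) {
        return page.contains(word);
    }

    // Unoptimized: the same page is downloaded once per word.
    static int countHitsSlow(String url, List<String> words) {
        int hits = 0;
        for (String word : words) {
            if (checkWord(download(url), word)) {
                hits++;
            }
        }
        return hits;
    }

    // Optimized: the download is hoisted out of the loop.
    static int countHitsFast(String url, List<String> words) {
        String page = download(url);
        int hits = 0;
        for (String word : words) {
            if (checkWord(page, word)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("java", "leak", "python");
        countHitsSlow("http://example.com", words);
        System.out.println("slow version downloads: " + downloads.getAndSet(0));
        countHitsFast("http://example.com", words);
        System.out.println("fast version downloads: " + downloads.get());
    }
}
```

In a real session the simulated download would sleep or burn CPU so that it shows up as a hot spot in the profiler.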

Memory leak:

Simple one: you can also solve your problem with a kind of cache. In the simplest case you can just put the results into a (static) map. But if you don't bound it, it will grow without limit -> memory leak.
Possible solution: use an LRU map. Most likely performance will not degrade too much, but the memory leak should go away.
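
One possible way to get an LRU map from the standard library is `LinkedHashMap` in access order with `removeEldestEntry` overridden. The class name `LruCache` and the capacity of 2 are assumptions for this sketch, not part of the original answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // The third argument (accessOrder = true) keeps entries ordered from
        // least recently used to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by put(); returning true evicts the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "page A");
        cache.put("b", "page B");
        cache.get("a");            // "a" becomes the most recently used entry
        cache.put("c", "page C");  // evicts "b", the least recently used entry
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

Note that `LinkedHashMap` is not thread-safe, which matters once the concurrency exercise is added on top of the cache.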
Trickier one: say you implement the previous cache using a WeakHashMap, where the keys are the URLs (NOT as strings, see below) and the values are instances of a class that contains the URL, the downloaded page and something else. You may assume that it should be fine, but in fact it is not: the map holds its values strongly, and each value has a reference to its key (the URL), so the key never becomes eligible for collection -> nice memory leak.
Solution: remove the URL from the value.
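Same as before, but the URLs are interned strings ("to save some memory if we happen to have the same strings again"), and the value does not refer to the key. I did not try it, but it seems to me that it would also cause a leak, because interned Strings cannot be GC-ed. Note that this depends on the JVM: since Java 7, HotSpot keeps interned strings on the heap and can collect them once they are unreachable, while string literals stay reachable for as long as their class is loaded.
Solution: do not intern, which also leads to advice that you must not skip: don't optimize prematurely, as premature optimization is the root of all evil.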
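
The WeakHashMap pitfall described above could be reproduced along these lines. This is a sketch under assumptions: `WeakCacheDemo`, `LeakyPage` and `Page` are hypothetical names, and `java.net.URI` is used for the keys instead of `java.net.URL` to avoid the host lookups that `URL.hashCode()` can trigger. `System.gc()` is only a hint, so the size printed for the fixed variant is not guaranteed.

```java
import java.net.URI;
import java.util.Map;
import java.util.WeakHashMap;

class WeakCacheDemo {
    // Leaky value: it keeps a strong reference to its own key.
    static final class LeakyPage {
        final URI url;
        final String content;

        LeakyPage(URI url, String content) {
            this.url = url;
            this.content = content;
        }
    }

    // Fixed value: it holds only the content, not the key.
    static final class Page {
        final String content;

        Page(String content) {
            this.content = content;
        }
    }

    // Fills a WeakHashMap, requests a GC and reports how many entries remain.
    static int entriesAfterGc(boolean leaky, int count) {
        Map<URI, Object> cache = new WeakHashMap<>();
        for (int i = 0; i < count; i++) {
            URI key = URI.create("http://example.com/page/" + i);
            Object value = leaky ? new LeakyPage(key, "content " + i) : new Page("content " + i);
            cache.put(key, value);
        }
        System.gc(); // a hint only; the JVM may ignore it
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return cache.size(); // size() also drops entries whose keys were collected
    }

    public static void main(String[] args) {
        System.out.println("leaky entries left: " + entriesAfterGc(true, 1000));
        System.out.println("fixed entries left: " + entriesAfterGc(false, 1000));
    }
}
```

The leaky variant always keeps every entry, because each key stays strongly reachable through its value. In a heap dump this shows up as `LeakyPage` instances retaining their keys.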

Object creation & Strings:

Say you want to display only the text of the pages (~remove the HTML tags). Write a function that does this line by line and appends each line to a growing result. At first the result should be a String, so appending will take a lot of time and object allocation. You can detect this problem from a performance point of view (why are the appends so slow) and from an object creation point of view (why did we create so many Strings, StringBuffers, arrays, etc.).
Solution: use a StringBuilder for the result.
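
A minimal sketch of the two variants, assuming a hypothetical `TagStripper` class and a deliberately naive regex for removing tags (real HTML needs a proper parser):

```java
class TagStripper {
    // Naive tag removal, good enough for the exercise.
    static String stripTags(String line) {
        return line.replaceAll("<[^>]*>", "");
    }

    // Unoptimized: every += copies the whole result into a new String.
    static String textSlow(String[] lines) {
        String result = "";
        for (String line : lines) {
            result += stripTags(line) + "\n";
        }
        return result;
    }

    // Optimized: one StringBuilder grows in place.
    static String textFast(String[] lines) {
        StringBuilder result = new StringBuilder();
        for (String line : lines) {
            result.append(stripTags(line)).append('\n');
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String[] lines = {"<p>Hello</p>", "<b>world</b>"};
        System.out.print(textSlow(lines));
        System.out.print(textFast(lines));
    }
}
```

With a few lines the difference is invisible; for the exercise, feed it tens of thousands of lines so the quadratic copying shows up in both the CPU and the allocation views.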

Concurrency:

You want to speed the whole thing up by doing the downloading/filtering in parallel. Create some threads and run your code using them, but do everything inside one big synchronized block (locking on the cache), just "to protect the cache from concurrency problems". The effect should be that you effectively use just one thread, as all the others are waiting to acquire the lock on the cache.
Solution: synchronize only around the cache operations (e.g. use `java.util.Collections.synchronizedMap()`).
Another variant: synchronize every tiny piece of code. This should kill performance and probably prevent normal parallel execution. If you are lucky/smart enough you can come up with a deadlock as well.
Moral of this: synchronization should not be an ad hoc thing done on an "it will not hurt" basis, but a well-thought-out one.
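
The first concurrency exercise might look like the sketch below. `ParallelFetch`, the 50 ms simulated download and the thread count are assumptions. The narrow variant relies on `Collections.synchronizedMap`, which locks only inside each `get` and `put`, and accepts that two threads may occasionally download the same URL.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class ParallelFetch {
    static final Map<String, String> cache =
            Collections.synchronizedMap(new HashMap<String, String>());

    // Hypothetical stand-in for a slow download.
    static String download(String url) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "page for " + url;
    }

    // Bottleneck: the cache lock is held for the whole download.
    static String fetchCoarse(String url) {
        synchronized (cache) {
            String page = cache.get(url);
            if (page == null) {
                page = download(url);
                cache.put(url, page);
            }
            return page;
        }
    }

    // Narrow locking: only get() and put() take the lock, the download runs outside it.
    static String fetchNarrow(String url) {
        String page = cache.get(url);
        if (page == null) {
            page = download(url); // may be duplicated by another thread; acceptable here
            cache.put(url, page);
        }
        return page;
    }

    // Runs one fetch per thread and returns the elapsed wall-clock time in milliseconds.
    static long runMillis(boolean coarse, int threads) {
        cache.clear();
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            final String url = "http://example.com/" + i;
            workers[i] = new Thread(() -> {
                if (coarse) {
                    fetchCoarse(url);
                } else {
                    fetchNarrow(url);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("coarse lock: " + runMillis(true, 8) + " ms");
        System.out.println("narrow lock: " + runMillis(false, 8) + " ms");
    }
}
```

In the profiler's thread view the coarse variant should show most threads blocked on the cache monitor, while the narrow variant shows them sleeping in `download` concurrently.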

Bonus exercise:

Fill up your cache at the beginning and don't do much allocation afterward, but still have a small leak somewhere. Usually this pattern is not too easy to catch. You can use a "bookmark" or "watermark" feature of the profiler; the mark should be set right after the caching is done.


夏日浅笑〃 2024-09-19 15:47:31

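Don't ignore this method, because it works for any language and OS, for these reasons. Here is an example. Also, try to use examples with I/O and significant call depth. Don't just use small CPU-bound programs like Mandelbrot. If you take that C example, which isn't too large, and recode it in Java, it should illustrate most of your points.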

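Let's see:

Solving memory leaks.
The whole point of a garbage collector is to plug memory leaks. However, you can still allocate too much memory, and that shows up as a large percentage of time spent in "new" for some objects.
Optimizing loops.
Generally, loops don't need to be optimized unless very little is done inside them (and they take a good percentage of the time).
Optimizing object creation and management.
The basic approach here is: keep data structures as simple as possible. In particular, stay away from notification-style attempts to keep data consistent, because those things run away and make the call tree enormously bushy. This is a major cause of performance problems in large software.
Optimizing string operations.
Use a string builder, but don't sweat code that doesn't account for a solid percentage of execution time.
Concurrency.
Concurrency has two purposes.
1) Performance, but this only works to the extent that it lets multiple pieces of hardware work at the same time. If the hardware isn't there, it doesn't help. It hurts.
2) Clarity of expression, so that, for example, UI code doesn't have to worry about heavy computation or network I/O going on at the same time.

In any case, it can't be emphasized enough: don't do any optimization before you have proven that something takes a significant percentage of the time.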
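
For the "prove it first" step, a crude wall-clock check is sometimes enough to decide whether a piece of code deserves attention. The `Timing` helper below is a hypothetical sketch, not a replacement for a profiler or a proper benchmark harness; JIT warm-up and GC pauses can distort its numbers.

```java
class Timing {
    // Runs the task once as a warm-up, then times the given number of repetitions.
    static long millis(Runnable task, int repetitions) {
        task.run();
        long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long elapsed = millis(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        }, 10);
        System.out.println("10 runs took " + elapsed + " ms");
    }
}
```

The same helper can back a unit test that compares an unoptimized and an optimized variant, provided the assertion leaves a generous margin for timing noise.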