General methods for optimizing program speed

Posted 2024-10-03 00:45:42


What are some generic methods for optimizing a program in Java, in terms of speed? I am using a DOM parser to parse an XML file and then store certain words in an ArrayList, remove any duplicates, then spell check those words by creating a Google search URL for each word, fetching the HTML document, locating the corrected word, and saving it to another ArrayList (roughly the pipeline sketched below).
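A minimal sketch of that pipeline up to the deduplication step, assuming a hypothetical words.xml whose <word> elements hold the words to check (the per-word Google lookup is omitted):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class WordExtractor {
    public static void main(String[] args) throws Exception {
        // Parse the XML file with the standard DOM parser.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new File("words.xml"));   // hypothetical input file

        // Collect the text of every <word> element (hypothetical element name),
        // skipping duplicates with a linear search, as described in the question.
        NodeList nodes = doc.getElementsByTagName("word");
        List<String> words = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            String word = nodes.item(i).getTextContent().trim();
            if (!words.contains(word)) {   // O(n) lookup per word, O(n^2) overall
                words.add(word);
            }
        }

        System.out.println(words);
        // The per-word spell-check step (building a Google URL, fetching the
        // page, extracting the correction) would follow here.
    }
}
```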

Any help would be appreciated! Thanks.


Comments (4)

任谁 2024-10-10 00:45:42


Why do you need to improve performance? From your explanation, it is pretty obvious that the big bottleneck here (or performance hit) is going to be the IO resulting from the fact that you are accessing a URL.

This will surely dwarf by orders of magnitude any minor improvements you make in data structures or XML frameworks.

It is a good general rule of thumb that your big performance problems will involve IO. Humorously enough, I am at this very moment waiting for a database query to return in a batch process. It has been running for almost an hour. But I welcome any suggested improvements to my XML parsing library nevertheless!

Here are my general methods:

  • Does your program perform any obviously expensive task from the perspective of latency (IO)? Do you have enough logging to see that this is where the delay is (if significant)?

  • Is your program prone to lock contention (i.e. can it sit around doing nothing, waiting for some resource to be "free")? Perhaps you are locking an entire Map while you make an expensive calculation for a value to store, blocking other threads from accessing the map (see the sketch after this list).

  • Is there some obvious algorithm (perhaps for data-matching, or sorting) that might have poor characteristics?

  • Run up a profiler (e.g. jvisualvm, which ships with the JDK itself) and look at your code hotspots. Where is the JVM spending its time?
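For the lock-contention point, a minimal sketch assuming the corrections are cached in a map shared between worker threads: ConcurrentHashMap.computeIfAbsent confines the blocking to the key being computed rather than the whole map (for a genuinely slow remote call, caching a Future per key is a common refinement).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SpellCache {
    private final Map<String, String> corrections = new ConcurrentHashMap<>();

    /**
     * Returns the cached correction, computing it at most once per word.
     * Other threads are only held up for the same key, not for the whole map.
     */
    public String correct(String word) {
        return corrections.computeIfAbsent(word, this::lookupCorrection);
    }

    // Placeholder for the expensive part (e.g. the remote spell-check call);
    // a real implementation would call the service here.
    private String lookupCorrection(String word) {
        return word;
    }
}
```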

或十年 2024-10-10 00:45:42


SAX is faster than DOM. If you don't want to go through the ArrayList searching for duplicates, put everything in a LinkedHashMap -- no duplicates, and you still get the order-of-insertion that ArrayList gives you.
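A minimal sketch of that suggestion, using LinkedHashSet (the Set counterpart of LinkedHashMap, which gives the same no-duplicates, insertion-order behaviour):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class Dedup {
    // Removes duplicates in O(n) while keeping the order words were first seen.
    static List<String> dedup(List<String> words) {
        return new ArrayList<>(new LinkedHashSet<>(words));
    }

    public static void main(String[] args) {
        System.out.println(dedup(List.of("teh", "quick", "teh", "brwon")));
        // prints: [teh, quick, brwon]
    }
}
```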

But the real bottleneck is going to be sending the HTTP request to Google, waiting for the response, then parsing the response. Use a spellcheck library, instead.

Edit: But take my educated guesses with a grain of salt. Use a code profiler to see what's really slowing down your program.

痴情换悲伤 2024-10-10 00:45:42


Generally the best method is to figure out where your bottleneck is, and fix it. You'll usually find that you spend 90% of your time in a small portion of your code, and that's where you want to focus your efforts.

Once you've figured out what's taking a lot of time, focus on improving your algorithms. For example, removing duplicates from an ArrayList can be O(n²) if you're using the most obvious algorithm, but that can be reduced to O(n) if you leverage the right data structure.

Once you've figured out which portions of your code are taking the most time, and you can't figure out how best to fix it, I'd suggest narrowing down your question and posting another question here on StackOverflow.

Edit

As @oxbow_lakes so snidely put it, not all performance bottlenecks are to be found in the code's big-O characteristics. I certainly had no intention to imply that they were. Since the question was about "general methods" for optimizing, I tried to stick to general ideas rather than talking about this specific program. But here's how you can apply my advice to this specific program:

  1. See where your bottleneck is. There are a number of ways to profile your code, ranging from high-end, expensive profiling software to really hacky. Chances are, any of these methods will indicate that your program spends 99% of its time waiting for a response from Google.
  2. Focus on algorithms. Right now your algorithm is (roughly):
    1. Parse the XML
    2. Create a list of words
    3. For each word
      1. Ping Google for a spell check.
    4. Return results

Since most of your time is spent in the "ping Google" phase, an obvious way to fix this would be to avoid doing that step more times than necessary. For example:

  1. Parse the XML
  2. Create a list of words
  3. Send list of words to spelling service.
  4. Parse results from spelling service.
  5. Return results

Of course, in this case, the biggest speed boost would probably come from using a spell checker that runs on the same machine, but that isn't always an option. For example, TinyMCE runs as a JavaScript program within the browser, and it can't afford to download the entire dictionary as part of the web page. So it packages up all the words into a distinct list and performs a single AJAX request to get a list of those words that aren't in the dictionary.
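A minimal sketch of the batched step 3 above, assuming a hypothetical spelling service that accepts a newline-separated list of words in a single POST (the endpoint URL and response format are made up for illustration):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Arrays;
import java.util.List;

public class BatchSpellCheck {
    public static void main(String[] args) throws Exception {
        List<String> words = List.of("teh", "quick", "brwon", "fox");

        // One request for the whole list instead of one request per word.
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://example.com/spellcheck")) // hypothetical endpoint
                .header("Content-Type", "text/plain")
                .POST(HttpRequest.BodyPublishers.ofString(String.join("\n", words)))
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        // Hypothetical response format: one corrected word per line, same order.
        List<String> corrected = Arrays.asList(response.body().split("\n"));
        System.out.println(corrected);
    }
}
```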

静待花开 2024-10-10 00:45:42


These folks are probably right, but a few random pauses will turn "probably" into "definitely, and here's why".
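A minimal in-process version of the random-pause idea (normally you would just hit pause in a debugger, or run jstack a few times): a sampler thread that occasionally prints what the target thread is doing, so the stack frames that show up most often point at the likely bottleneck.

```java
public class PauseSampler {
    public static void sample(Thread target, int samples, long intervalMillis) {
        Thread sampler = new Thread(() -> {
            for (int i = 0; i < samples; i++) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    return;
                }
                // Take a "random pause": print where the target thread is right now.
                System.out.println("--- sample " + i + " ---");
                for (StackTraceElement frame : target.getStackTrace()) {
                    System.out.println("    at " + frame);
                }
            }
        });
        sampler.setDaemon(true);
        sampler.start();
    }

    public static void main(String[] args) {
        sample(Thread.currentThread(), 5, 200);
        // ... the work being investigated would run here ...
    }
}
```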
