Running times of Java API methods
Is there a good resource for the running times of standard API functions? It's somewhat confusing when trying to optimize your program. I know Java isn't made to be particularly speedy, but I can't seem to find much info on this at all.
Example Problem:
If I am looking for a certain token in a file, is it faster to scan each line using string.contains(...), or to bring in, say, 100 or so lines, put them into a local string, and then perform contains on that chunk?
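To make the two alternatives in the question concrete, here is a self-contained sketch of both (a StringReader stands in for the file, and all names and the chunk size are illustrative):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

// Sketch of the two approaches from the question: test each line with
// contains, or accumulate ~100 lines into one chunk and test the chunk.
public class TokenSearch {

    // Approach 1: check each line individually.
    static boolean scanLineByLine(Reader source, String token) throws IOException {
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            if (line.contains(token)) return true;
        }
        return false;
    }

    // Approach 2: accumulate up to 100 lines, then search the whole chunk.
    static boolean scanInChunks(Reader source, String token) throws IOException {
        BufferedReader in = new BufferedReader(source);
        StringBuilder chunk = new StringBuilder();
        String line;
        int count = 0;
        while ((line = in.readLine()) != null) {
            chunk.append(line).append('\n');
            if (++count == 100) {
                if (chunk.indexOf(token) >= 0) return true;
                chunk.setLength(0);  // reuse the buffer for the next chunk
                count = 0;
            }
        }
        return chunk.indexOf(token) >= 0;  // check the final partial chunk
    }
}
```

As the answers below note, which one wins on a given machine is something only measurement can settle.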
Comments (6)
AFAIK, there are no standard benchmarks for the API methods; in fact, there could be various implementations depending on the JVM you are running. Couple that with the JVM's JIT optimizations, garbage collection, and many other factors, and I doubt you could get globally meaningful numbers. The most you can do is write your own benchmarks.
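A hand-rolled timing loop along these lines can serve as a starting point. This is only a sketch (the test string, method, and iteration counts are illustrative); for serious measurement, prefer a harness such as JMH, which deals with warmup, dead-code elimination, and statistics for you:

```java
// Minimal micro-benchmark sketch: time repeated calls to the operation
// under test after a warmup pass, so the JIT has compiled the hot path.
public class ContainsBenchmark {
    static final String LINE = "some fairly long line of log text with a TOKEN in it";

    // The operation being measured.
    static boolean scanOnce() {
        return LINE.contains("TOKEN");
    }

    public static void main(String[] args) {
        // Warmup: give the JIT a chance to compile scanOnce.
        for (int i = 0; i < 100_000; i++) scanOnce();

        long start = System.nanoTime();
        int hits = 0;
        for (int i = 0; i < 1_000_000; i++) {
            if (scanOnce()) hits++;  // use the result so it isn't optimized away
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(hits + " hits in " + elapsedMs + " ms");
    }
}
```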
Some methods specify the computational complexity of the operations in their JavaDocs. Some other methods describe other performance concerns. Make sure you are aware of them and heed them.
But beyond that, chances are you are optimizing prematurely. Use a profiler to see whether it is actually a bottleneck.
For instance, in your case there will be the cost of reading from the file, the cost of placing strings in the large buffer, and so on. I'm not sure you can really optimize by reading at the string level. If this were really mission-critical, you could read character by character and implement a smart matching algorithm without ever creating strings; this might be slightly faster.
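As a rough illustration of the character-by-character idea, the sketch below matches a token against a Reader using a sliding window and never builds intermediate strings. It is naive (it re-compares the whole window at each position, O(token length) per character); a "smart" algorithm in the answer's sense, such as Knuth-Morris-Pratt, would avoid that. All names here are illustrative:

```java
import java.io.IOException;
import java.io.Reader;

// Streaming token search: keep only the last token.length() characters
// in a window and compare the window to the token after each read.
public class StreamMatcher {

    static boolean containsToken(Reader in, String token) throws IOException {
        int n = token.length();
        if (n == 0) return true;
        char[] window = new char[n];
        int filled = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (filled < n) {
                window[filled++] = (char) c;      // still filling the window
            } else {
                // Slide: drop the oldest char, append the new one.
                System.arraycopy(window, 1, window, 0, n - 1);
                window[n - 1] = (char) c;
            }
            if (filled == n && matches(window, token)) return true;
        }
        return false;
    }

    private static boolean matches(char[] window, String token) {
        for (int i = 0; i < token.length(); i++) {
            if (window[i] != token.charAt(i)) return false;
        }
        return true;
    }
}
```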
You're looking for a profiler.
There is no documentation, since it will vary considerably from machine to machine and OS to OS. To get accurate timings for your program, use a profiler. The NetBeans profiler is good.
As to finding out which is the fastest, there is no better alternative than to code both. Alternatively, you might code the simplest alternative, and once it's working, you might discover that it's fast enough for your needs and not bother coding the more complex implementation.
If I understand your question correctly, you're asking whether it's better to read a line from somewhere or to read a line from memory.
It will always be faster to have the text loaded into memory for your scans than to read it from an I/O stream, especially from disk. The speed of the read has nothing to do with Java, but with how fast the source can get that data to your program.
I agree with the ideas about using a profiler, but you might also want to consider just using log4j (or Apache Commons Logging, etc.) to get some cheap stats about program performance, since the log entries in the resulting logfiles will be timestamped to the nearest millisecond. Since logging is generally a useful thing to do when debugging anyway, it's probably worth doing this first.
Learning profiling tools and learning how to interpret the resulting data is usually a non-trivial task in and of itself. It's worth doing, but you might be able to get a rough idea more quickly just using logging data, especially if you format it as CSV so you can import it into a spreadsheet.
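A minimal sketch of this idea, using java.util.logging from the JDK instead of log4j so it needs no external dependency (the class, phase name, and CSV format are illustrative):

```java
import java.util.logging.Logger;

// Cheap phase timing via log records: each entry is timestamped by the
// logging framework, and we also emit a CSV-style line (phase, elapsed ms)
// that is easy to import into a spreadsheet.
public class TimedPhases {
    private static final Logger LOG = Logger.getLogger(TimedPhases.class.getName());

    // Example timed phase: sum the integers 0..n-1.
    static long timedSum(int n) {
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i;
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        LOG.info("timedSum," + elapsedMs);  // CSV-style: phase name, elapsed ms
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(timedSum(1_000_000));
    }
}
```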
If we overlook disk I/O time and consider only the CPU time spent in your code, the second choice will be much slower than the first.