My python program executes faster than my java version of the same program. What gives?

Posted 2024-07-23 10:04:09

Update: 2009-05-29

Thanks for all the suggestions and advice. I used your suggestions to make my production code execute 2.5 times faster on average than my best result a couple of days ago. In the end I was able to make the java code the fastest.

Lessons:

My example code below shows the insertion of primitive ints, but the production code is actually storing strings (my bad). When I corrected that, the python execution time went from 2.8 seconds to 9.6 seconds. So right off the bat, the java version was actually faster when storing objects.
But it doesn't stop there. I had been executing the java program as follows:
java -Xmx1024m SpeedTest

But if you set the initial heap size as follows you get a huge improvement:

java -Xms1024m -Xmx1024m SpeedTest

This simple change reduced the execution time by more than 50%. So the final result for my SpeedTest is python 9.6 seconds. Java 6.5 seconds.
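For anyone who wants to reproduce the first lesson on the python side, here is a minimal sketch in Python 3 syntax (the code in the question is Python 2). The helper name time_set_inserts and the reduced size n are illustrative assumptions, not part of the original program, and building the strings inside the timed loop is only a guess at what the production code does. Absolute timings will differ by machine and interpreter version.

```python
import time


def time_set_inserts(items):
    """Insert every item into a fresh set; return (seconds, final size)."""
    counts = set()
    start = time.perf_counter()
    for item in items:
        counts.add(item)
    return time.perf_counter() - start, len(counts)


# Far smaller than the 10,000,000 used in the question, to keep the sketch quick.
n = 200_000

# Case 1: plain ints, as in the example code.
int_seconds, int_size = time_set_inserts(range(n))

# Case 2: strings, closer to the production code. The generator creates each
# string inside the timed loop, so string creation is part of the measurement.
str_seconds, str_size = time_set_inserts(str(i) for i in range(n))

print("ints:    %.3f s, %d members" % (int_seconds, int_size))
print("strings: %.3f s, %d members" % (str_seconds, str_size))
```

Raising n toward 10,000,000 should make the gap between the two cases easier to see, at the cost of a much longer run and more memory.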

Original Question:

I had the following python code:

import time
import sys

def main(args):    
    iterations = 10000000
    counts = set()
    startTime = time.time();    
    for i in range(0, iterations):
        counts.add(i)
    totalTime = time.time() - startTime
    print 'total time =',totalTime
    print len(counts)

if __name__ == "__main__":
    main(sys.argv)

It executed in about 3.3 seconds on my machine, but I wanted to make it faster, so I decided to program it in java. I assumed that because java is compiled and is generally considered to be faster than python, I would see some big paybacks.

Here is the java code:

import java.util.*;
class SpeedTest
{    
    public static void main(String[] args)
    {        
        long startTime;
        long totalTime;
        int iterations = 10000000;
        HashSet counts = new HashSet((2*iterations), 0.75f);

        startTime = System.currentTimeMillis();
        for(int i=0; i<iterations; i++)
        {
            counts.add(i);
        }
        totalTime = System.currentTimeMillis() - startTime;
        System.out.println("TOTAL TIME = "+( totalTime/1000f) );
        System.out.println(counts.size());
    }
}

So this java code does basically the same thing as the python code. But it executed in 8.3 seconds instead of 3.3.

I have extracted this simple example from a real-world program to simplify things. The critical element is that I have a collection (set or HashSet) that ends up with a lot of members, much like the example.

Here are my questions:

How come my python implementation is faster than my java implementation?
Is there a better data structure to use than the hashSet (java) to hold a unique collection?
What would make the python implementation faster?
What would make the java implementation faster?

UPDATE:

Thanks to all who have contributed so far. Please allow me to add some details.

I have not included my production code because it is quite complex and would generate a lot of distraction. The case I present above is the most simplified possible. By that I mean that the java HashSet add() call seems to be much slower than the python set's add().

The java implementation of the production code is also about 2.5 - 3 times slower than the python version -- just like the above.

I am not concerned about vm warmup or startup overhead. I just want to compare the code from my startTime to my totalTime. Please do not concern yourselves with other matters.

I initialized the hashset with more than enough buckets so that it should never have to rehash. (I will always know ahead of time how many elements the collection will ultimately contain.) I suppose one could argue that I should have initialized it to iterations/0.75. But if you try it you will see that execution time is not significantly impacted.
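The sizing argument can be checked with a little arithmetic. The sketch below rests on two assumptions that are not stated in the question: that the table behind the HashSet is rounded up to the next power of two, and that a resize is triggered at roughly table size times load factor. The helper names table_size and resize_threshold are hypothetical. Under those assumptions, both 2*iterations and iterations/0.75 leave the threshold above 10,000,000, which would be consistent with the execution time barely changing.

```python
import math


def table_size(initial_capacity):
    """Smallest power of two >= initial_capacity (assumed rounding rule)."""
    size = 1
    while size < initial_capacity:
        size <<= 1
    return size


def resize_threshold(initial_capacity, load_factor=0.75):
    """Member count at which a resize is assumed to be triggered."""
    return int(table_size(initial_capacity) * load_factor)


iterations = 10_000_000
candidates = {
    "2*iterations": 2 * iterations,
    "iterations/0.75": math.ceil(iterations / 0.75),
}

for label, requested in candidates.items():
    threshold = resize_threshold(requested)
    print(label, "requested:", requested,
          "table:", table_size(requested),
          "threshold:", threshold,
          "rehash needed:", iterations > threshold)
```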

I set Xmx1024m for those that were curious (my machine has 4GB of ram).

I am using java version: Java(TM) SE Runtime Environment (build 1.6.0_13-b03).

In the production version I am storing a string (2-15 chars) in the hashSet, so I cannot use primitives, although that is an interesting case.

I have run the code many, many times. I have very high confidence that the python code is between 2.5 and 3 times faster than the java code.

只是一片海 2024-07-30 10:04:09

You're not really testing Java vs. Python; you're testing java.util.HashSet using autoboxed Integers vs. Python's native set and integer handling.

Apparently, the Python side really is faster in this particular microbenchmark.

I tried replacing HashSet with TIntHashSet from GNU trove and achieved a speedup factor between 3 and 4, bringing Java slightly ahead of Python.

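The real question is whether your example code is really as representative of your application code as you think. Have you run a profiler and determined that most of the CPU time is spent putting a huge number of ints into a HashSet? If not, the example is irrelevant. Even if the only difference is that your production code stores objects other than ints, their creation and the computation of their hash codes could easily dominate the set insertion (and totally destroy Python's advantage in handling ints specially), making this whole question pointless.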
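On the Python side, the profiling step suggested above could look roughly like the following sketch, which uses the standard cProfile and pstats modules. The functions make_keys, fill_set, and workload are hypothetical stand-ins for the production code, which is not shown in the question, and the size is kept small so the run stays short. The point is only to see how time splits between creating the keys and inserting them.

```python
import cProfile
import io
import pstats


def make_keys(n):
    """Build n short string keys (stand-in for the real key creation)."""
    return ["key%d" % i for i in range(n)]


def fill_set(keys):
    """Insert every key into a set (the part the question focuses on)."""
    counts = set()
    for key in keys:
        counts.add(key)
    return counts


def workload(n=200_000):
    return fill_set(make_keys(n))


# Profile only the workload, not interpreter startup.
profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

If make_keys turns out to account for most of the cumulative time, then the set insertion is not the bottleneck and the microbenchmark says little about the real program.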
