我的 python 程序比同一程序的 java 版本执行得更快。 是什么赋予了?

发布于 2024-07-23 10:04:09 字数 2684 浏览 8 评论 0原文

更新时间:2009-05-29

感谢大家的建议和建议。 我根据您的建议使我的生产代码的执行速度比几天前的最佳结果平均快 2.5 倍。最终我能够使 java 代码成为最快的。

经验教训:

  • 下面的示例代码显示了原始整数的插入,但生产代码实际上是存储字符串(我的错)。 当我更正 python 执行时间从 2.8 秒到 9.6 秒时。 因此,java 在存储对象时实际上速度更快。

  • 但它并不止于此。 我一直在执行java程序如下:

    java -Xmx1024m SpeedTest

但是,如果您按如下方式设置初始堆大小,您将获得巨大的改进:

java -Xms1024m -Xmx1024m SpeedTest

这个简单的更改将执行时间减少了 50% 以上。 所以我的 SpeedTest 的最终结果是 python 9.6 秒。 Java 6.5 秒。

原始问题:

我有以下 python 代码:

import time
import sys

def main(args):    
    iterations = 10000000
    counts = set()
    startTime = time.time();    
    for i in range(0, iterations):
        counts.add(i)
    totalTime = time.time() - startTime
    print 'total time =',totalTime
    print len(counts)

if __name__ == "__main__":
    main(sys.argv)

它在我的机器上执行了大约 3.3 秒,但我想让它更快,所以我决定用 java 对其进行编程。 我认为因为 java 是经过编译的并且通常被认为比 python 更快,所以我会看到一些巨大的回报。

这是 java 代码:

import java.util.*;
class SpeedTest
{    
    public static void main(String[] args)
    {        
        long startTime;
        long totalTime;
        int iterations = 10000000;
        HashSet counts = new HashSet((2*iterations), 0.75f);

        startTime = System.currentTimeMillis();
        for(int i=0; i<iterations; i++)
        {
            counts.add(i);
        }
        totalTime = System.currentTimeMillis() - startTime;
        System.out.println("TOTAL TIME = "+( totalTime/1000f) );
        System.out.println(counts.size());
    }
}

所以这个 java 代码基本上与 python 代码做同样的事情。 但它执行了 8.3 秒,而不是 3.3 秒。

我从现实世界的例子中提取了这个简单的例子来简化事情。 关键要素是我有(set 或 hashSet),最终有很多成员,就像示例一样。

这是我的问题:

  1. 为什么我的 python 实现比我的 java 实现更快?

  2. 是否有比 hashSet (java) 更好的数据结构来保存唯一集合?

  3. 什么会让 python 实现更快?

  4. 什么会使 java 实现更快?

更新:

感谢迄今为止所有做出贡献的人。 请允许我添加一些细节。

我没有包含我的生产代码,因为它非常复杂。 并且会产生很多干扰。 我上面介绍的情况是最简单的情况。 我的意思是,java put 调用似乎比 python set 的 add() 慢得多。

生产代码的 java 实现也比 python 版本慢约 2.5 - 3 倍——就像上面一样。

我不关心虚拟机预热或启动开销。 我只想将 startTime 和 TotalTime 中的代码进行比较。 请不要关心其他事情。

我用足够多的存储桶初始化了哈希集,这样它就不必重新哈希。 (我总是会提前知道集合最终将包含多少个元素。)我想有人可能会说我应该将其初始化为 iterations/0.75。 但如果您尝试一下,您会发现执行时间并没有受到显着影响。

我为那些好奇的人设置了 Xmx1024m(我的机器有 4GB 内存)。

我正在使用 java 版本:Java(TM) SE 运行时环境(内部版本 1.6.0_13-b03)。

在生产版本中,我在 hashSet 中存储一个字符串(2-15 个字符),因此我无法使用原语,尽管这是一个有趣的情况。

我已经运行该代码很多很多次了。 我非常有信心 python 代码比 java 代码快 2.5 到 3 倍。

Update: 2009-05-29

Thanks for all the suggestions and advice. I used your suggestions to make my production code execute 2.5 times faster on average than my best result a couple of days ago. In the end I was able to make the java code the fastest.

Lessons:

  • My example code below shows the insertion of primitive ints but the production code is actually storing strings (my bad). When I corrected that the python execution time went from 2.8 seconds to 9.6. So right off the bat, the java was actually faster when storing objects.

  • But it doesn't stop there. I had been executing the java program as follows:

    java -Xmx1024m SpeedTest

But if you set the initial heap size as follows you get a huge improvement:

java -Xms1024m -Xmx1024m SpeedTest

This simple change reduced the execution time by more than 50%. So the final result for my SpeedTest is python 9.6 seconds. Java 6.5 seconds.

Original Question:

I had the following python code:

import time
import sys

def main(args):    
    iterations = 10000000
    counts = set()
    startTime = time.time();    
    for i in range(0, iterations):
        counts.add(i)
    totalTime = time.time() - startTime
    print 'total time =',totalTime
    print len(counts)

if __name__ == "__main__":
    main(sys.argv)

And it executed in about 3.3 seconds on my machine but I wanted to make it faster so I decided to program it in java. I assumed that because java is compiled and is generally considered to be faster than python I would see some big paybacks.

Here is the java code:

import java.util.*;
class SpeedTest
{    
    public static void main(String[] args)
    {        
        long startTime;
        long totalTime;
        int iterations = 10000000;
        HashSet counts = new HashSet((2*iterations), 0.75f);

        startTime = System.currentTimeMillis();
        for(int i=0; i<iterations; i++)
        {
            counts.add(i);
        }
        totalTime = System.currentTimeMillis() - startTime;
        System.out.println("TOTAL TIME = "+( totalTime/1000f) );
        System.out.println(counts.size());
    }
}

So this java code does basically the same thing as the python code. But it executed in 8.3 seconds instead of 3.3.

I have extracted this simple example from a real-world example to simplify things. The critical element is that I have (set or hashSet) that ends up with a lot of members much like the example.

Here are my questions:

  1. How come my python implementation is faster than my java implementation?

  2. Is there a better data structure to use than the hashSet (java) to hold a unique collection?

  3. What would make the python implementation faster?

  4. What would make the java implementation faster?

UPDATE:

Thanks to all who have contributed so far. Please allow me to add some details.

I have not included my production code because it is quite complex. And would generate a lot of distraction. The case I present above is the most simplified possible. By that I mean that the java put call seems to be much slower than the python set`s add().

The java implementation of the production code is also about 2.5 - 3 times slower than the python version -- just like the above.

I am not concerned about vm warmup or startup overhead. I just want to compare the code from my startTime to my totalTime. Please do not concern yourselves with other matters.

I initialized the hashset with more than enough buckets so that it should never have to rehash. (I will always know ahead of time how many elements the collection will ultimately contain.) I suppose one could argue that I should have initialized it to iterations/0.75. But if you try it you will see that execution time is not significantly impacted.

I set Xmx1024m for those that were curious (my machine has 4GB of ram).

I am using java version: Java(TM) SE Runtime Environment (build 1.6.0_13-b03).

In the production version of I am storing a string (2-15 chars) in the hashSet so I cannot use primitives, although that is an interesting case.

I have run the code many, many times. I have very high confidence that the python code is between 2.5 and 3 times faster than the java code.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(20

只是一片海 2024-07-30 10:04:09

您并不是真正测试 Java 与 Python,而是使用自动装箱的整数与 Python 的本机集合和整数处理来测试 java.util.HashSet

显然,在这个特定的微基准测试中,Python 端确实更快。

我尝试用 GNU trove 中的 TIntHashSet 替换 HashSet,并实现了 3 和 3 之间的加速因子4、Java 稍微领先于 Python。

真正的问题是您的示例代码是否真的像您想象的那样代表您的应用程序代码。 您是否运行过分析器并确定大部分 CPU 时间都花在将大量整数放入 HashSet 中? 如果不是,这个例子就没有意义。 即使唯一的区别是您的生产代码存储除整数之外的其他对象,它们的创建和哈希码的计算也可以轻松地主导集合插入(并完全破坏Python在专门处理整数方面的优势),使整个问题变得毫无意义。

You're not really testing Java vs. Python, you're testing java.util.HashSet using autoboxed Integers vs. Python's native set and integer handling.

Apparently, the Python side in this particular microbenchmark is indeed faster.

I tried replacing HashSet with TIntHashSet from GNU trove and achieved a speedup factor between 3 and 4, bringing Java slightly ahead of Python.

The real question is whether your example code is really as representative of your application code as you think. Have you run a profiler and determined that most of the CPU time is spent in putting a huge number of ints into a HashSet? If not, the example is irrelevant. Even if the only difference is that your production code stores other objects than ints, their creation and the computation of their hashcode could easily dominate the set insertion (and totally destroy Python's advantage in handling ints specially), making this whole question pointless.

等风来 2024-07-30 10:04:09

我怀疑Python使用整数值本身作为其散列,并且基于散列表的set实现直接使用该值。 来自源:

这并不一定是坏事! 相反,在大小为 2**i 的表中,取
低位 i 位作为初始表索引非常快,并且有
对于由连续的整数范围索引的字典来说根本没有冲突。
当键是“连续”字符串时,情况大致相同。 所以这
在常见情况下给出比随机更好的行为,这是非常理想的。

这个微基准测试在某种程度上是 Python 的最佳案例,因为它导致哈希冲突完全为零。 然而,如果 Java HashSet 重新散列键,它必须执行额外的工作,并且还会因冲突而导致更糟糕的行为。

如果将范围(迭代)存储在临时变量中并在循环之前对其执行 random.shuffle,则即使随机播放和列表创建是在循环外完成的,运行时间也会慢 2 倍以上。

I suspect that is that Python uses the integer value itself as its hash and the hashtable based implementation of set uses that value directly. From the comments in the source:

This isn't necessarily bad! To the contrary, in a table of size 2**i, taking
the low-order i bits as the initial table index is extremely fast, and there
are no collisions at all for dicts indexed by a contiguous range of ints.
The same is approximately true when keys are "consecutive" strings. So this
gives better-than-random behavior in common cases, and that's very desirable.

This microbenchmark is somewhat of a best case for Python because it results in exactly zero hash collisions. Whereas, if Javas HashSet is rehashing the keys it has to perform the additional work and also gets much worse behavior with collisions.

If you store the range(iterations) in a temporary variable and do a random.shuffle on it before the loop the runtime is more than 2x slower even if the shuffle and list creation is done outside the loop.

ヤ经典坏疍 2024-07-30 10:04:09

另一种可能的解释是,Python 中的集合是用 C 代码原生实现的,而 Java 中的 HashSet 是用 Java 本身实现的。 因此,Python 中的集合本质上应该更快。

Another possible explanation is that sets in Python are implemented natively in C code, while HashSet's in Java are implemented in Java itself. So, sets in Python should be inherently much faster.

凯凯我们等你回来 2024-07-30 10:04:09

根据我的经验,Python 程序比 Java 程序运行得更快,尽管 Java 是一种“低级”语言。 顺便说一句,这两种语言都被编译成字节代码(这就是那些 .pyc 文件——您可以将它们视为类似于 .class 文件)。 两种语言都是在虚拟堆栈机上解释的字节码。

您可能会认为 Python 在处理诸如 ab 之类的事情时会更慢。 在 java 中,ab 将解析为取消引用。 另一方面,Python 必须执行一个或多个哈希表查找:检查本地作用域、检查模块作用域、检查全局作用域、检查内置函数。

另一方面,java 在某些操作上是出了名的糟糕,比如对象创建(这可能是你的例子中的罪魁祸首)和序列化。

总而言之,没有简单的答案。 我不希望任何一种语言对于所有代码示例都更快。

更正:有几个人指出 java 在对象创建方面不再那么糟糕了。 所以,在你的例子中,这是另一回事。 也许自动装箱成本很高,也许在这种情况下 python 的默认哈希算法更好。 根据我的实践经验,当我将 java 代码重写为 python 时,我总是看到性能的提高,但这可能是由于语言的原因,也可能是由于重写通常会带来性能的提高。

It has generally been my experience that python programs run faster than java programs, despite the fact that java is a bit "lower level" language. Incidently, both languages are compiled into byte code (that's what those .pyc file are -- you can think of them as kind of like .class files). Both languages are byte-code interpreted on a virtual stack machine.

You would expect python to be slower at things like, for example, a.b. In java, that a.b will resolve into a dereference. Python, on the other hand, has to do one or more hash table lookups: check the local scope, check the module scope, check global scope, check builtins.

On the other hand, java is notoriously bad at certain operations such as object creation (which is probably the culprit in your example) and serialization.

In summary, there's no simple answer. I wouldn't expect either language to be faster for all code examples.

Correction: several people have pointed out that java isn't so bad at object creation any more. So, in your example, it's something else. Perhaps it's autoboxing that's expensive, perhaps python's default hashing algorithm is better in this case. In my practical experience, when I rewrite java code to python, I always see a performance increase, but that could be as much due to the language as it is due to rewritng in general leads to performance improvements.

放手` 2024-07-30 10:04:09

我想澄清我在答案中看到的一些误解:

是的,Java 被编译为字节码,但最终在大多数运行时环境中编译为本机代码。 那些说 C 本质上更快的人并没有讲述整个故事,我可以说字节编译语言本质上更快,因为 JIT 编译器可以进行特定于机器的优化,而这是超前编译器无法实现的。

许多可能造成差异的因素是:

  • Python 的哈希表和集合是 Python 中最优化的对象,并且 Python 的哈希函数旨在为相似的输入返回相似的结果:对整数进行哈希运算仅返回整数,从而保证您在 Python 中,永远不会在连续整数的哈希表中看到冲突。
  • 上述的第二个影响是,Python 代码将具有较高的引用局部性,因为您将按顺序访问哈希表。
  • 当您将整数添加到集合中时,Java 会对整数执行一些奇特的装箱和拆箱操作。 从好处来看,这使得 Java 中的算术运算速度比 Python 更快(只要您远离 bignum),但从坏处来说,这意味着比您习惯的分配更多。

I'd like to dispel a couple myths I saw in the answers:

Java is compiled, yes, to bytecode but ultimately to native code in most runtime environments. People who say C is inherently faster aren't telling the entire story, I could make a case that byte compiled languages are inherently faster because the JIT compiler can make machine-specific optimizations that are unavailable to way-ahead-of-time compilers.

A number of things that could make the differences are:

  • Python's hash tables and sets are the most heavily optimized objects in Python, and Python's hash function is designed to return similar results for similar inputs: hashing an integer just returns the integer, guaranteeing that you will NEVER see a collision in a hash table of consecutive integers in Python.
  • A secondary effect of the above is that the Python code will have high locality of reference as you'll be accessing the hash table in sequence.
  • Java does some fancy boxing and unboxing of integers when you add them to collections. On the bonus side, this makes arithmetic way faster in Java than Python (as long as you stay away from bignums) but on the downside it means more allocations than you're used to.
月亮坠入山谷 2024-07-30 10:04:09

编辑:对于实际用例,TreeSet 可能会更快,具体取决于分配模式。 我下面的评论仅涉及这种简化的情况。 然而,我不认为这会产生很大的影响。 真正的问题在其他地方。

这里有几个人建议用 TreeSet 替换 HashSet。 这对我来说听起来是一个非常奇怪的建议,因为插入时间为 O(log n) 的数据结构不可能比预先分配足够的存储桶来存储所有元素的 O(1) 结构更快。

下面是一些对此进行基准测试的代码:

import java.util.*;
class SpeedTest
{    
    public static void main(String[] args)
    {        
        long startTime;
        long totalTime;
        int iterations = 10000000;
        Set counts;

        System.out.println("HashSet:");
        counts = new HashSet((2*iterations), 0.75f);
        startTime = System.currentTimeMillis();
        for(int i=0; i<iterations; i++) {
            counts.add(i);
        }
        totalTime = System.currentTimeMillis() - startTime;
        System.out.println("TOTAL TIME = "+( totalTime/1000f) );
        System.out.println(counts.size());

        counts.clear();

        System.out.println("TreeSet:");
        counts = new TreeSet();
        startTime = System.currentTimeMillis();
        for(int i=0; i<iterations; i++) {
            counts.add(i);
        }
        totalTime = System.currentTimeMillis() - startTime;
        System.out.println("TOTAL TIME = "+( totalTime/1000f) );
        System.out.println(counts.size());
    }
}

这是我的机器上的结果:

$ java -Xmx1024M SpeedTest
HashSet:
TOTAL TIME = 4.436
10000000
TreeSet:
TOTAL TIME = 8.163
10000000

有几个人还认为装箱不是性能问题,并且对象创建成本低廉。 虽然对象创建确实很快,但它绝对不如基元快:

import java.util.*;
class SpeedTest2
{    
    public static void main(String[] args)
    {        
        long startTime;
        long totalTime;
        int iterations = 10000000;

        System.out.println("primitives:");
        startTime = System.currentTimeMillis();
        int[] primitive = new int[iterations];
        for (int i = 0; i < iterations; i++) {
            primitive[i] = i;
        }
        totalTime = System.currentTimeMillis() - startTime;
        System.out.println("TOTAL TIME = "+( totalTime/1000f) );

        System.out.println("primitives:");
        startTime = System.currentTimeMillis();
        Integer[] boxed = new Integer[iterations];
        for (int i = 0; i < iterations; i++) {
            boxed[i] = i;
        }
        totalTime = System.currentTimeMillis() - startTime;
        System.out.println("TOTAL TIME = "+( totalTime/1000f) );
    }
}

结果:

$ java -Xmx1024M SpeedTest2
primitives:
TOTAL TIME = 0.058
primitives:
TOTAL TIME = 1.402

此外,创建大量对象会导致垃圾收集器产生额外开销。 当您开始在内存中保存数千万个活动对象时,这一点就变得很重要。

Edit: A TreeSet might be faster for the real use case, depending on allocation patterns. My comments below deals only with this simplified scenario. However, I do not believe that it would make a very significant difference. The real issue lays elsewhere.

Several people here has recommended replacing the HashSet with a TreeSet. This sounds like a very strange advice to me, since there's no way that a data structure with O(log n) insertion time is faster then a O(1) structure that preallocates enough buckets to store all the elements.

Here's some code to benchmark this:

import java.util.*;
class SpeedTest
{    
    public static void main(String[] args)
    {        
        long startTime;
        long totalTime;
        int iterations = 10000000;
        Set counts;

        System.out.println("HashSet:");
        counts = new HashSet((2*iterations), 0.75f);
        startTime = System.currentTimeMillis();
        for(int i=0; i<iterations; i++) {
            counts.add(i);
        }
        totalTime = System.currentTimeMillis() - startTime;
        System.out.println("TOTAL TIME = "+( totalTime/1000f) );
        System.out.println(counts.size());

        counts.clear();

        System.out.println("TreeSet:");
        counts = new TreeSet();
        startTime = System.currentTimeMillis();
        for(int i=0; i<iterations; i++) {
            counts.add(i);
        }
        totalTime = System.currentTimeMillis() - startTime;
        System.out.println("TOTAL TIME = "+( totalTime/1000f) );
        System.out.println(counts.size());
    }
}

And here's the result on my machine:

$ java -Xmx1024M SpeedTest
HashSet:
TOTAL TIME = 4.436
10000000
TreeSet:
TOTAL TIME = 8.163
10000000

Several people also argued that boxing isn't a performance issue and that object creation is inexpensive. While it's true that object creation is fast, it's definitely not as fast as primitives:

import java.util.*;
class SpeedTest2
{    
    public static void main(String[] args)
    {        
        long startTime;
        long totalTime;
        int iterations = 10000000;

        System.out.println("primitives:");
        startTime = System.currentTimeMillis();
        int[] primitive = new int[iterations];
        for (int i = 0; i < iterations; i++) {
            primitive[i] = i;
        }
        totalTime = System.currentTimeMillis() - startTime;
        System.out.println("TOTAL TIME = "+( totalTime/1000f) );

        System.out.println("primitives:");
        startTime = System.currentTimeMillis();
        Integer[] boxed = new Integer[iterations];
        for (int i = 0; i < iterations; i++) {
            boxed[i] = i;
        }
        totalTime = System.currentTimeMillis() - startTime;
        System.out.println("TOTAL TIME = "+( totalTime/1000f) );
    }
}

Result:

$ java -Xmx1024M SpeedTest2
primitives:
TOTAL TIME = 0.058
primitives:
TOTAL TIME = 1.402

Moreover, creating a lot of objects results in additional overhead from the garbage collector. This becomes significant when you start keeping tens of millions of live objects in memory.

╭ゆ眷念 2024-07-30 10:04:09

我发现这样的基准毫无意义。 我不解决看起来像测试用例的问题。 这并不是很有趣。

我更愿意看到使用 NumPy 和 JAMA 获得有意义的线性代数解决方案的解决方案。 也许我会尝试一下并报告结果。

I find benchmarks like this to be meaningless. I don't solve problems that look like the test case. It's not terribly interesting.

I'd much rather see a solution for a meaningful linear algebra solution using NumPy and JAMA. Maybe I'll try it and report back with results.

美男兮 2024-07-30 10:04:09

我对 python 不太熟悉,但我知道 HashSet 不能包含基元,所以当你说 counts.add(i)i< /code> 自动装箱到 new Integer(i) 调用中。 这可能是你的问题。

如果由于某种原因你确实需要一个 0 到某个大 n 之间的整数“集合”,那么最好将其声明为“boolean[] set = new boolean[n]”。 然后,您可以遍历数组并将集合中的项目标记为“true”,而不会产生创建 n 个 Integer 包装对象的开销。 如果您想更进一步,您可以使用大小为 n/8 的 byte[] 并直接使用各个位。 或者也许是 BigInteger。

编辑

停止投票我的答案。 这是不对的。

编辑

不,真的,这是错误的。 如果我按照问题的建议操作,用 N 个整数填充集合,我会获得类似的性能。 如果我将 for 循环的内容替换为:

    Integer[] ints = new Integer[N];
    for (int i = 0; i < N; ++i) {
        ints[i] = i;
    }

那么只需要 2 秒。 如果您根本不存储 Integer,则需要不到 200 毫秒。 强制分配 10000000 个 Integer 对象确实需要一些时间,但看起来大部分时间都花在 HashSet put 操作中。

I'm not too familiar with python, but I do know HashSet can't contain primitives, so when you say counts.add(i) the i there is getting autoboxed into a new Integer(i) call. That's probably your problem.

If for some reason you really needed a 'set' of integers between 0 and some large n, its probably best declared as an 'boolean[] set = new boolean[n]'. Then you could go through the array and mark items that are in the set as 'true' without incurring the overhead of creating n Integer wrapper objects. If you wanted to go further than that you could use a byte[] of size n/8 and use the individual bits directly. Or perhaps BigInteger.

EDIT

Stop voting my answer up. Its wrong.

EDIT

No really, its wrong. I get comparable performance if I do what the question suggest, populate the set with N Integers. if I replace the contents of the for loop with this:

    Integer[] ints = new Integer[N];
    for (int i = 0; i < N; ++i) {
        ints[i] = i;
    }

Then it only takes 2 seconds. If you don't store the Integer at all then it takes less than 200 millis. Forcing the allocation of 10000000 Integer objects does take some time, but it looks like most of the time is spent inside the HashSet put operation.

海的爱人是光 2024-07-30 10:04:09

这里有很多问题,我想把它们放在一起。

首先,如果它是一个只运行一次的程序,那么多花几秒钟有什么关系吗?

其次,这只是一个微基准测试。 微基准测试对于比较性能毫无意义。

启动有很多问题。

Java 运行时比 Python 大得多,因此从磁盘加载需要更长的时间并占用更多内存,如果您要进行交换,这可能很重要。

如果您尚未设置 -Xms,您可能只是为了调整堆大小而运行 GC。 最好一开始就调整堆的大小。

确实,Java 首先是解释,然后是编译。 Sun 客户端 [C1] Hotspot 大约有 1,500 次迭代,服务器 [C2] 大约有 10,000 次迭代。 服务器热点最终将为您提供更好的性能,但会占用更多内存。 我们可能会看到客户端 Hotspot 使用服务器来执行非常频繁的代码,以获得两全其美的效果。 然而,这通常不应该是几秒钟的问题。

最重要的是,每次迭代您可能会创建两个对象。 对于大多数代码,您不会为如此比例的执行创建这些微小对象。 TreeSet 在对象数量方面可能会更好,6u14 和 Harmony 甚至更好。

Python 可能通过在引用中存储小整数对象而不是实际拥有一个对象来获胜。 这无疑是一个很好的优化。

许多基准测试的一个问题是您在一种方法中混合了许多不同的代码。 你不会这样编写你关心的代码,对吗? 那么,为什么您要尝试进行与您实际希望快速运行的代码不同的性能测试呢?

更好的数据结构:像 BitSet 这样的东西似乎是有意义的(尽管它具有同步性,这可能会也可能不会影响性能)。

There's a number of issues here which I'd like to bring together.

First if it's a program that you are only going to run once, does it matter it takes an extra few seconds?

Secondly, this is just one microbenchmarks. Microbenchmarks are pointless for comparing performance.

Startup has a number of issues.

The Java runtime is much bigger than Python so takes longer to load from disk and takes up more memory which may be important if you are swapping.

If you haven't set -Xms you may be running the GC only to resize the heap. Might as well have the heap properly sized at the start.

It is true that Java starts off interpreting and then compiles. Around 1,500 iterations for Sun client [C1] Hotspot and 10,000 for server [C2]. Server Hotspot will give you better performance eventually, but take more memory. We may see client Hotspot use server for very frequently executed code for best of both worlds. However, this should not usually be a question of seconds.

Most importantly you may be creating two objects per iteration. For most code, you wouldn't be creating these tiny objects for such a proportion of the execution. TreeSet may be better on number of objects score, with 6u14 and Harmony getting even better.

Python may possibly be winning by storing small integer objects in references instead of actually have an object. That is undoubtably a good optimisation.

A problem with a lot of benchmarks is you are mixing a lot of different code up in one method. You wouldn't write code you cared about like that, would you? So why are you attempting to performance test which is unlike code you would actually like to run fast?

Better data structure: Something like BitSet would seem to make sense (although that has ynchronisation on it, which may or may not impact performance).

匿名。 2024-07-30 10:04:09

您需要多次运行它才能真正了解每次运行的“速度”。 JVM 启动时间[对于一个]正在增加 Java 版本的单次运行时间。

您还创建了一个具有较大初始容量的 HashSet,这意味着将使用许多可用槽创建支持 HashMap,这与您创建基本 Set 的 Python 不同。 很难说这是否会阻碍,因为当你的 HashSet 增长时,它必须重新分配存储的对象。

You need to run it multiple times to get a real idea of "how fast" each runs. The JVM startup time [for one] is adding to the single running time of the Java version.

You're also creating a HashSet with a large initial capacity, which means the backing HashMap will be created with that many available slots, unlike the Python where you create a basic Set. Hard to tell if that would hinder though, as when your HashSet grows it will have to reallocate the stored objects.

强辩 2024-07-30 10:04:09

您是否在 jvm 中使用 -server 标志? 没有它你就无法测试性能。 (您还必须在进行测试之前预热 jvm。)

此外,您可能想要使用 TreeSet。 从长远来看,HashSet 会变慢。

你使用哪个jvm? 希望是最新的。

编辑

当我说使用 TreeSet 时,我的意思是一般情况下,而不是用于此基准测试。 TreeSet 处理现实世界中对象不均匀散列的问题。 如果在 HashSet 的同一个 bin 中获取太多对象,则执行的时间复杂度约为 O(n)。

Are you using the -server flag with the jvm? You can't test for performance without it. (You also have to warm up the jvm before doing the test.)

Also, you probably want to use TreeSet<Integer>. HashSet will be slower in the long run.

And which jvm are you using? The newest I hope.

EDIT

When I say use TreeSet, I mean in general, not for this benchmark. TreeSet handles the real world issue of non even hashing of objects. If you get too many objects in the same bin in a HashSet, you will perform about O(n).

怂人 2024-07-30 10:04:09

如果您确实想在集合中存储基本类型并对其进行大量工作,请在 Java 中推出您自己的集合。 通用类对于科学计算来说速度不够快。

正如 Ants Aasma 提到的,Python 绕过哈希并直接使用整数。 Java 创建一个 Integer 对象(自动装箱) ,然后将其转换为对象(在您的实现中)。 该对象也必须经过哈希处理才能在哈希集中使用。

要进行有趣的比较,请尝试以下操作:

Java

import java.util.HashSet;
class SpeedTest
{ 
  public static class Element {
    private int m_i;
    public Element(int i) {
      m_i = i;
    }
  }

  public static void main(String[] args)
  {        
    long startTime;
    long totalTime;
    int iterations = 1000000;
    HashSet<Element> counts = new HashSet<Element>((int)(2*iterations), 0.75f);

    startTime = System.currentTimeMillis();
    for(int i=0; i<iterations; ++i)
    {
      counts.add(new Element(i));
    }
    totalTime = System.currentTimeMillis() - startTime;
    System.out.println("TOTAL TIME = "+( totalTime/1000f) );
    System.out.println(counts.size());
  }
}

结果:

$java SpeedTest
TOTAL TIME = 3.028
1000000

$java -Xmx1G -Xms1G SpeedTest
TOTAL TIME = 0.578
1000000

Python

#!/usr/bin/python
import time
import sys

class Element(object):
  def __init__(self, i):
    self.num = i

def main(args):    
    iterations = 1000000
    counts = set()
    startTime = time.time();    
    for i in range(0, iterations):
        counts.add(Element(i))
    totalTime = time.time() - startTime
    print 'total time =',totalTime
    print len(counts)


if __name__ == "__main__":
  main(sys.argv)

结果:

$./speedtest.py 
total time = 20.6943161488
1000000

“python 比 java 更快”怎么样?

If you really want to store primitive types in a set, and do heavy work on it, roll your own set in Java. The generic classes are not fast enough for scientific computing.

As Ants Aasma mentions, Python bypasses hashing and uses the integer directly. Java creates an Integer object (autoboxing), and then casts it to an Object (in your implementation). This object must be hashed, as well, for use in a hash set.

For a fun comparision, try this:

Java

import java.util.HashSet;
class SpeedTest
{ 
  public static class Element {
    private int m_i;
    public Element(int i) {
      m_i = i;
    }
  }

  public static void main(String[] args)
  {        
    long startTime;
    long totalTime;
    int iterations = 1000000;
    HashSet<Element> counts = new HashSet<Element>((int)(2*iterations), 0.75f);

    startTime = System.currentTimeMillis();
    for(int i=0; i<iterations; ++i)
    {
      counts.add(new Element(i));
    }
    totalTime = System.currentTimeMillis() - startTime;
    System.out.println("TOTAL TIME = "+( totalTime/1000f) );
    System.out.println(counts.size());
  }
}

Results:

$java SpeedTest
TOTAL TIME = 3.028
1000000

$java -Xmx1G -Xms1G SpeedTest
TOTAL TIME = 0.578
1000000

Python

#!/usr/bin/python
import time
import sys

class Element(object):
  def __init__(self, i):
    self.num = i

def main(args):    
    iterations = 1000000
    counts = set()
    startTime = time.time();    
    for i in range(0, iterations):
        counts.add(Element(i))
    totalTime = time.time() - startTime
    print 'total time =',totalTime
    print len(counts)


if __name__ == "__main__":
  main(sys.argv)

Results:

$./speedtest.py 
total time = 20.6943161488
1000000

How's that for 'python is faster than java'?

皓月长歌 2024-07-30 10:04:09

你用多少内存启动 JVM? 这取决于? 当我使用 1 Gig RAM 的程序运行 JVM 时:

$ java -Xmx1024M -Xms1024M -classpath . SpeedTest 
TOTAL TIME = 5.682
10000000
$ python speedtest.py 
total time = 4.48310899734
10000000

如果我使用较少的内存运行 JVM,则需要更长的时间……相当长的时间:

$ java -Xmx768M -Xms768M -classpath . SpeedTest 
TOTAL TIME = 6.706
10000000
$ java -Xmx600M -Xms600M -classpath . SpeedTest 
TOTAL TIME = 14.086
10000000

我认为 HashSet 是此特定中的性能瓶颈实例。 如果我用 LinkedList 替换 HashSet,程序会变得更快。

最后——请注意,Java 程序最初是被解释的,只有那些被多次调用的方法才会被编译。 因此,您可能将 Python 与 Java 的解释器进行比较,而不是编译器。

How much memory did you start the JVM with? It depends? When I run the JVM with your program with 1 Gig of RAM:

$ java -Xmx1024M -Xms1024M -classpath . SpeedTest 
TOTAL TIME = 5.682
10000000
$ python speedtest.py 
total time = 4.48310899734
10000000

If I run the JVM with less memory, it takes longer ... considerably longer:

$ java -Xmx768M -Xms768M -classpath . SpeedTest 
TOTAL TIME = 6.706
10000000
$ java -Xmx600M -Xms600M -classpath . SpeedTest 
TOTAL TIME = 14.086
10000000

I think the HashSet is the performance bottleneck in this particular instance. If I replace the HashSet with a LinkedList, the program gets substantially faster.

Finally -- note that Java programs are initially interpreted and only those methods that are called many times are compiled. Thus, you're probably comparing Python to Java's interpreter, not the compiler.

梦一生花开无言 2024-07-30 10:04:09

这里只是一个摸索,但 Python 正在做一些 Java 可能没有做的优化:

  • Python 中的 range() 调用在优化的 C 代码中一次性创建所有 10000000 个整数对象。 Java每次迭代都必须创建一个Integer对象,这可能会比较慢。
  • 在Python中,整数是不可变的,因此您可以只存储对全局“42”的引用,而不是为对象分配一个槽。 我不确定 Java 装箱 Integer 对象如何比较。
  • 许多内置的 Python 算法和数据结构都针对特殊情况进行了大量优化。 例如,整数的哈希函数就是恒等函数。 如果 Java 使用更“聪明”的哈希函数,这可能会大大减慢速度。 如果您的大部分时间都花在数据结构代码上,那么考虑到多年来手动调整 Python C 实现所花费的大量精力,看到 Python 击败 Java 我一点也不会感到惊讶。

Just a stab in the dark here, but some optimizations that Python is making that Java probably isn't:

  • The range() call in Python is creating all 10000000 integer objects at once, in optimized C code. Java must create an Integer object each iteration, which may be slower.
  • In Python, ints are immutable, so you can just store a reference to a global "42", for example, rather than allocating a slot for the object. I'm not sure how Java boxed Integer objects compare.
  • Many of the built-in Python algorithms and data structures are rather heavily optimized for special cases. For instance, the hash function for integers is, simply the identity function. If Java is using a more "clever" hash function, this could slow things down quite a bit. If most of your time is spent in data structure code, I wouldn't be surprised at all to see Python beat Java given the amount of effort that has been spent over the years hand-tuning the Python C implementation.
我三岁 2024-07-30 10:04:09

为了更快的 Python 进行了一些更改。

#!/usr/bin/python
import time
import sys

import psyco                 #<<<<  
psyco.full()

class Element(object):
    __slots__=["num"]        #<<<<
    def __init__(self, i):
        self.num = i

def main(args):    
    iterations = 1000000
    counts = set()
    startTime = time.time();
    for i in xrange(0, iterations):
        counts.add(Element(i))
    totalTime = time.time() - startTime
    print 'total time =',totalTime
    print len(counts)

if __name__ == "__main__":
  main(sys.argv)

之前

(env)~$ python speedTest.py
total time = 8.82906794548
1000000

之后

(env)~$ python speedTest.py
total time = 2.44039201736
1000000

现在 一些很好的老作弊和......

#!/usr/bin/python
import time
import sys
import psyco

psyco.full()

class Element(object):
    __slots__=["num"]
    def __init__(self, i):
        self.num = i

def main(args):    
    iterations = 1000000
    counts = set()
    elements = [Element(i) for i in range(0, iterations)]
    startTime = time.time();
    for e in elements:
        counts.add(e)
    totalTime = time.time() - startTime
    print 'total time =',totalTime
    print len(counts)

if __name__ == "__main__":
  main(sys.argv)

(env)~$ python speedTest.py
total time = 0.526521921158
1000000

A few changes for faster Python.

#!/usr/bin/python
import time
import sys

import psyco                 #<<<<  
psyco.full()

class Element(object):
    __slots__=["num"]        #<<<<
    def __init__(self, i):
        self.num = i

def main(args):    
    iterations = 1000000
    counts = set()
    startTime = time.time();
    for i in xrange(0, iterations):
        counts.add(Element(i))
    totalTime = time.time() - startTime
    print 'total time =',totalTime
    print len(counts)

if __name__ == "__main__":
  main(sys.argv)

Before

(env)~$ python speedTest.py
total time = 8.82906794548
1000000

After

(env)~$ python speedTest.py
total time = 2.44039201736
1000000

Now some good old cheating and ...

#!/usr/bin/python
import time
import sys
import psyco

psyco.full()

class Element(object):
    __slots__=["num"]
    def __init__(self, i):
        self.num = i

def main(args):    
    iterations = 1000000
    counts = set()
    elements = [Element(i) for i in range(0, iterations)]
    startTime = time.time();
    for e in elements:
        counts.add(e)
    totalTime = time.time() - startTime
    print 'total time =',totalTime
    print len(counts)

if __name__ == "__main__":
  main(sys.argv)

(env)~$ python speedTest.py
total time = 0.526521921158
1000000
捎一片雪花 2024-07-30 10:04:09

好吧,如果您要调优 Java 程序,那么您不妨也调优 Python 程序。

>>> import timeit
>>> timeit.Timer('x = set()\nfor i in range(10000000):\n    x.add(i)').repeat(3, 1)
[2.1174559593200684, 2.0019571781158447, 1.9973630905151367]
>>> timeit.Timer('x = set()\nfor i in xrange(10000000):\n    x.add(i)').repeat(3, 1)
[1.8742368221282959, 1.8714439868927002, 1.869229793548584]
>>> timeit.Timer('x = set(xrange(10000000))').repeat(3, 1)
[0.74582195281982422, 0.73061800003051758, 0.73396801948547363]

仅使用 xrange 就可以在我的机器上提高大约 8% 的速度。 表达式 set(xrange(10000000)) 构建完全相同的集合,但速度快 2.5 倍(从 1.87 秒缩短到 0.74 秒)。

我喜欢调整 Python 程序使其变得更短。 :) 但 Java 也能做到同样的效果。 众所周知,如果你想要Java中的一组密集的小整数,你就不会使用哈希表。 您使用java.util.BitSet

BitSet bits = new BitSet(iterations);

startTime = System.currentTimeMillis();
bits.set(0, iterations, true);
totalTime = System.currentTimeMillis() - startTime;
System.out.println("TOTAL TIME = "+( totalTime/1000f) );
System.out.println(bits.cardinality());

那应该相当快。 不幸的是我现在没有时间测试它。

Well, if you're going to tune the Java program, you might as well tune the Python program too.

>>> import timeit
>>> timeit.Timer('x = set()\nfor i in range(10000000):\n    x.add(i)').repeat(3, 1)
[2.1174559593200684, 2.0019571781158447, 1.9973630905151367]
>>> timeit.Timer('x = set()\nfor i in xrange(10000000):\n    x.add(i)').repeat(3, 1)
[1.8742368221282959, 1.8714439868927002, 1.869229793548584]
>>> timeit.Timer('x = set(xrange(10000000))').repeat(3, 1)
[0.74582195281982422, 0.73061800003051758, 0.73396801948547363]

Just using xrange makes it about 8% faster on my machine. And the expression set(xrange(10000000)) builds exactly the same set, but 2.5x faster (from 1.87 seconds to 0.74).

I like how tuning a Python program makes it shorter. :) But Java can do the same trick. As everyone knows, if you want a dense set of smallish integers in Java, you don't use a hash table. You use a java.util.BitSet!

BitSet bits = new BitSet(iterations);

startTime = System.currentTimeMillis();
bits.set(0, iterations, true);
totalTime = System.currentTimeMillis() - startTime;
System.out.println("TOTAL TIME = "+( totalTime/1000f) );
System.out.println(bits.cardinality());

That should be fairly quick. Unfortunately I don't have the time to test it right now.

夏末染殇 2024-07-30 10:04:09

我同意甘道夫关于启动时间的看法。 另外,您正在分配一个巨大的 HashSet,它与您的 python 代码完全不相似。 我想如果你把它放在分析器下,就会花很多时间在那里。 此外,在这种大小下插入新元素确实会很慢。 我会按照建议研究 TreeSet 。

I agree with Gandalf about the startup time. Also, you are allocating a huge HashSet which is not at all similar to your python code. I imaging if you put this under a profiler, a good chunk of time would be spent there. Also, inserting new elements is really going to be slow with this size. I would look into TreeSet as suggested.

痴者 2024-07-30 10:04:09

最大的问题可能是给定的代码测量的是 wall time - 您的手表测量的-- 但是应该测量比较代码运行时间的是 处理时间 -- 时间量CPU 花费在执行该特定代码而不是其他任务上。

The biggest issue is probably that what the given code measures is wall time -- what your watch measures -- but what should be measured to compare code runtime is process time -- the amount of time the cpu spend executing that particular code and not other tasks.

夏尔 2024-07-30 10:04:09

只需添加一个简单的额外功能,您就可以使 Java microbenchamrk 变得更快。

    HashSet counts = new HashSet((2*iterations), 0.75f);

变得

    HashSet counts = new HashSet((2*iterations), 0.75f) {
        @Override public boolean add(Object element) { return false; }
    };

简单、更快并得到相同的结果。

You can make the Java microbenchamrk much faster, by adding just a simple little extra.

    HashSet counts = new HashSet((2*iterations), 0.75f);

becomes

    HashSet counts = new HashSet((2*iterations), 0.75f) {
        @Override public boolean add(Object element) { return false; }
    };

Simple, faster and gets the same result.

思慕 2024-07-30 10:04:09

您可能想看看是否可以“启动”JIT 编译器来编译您感兴趣的代码部分,方法可能是预先将其作为函数运行一次,然后短暂休眠。 这可能允许 JVM 将函数编译为本机代码。

You might want to see if you can "prime" the JIT compiler into compiling the section of code you're interested in, by perhaps running it as a function once beforehand and sleeping briefly afterwords. This might allow the JVM to compile the function down to native code.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文