Moving from EDU to java.util.concurrent cuts performance roughly in half

Published 2024-10-25 12:40:30


Cross post from http://forums.oracle.com/forums/thread.jspa?threadID=2195025&tstart=0

There is a telecom application server (JAIN SLEE based) and the application running in it.

The application receives a message from the network, processes it, and sends a response back to the network.

The requirement for request/response latency is 250 ms for 95% of calls and 3000 ms for 99.999% of calls.
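One way to check measured latencies against the two targets is a simple nearest-rank percentile over recorded per-call latencies. This is only an illustrative sketch; the sample values below are made up, not real measurements:

```java
import java.util.Arrays;

public class Percentiles {
    // Nearest-rank percentile over a sample of per-call latencies in ms:
    // the smallest value such that at least p% of samples are <= it.
    public static long percentile(long[] latenciesMs, double p) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        // made-up sample latencies for illustration
        long[] sample = {10, 20, 30, 40, 250, 3000};
        System.out.println("p95 = " + percentile(sample, 95) + " ms");
    }
}
```

The requirement is then `percentile(latencies, 95) <= 250` and `percentile(latencies, 99.999) <= 3000` over a sufficiently large sample.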

We use EDU.oswego.cs.dl.util.concurrent.ConcurrentHashMap, 1 instance. For processing one call (one call is several messages), the following methods are invoked:

"put", "get", "get", "get", then, 180 seconds later, "remove".

There are 4 threads which invoke these methods.

(A small note: working with ConcurrentHashMap is not the only activity. Also for one network message there are a lot of other activities: protocol message parsing, querying a DB, writing an SDR into a file, creating short living and long living objects.)
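The per-call access pattern described above can be sketched like this (CallContext, the callId key, and the scheduled cleanup are hypothetical stand-ins for the real application's machinery, not the actual code):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CallPatternSketch {
    // Placeholder for whatever per-call state the application keeps.
    static class CallContext {
        final String callId;
        CallContext(String callId) { this.callId = callId; }
    }

    static final ConcurrentMap<String, CallContext> calls = new ConcurrentHashMap<>();

    // Daemon scheduler so this sketch does not keep the JVM alive.
    static final ScheduledExecutorService cleaner =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            });

    // First message of a call: the "put", plus the "remove" 180 s later.
    static void onFirstMessage(String callId) {
        calls.put(callId, new CallContext(callId));
        cleaner.schedule(() -> calls.remove(callId), 180, TimeUnit.SECONDS);
    }

    // Each follow-up message: one of the three "get"s.
    static CallContext onFollowUpMessage(String callId) {
        return calls.get(callId);
    }
}
```

With 4 threads driving this pattern, each call touches the shared map only 5 times; the rest of the per-message work (parsing, DB, SDR writes) happens outside the map.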

When we move from EDU.oswego.cs.dl.util.concurrent.ConcurrentHashMap to java.util.concurrent.ConcurrentHashMap, we see a performance degradation from 1400 to 800 calls per second.

The first bottleneck at the resulting 800 calls per second is latency that no longer meets the requirement above.

This performance degradation is reproduced on hosts with the following CPU:

  • 2 CPU x Quad-Core AMD Opteron 2356, 2312 MHz, 8 HW threads in total,
  • 2 CPU x Intel Xeon E5410 2.33 GHz, 8 HW threads in total.

It is not reproduced on X5570 CPU (Intel Xeon Nehalem X5570 2.93 GHz, 16 HW threads in total).

Did anybody face similar issues? How to solve them?


Comments (3)

笑红尘 2024-11-01 12:40:31


At first, did you check that the hash map is indeed the culprit? Assuming that you did: there is a lock-free hash map designed to scale to hundreds of processors without introducing a lot of contention. It's authored by Cliff Click, a well-known engineer on the original HotSpot compiler team, now working on scaling the JDK to machines with hundreds of CPUs. So I assume that he knows what he is doing in that hash map implementation. More info about this hash map can be found in these slides.

作妖 2024-11-01 12:40:31


Have you tried changing the concurrencyLevel in the ConcurrentHashMap? Try some lower values like 8, and try some bigger values. And remember that the performance and concurrency of ConcurrentHashMap depend on the quality of your hashCode function.

And yes, java.util.concurrent.ConcurrentHashMap has the same origin (Doug Lea from edu.oswego) as edu.oswego.cs.dl..., but it was totally rewritten by him so it can scale better.

I think it may be good for you to check out the Javolution FastMap. It may be better suited for real-time applications.
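In the Java 6-era implementation, the concurrencyLevel constructor argument sets the number of lock segments (in later JDKs it is only a sizing hint), so a sweep like the following is cheap to run against your own workload. The specific levels below are just example values to try, not recommendations:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrencySweep {
    // initialCapacity and loadFactor are kept at their documented
    // defaults (16, 0.75f); only the concurrency hint is varied.
    public static Map<Integer, String> makeMap(int concurrencyLevel) {
        return new ConcurrentHashMap<Integer, String>(16, 0.75f, concurrencyLevel);
    }

    public static void main(String[] args) {
        // Benchmark your real call pattern against each of these.
        for (int level : new int[] {4, 8, 16, 64}) {
            Map<Integer, String> m = makeMap(level);
            m.put(1, "probe-" + level);
        }
    }
}
```

With only 4 threads touching the map, values much larger than the thread count mostly add memory overhead rather than concurrency.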

不喜欢何必死缠烂打 2024-11-01 12:40:30


I assume you are talking about nanoseconds rather than milliseconds. (That is one million times smaller!)

OR the use of ConcurrentHashMap is a trivial portion of your delay.

EDIT: Have edited the example to be multi-threaded using 100 tasks.

/*
Average operation time for a map of 10,000,000 was 48 ns
Average operation time for a map of 5,000,000 was 51 ns
Average operation time for a map of 2,500,000 was 48 ns
Average operation time for a map of 1,250,000 was 46 ns
Average operation time for a map of 625,000 was 45 ns
Average operation time for a map of 312,500 was 44 ns
Average operation time for a map of 156,200 was 38 ns
Average operation time for a map of 78,100 was 34 ns
Average operation time for a map of 39,000 was 35 ns
Average operation time for a map of 19,500 was 37 ns
 */
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public static void main(String... args) {
    ExecutorService es = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    try {
        for (int size = 100000; size >= 100; size /= 2)
            test(es, size);
    } finally {
        es.shutdown();
    }
}

private static void test(ExecutorService es, final int size) {
    int tasks = 100;
    final ConcurrentHashMap<Integer, String> map = new ConcurrentHashMap<Integer, String>(tasks * size);
    List<Future<?>> futures = new ArrayList<Future<?>>();
    long start = System.nanoTime();
    for (int j = 0; j < tasks; j++) {
        final int offset = j * size;
        futures.add(es.submit(new Runnable() {
            public void run() {
                // 1 put + 10 gets + 1 remove = 12 operations per entry,
                // hence the divisor of 12 in the printout below.
                for (int i = 0; i < size; i++)
                    map.put(offset + i, "" + i);
                int total = 0;
                for (int t = 0; t < 10; t++)
                    for (int i = 0; i < size; i++)
                        total += map.get(offset + i).length();
                for (int i = 0; i < size; i++)
                    map.remove(offset + i);
            }
        }));
    }
    try {
        for (Future<?> future : futures)
            future.get();
    } catch (Exception e) {
        throw new AssertionError(e);
    }
    long time = System.nanoTime() - start;
    System.out.printf("Average operation time for a map of %,d was %,d ns%n", size * tasks, time / tasks / 12 / size);
}