Starting processes simultaneously is slower than staggering them; why?
I'm evaluating the performance of an experimental system setup on an 8-core machine with 16GB RAM. I have two main-memory Java RDBMSs (hsqldb) running, and against each of these I run a TPCC client (derived from jTPCC/BenchmarkSQL).
I have scripts to launch things, so e.g. the hsqldb instances are started with:
./hsqld.bash 0 &
./hsqld.bash 1 &
If I start the clients at nearly the same time:
./hsql-tpcc.bash 0 &
./hsql-tpcc.bash 1 &
then each of those clients spikes to an initial rate of around 500-1000 tpmC (tpmC is essentially transactions per minute), then quickly (in less than a second) settles to a rate of around 200-250 tpmC. OTOH, if I wait for a second or two before starting the second client:
./hsql-tpcc.bash 0 &
sleep 1
./hsql-tpcc.bash 1 &
then each of the clients runs at 2500+ tpmC. Waiting for more than a second makes no further difference.
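If it helps to sweep that delay rather than hard-code the sleep, here's a minimal launcher sketch in Java; the script names are the ones above, while the class name and default delay are just placeholders:

// StaggerLauncher.java -- wraps the same two client scripts and makes the
// stagger between them a command-line parameter, to probe for the threshold.
public class StaggerLauncher {
    public static void main(String[] args) throws Exception {
        long delayMs = args.length > 0 ? Long.parseLong(args[0]) : 1000L;

        Process c0 = new ProcessBuilder("./hsql-tpcc.bash", "0")
                .inheritIO()   // pass the client's output through to this console
                .start();
        Thread.sleep(delayMs); // the stagger under test
        Process c1 = new ProcessBuilder("./hsql-tpcc.bash", "1")
                .inheritIO()
                .start();

        c0.waitFor();
        c1.waitFor();
    }
}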
This is strange because client 0 just talks to server 0 and client 1 just talks to server 1. It's unclear why there's such a dramatic performance interference.
I thought this might be due to CPU scheduler affinity of the clients, but they take only about 1-3% of a single core when running slowly (20-25% when running quickly). Another suspect was the clients' NUMA bindings (memory contention on the same memory node), but the machine apparently has just one memory node (there's only /sys/devices/system/node/node0), and furthermore each client takes just 0.8% of memory.
It also doesn't seem to be due to CPU bindings for the hsqldb instances, since both the fast and slow behaviors can be reproduced just by restarting the clients (waiting or not waiting a second) while leaving the same hsqldb instances running throughout (i.e. hsqldb doesn't have to be restarted). hsqldb takes 4-8% CPU when slow, 80% CPU when fast, and 4.3% of memory.
Any other ideas why this could be happening? There's no disk IO involved, and I'm not close to exhausting the system's memory. Thanks in advance. Other relevant info follows:
$ uname -a
Linux hammer.csail.mit.edu 2.6.27.35-170.2.94.fc10.x86_64 #1 SMP Thu Oct 1 14:41:38 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
1 Answer
How long had your "two main-memory Java RDBMSs (hsqldb)" been running before the test? If you start them right before the test, try warming them up a bit first. Let HotSpot do its thing, and get through all of the
if (first_time) { do_initialization(); }
code in the DBs so the garbage collector can settle down. Also, starting two things (no matter what they are) at the same time means that, minimally, both are trying to do all of their init work at the same time (allocating memory, allocating pages in swap, finding and loading libraries, etc.). So both programs spend the first milliseconds of their lives in I/O contention.
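For example, a minimal warm-up sketch, where runTransaction() is a hypothetical stand-in for one TPCC transaction against the database; the warm-up results are thrown away and only the measured phase is counted:

// WarmupSketch.java -- run a fixed number of throwaway transactions before
// the measured phase so HotSpot can JIT the hot paths and the GC can settle.
public class WarmupSketch {
    static final int WARMUP_TXNS = 5000;   // arbitrary; tune for your setup
    static final long MEASURE_MS = 60000;  // measure for one minute

    // Hypothetical stand-in for one TPCC transaction against hsqldb;
    // the real JDBC work would go here.
    static void runTransaction() {
        // ... execute one New-Order (or mixed) transaction ...
    }

    public static void main(String[] args) {
        for (int i = 0; i < WARMUP_TXNS; i++) {
            runTransaction();               // warm-up: results discarded
        }
        long start = System.currentTimeMillis();
        long done = 0;
        while (System.currentTimeMillis() - start < MEASURE_MS) {
            runTransaction();
            done++;
        }
        System.out.println("tpmC ~ " + done * 60000.0 / MEASURE_MS);
    }
}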