High availability and scalable platform for Java/C++ on Solaris

Posted 2024-07-04 03:45:32


I have an application that's a mix of Java and C++ on Solaris. The Java aspects of the code run the web UI and establish state on the devices that we're talking to, and the C++ code does the real-time crunching of data coming back from the devices. Shared memory is used to pass device state and context information from the Java code through to the C++ code. The Java code uses a PostgreSQL database to persist its state.

We're running into some pretty severe performance bottlenecks, and right now the only way we can scale is to increase memory and CPU counts. We're stuck on the one physical box due to the shared memory design.

The really big hit here is being taken by the C++ code. The web interface is fairly lightly used to configure the devices; where we're really struggling is to handle the data volumes that the devices deliver once configured.

Every piece of data we get back from the device has an identifier in it which points back to the device context, and we need to look that up. Right now there's a series of shared memory objects that are maintained by the Java/UI code and referred to by the C++ code, and that's the bottleneck. Because of that architecture we cannot move the C++ data handling off to another machine. We need to be able to scale out so that various subsets of devices can be handled by different machines, but then we lose the ability to do that context lookup, and that's the problem I'm trying to resolve: how to offload the real-time data processing to other boxes while still being able to refer to the device context.

I should note we have no control over the protocol used by the devices themselves, and there is no possible chance that situation will change.

We know we need to move away from this to be able to scale out by adding more machines to the cluster, and I'm in the early stages of working out exactly how we'll do this.

Right now I'm looking at Terracotta as a way of scaling out the Java code, but I haven't got as far as working out how to scale out the C++ to match.

As well as scaling for performance, we need to consider high availability. The application needs to be available pretty much the whole time -- not absolutely 100%, which isn't cost effective, but we need to do a reasonable job of surviving a machine outage.

If you had to undertake the task I've been given, what would you do?

EDIT: Based on the data provided by @john channing, I'm looking at both GigaSpaces and Gemstone. Oracle Coherence and IBM ObjectGrid appear to be Java-only.


鸢与 2024-07-11 03:45:32

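The first thing I would do is construct a model of the system to map the data flow and try to understand precisely where the bottleneck lies. If you can model your system as a pipeline, then you should be able to use the Theory of Constraints (most of the literature is about optimising business processes, but it applies equally to software) to continuously improve performance and eliminate the bottleneck.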
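As a rough illustration of the pipeline view, here is a minimal Java sketch. The class name `PipelineModel`, the stage names, and the idea of describing each stage by a single measured capacity (messages per second) are all assumptions made for the example, not anything taken from the system in the question. For stages connected in series, the stage with the lowest capacity is the constraint, and it bounds end-to-end throughput.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model: each serial pipeline stage has one measured capacity in messages/second. */
final class PipelineModel {
    private final Map<String, Double> capacity = new LinkedHashMap<>();

    /** Adds a stage; returns this so stages can be chained in pipeline order. */
    PipelineModel stage(String name, double messagesPerSecond) {
        capacity.put(name, messagesPerSecond);
        return this;
    }

    /** The constraint is the stage with the lowest capacity (null if no stages were added). */
    String bottleneck() {
        String worst = null;
        double min = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Double> e : capacity.entrySet()) {
            if (e.getValue() < min) {
                min = e.getValue();
                worst = e.getKey();
            }
        }
        return worst;
    }

    /** A serial pipeline cannot run faster than its slowest stage. */
    double throughput() {
        if (capacity.isEmpty()) {
            return 0.0;
        }
        double min = Double.POSITIVE_INFINITY;
        for (double c : capacity.values()) {
            min = Math.min(min, c);
        }
        return min;
    }
}
```

With made-up capacities such as `receive` = 50000, `contextLookup` = 8000 and `crunch` = 20000, `bottleneck()` reports `contextLookup`, and speeding up any other stage leaves `throughput()` unchanged. That is the point of the Theory of Constraints: only work on the constraint moves the result.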

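Next I would collect some hard empirical data that accurately characterises the performance of your system. As the saying goes, you cannot manage what you cannot measure; I have seen many people try to optimise a software system based on hunches and fail miserably.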
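One low-tech way to gather that kind of data, if a profiler is not convenient, is to wrap the suspect call and keep the timings. The sketch below is only an illustration: `LatencyRecorder` is a hypothetical helper, it is not thread-safe, and it uses the nearest-rank definition of a percentile.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical single-threaded helper that stores elapsed times in nanoseconds. */
final class LatencyRecorder {
    private final List<Long> nanos = new ArrayList<>();

    /** Runs the work once and records how long it took. */
    void time(Runnable work) {
        long start = System.nanoTime();
        work.run();
        nanos.add(System.nanoTime() - start);
    }

    /** Records a duration measured elsewhere. */
    void record(long elapsedNanos) {
        nanos.add(elapsedNanos);
    }

    int count() {
        return nanos.size();
    }

    /** Nearest-rank percentile for p in (0, 100]. */
    long percentile(double p) {
        if (nanos.isEmpty()) {
            throw new IllegalStateException("no samples recorded");
        }
        List<Long> sorted = new ArrayList<>(nanos);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }
}
```

Looking at a high percentile as well as the median helps here, because a real-time path can have an acceptable average and still stall on its slowest lookups.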

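Then I would use the Pareto Principle (80/20 rule) to choose the small number of things that will produce the biggest gains and focus only on those.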
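Once there are measurements, the 80/20 selection can be made mechanical. The following is a sketch under the assumption that you have total time per component from a profiler; `ParetoHotspots` and the component names used with it are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: picks the few components that account for most of the measured time. */
final class ParetoHotspots {
    /**
     * Returns the fewest components, largest first, whose combined time
     * reaches the given share (e.g. 0.8) of the total.
     */
    static List<String> select(Map<String, Double> secondsByComponent, double share) {
        double total = 0.0;
        for (double s : secondsByComponent.values()) {
            total += s;
        }
        List<Map.Entry<String, Double>> sorted = new ArrayList<>(secondsByComponent.entrySet());
        sorted.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));

        List<String> picked = new ArrayList<>();
        double covered = 0.0;
        for (Map.Entry<String, Double> e : sorted) {
            if (covered >= share * total) {
                break; // the target share is already covered
            }
            picked.add(e.getKey());
            covered += e.getValue();
        }
        return picked;
    }
}
```

For an invented profile of `contextLookup` = 62 s, `parse` = 21 s, `dbWrite` = 10 s and `logging` = 7 s, a share of 0.8 selects only `contextLookup` and `parse`; the other two are not worth touching until those are dealt with.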

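To scale a Java application horizontally, I have used Oracle Coherence extensively. Although some dismiss it as a very expensive distributed hashtable, the functionality is much richer than that and you can, for example, directly access data in the cache from C++ code.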

Other alternatives for horizontally scaling your Java code would be GigaSpaces, IBM ObjectGrid or Gemstone GemFire.

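If your C++ code is stateless and is used purely for number crunching, you could look at distributing the process using ICE Grid, which has bindings for all of the languages you are using.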


你是年少的欢喜 2024-07-11 03:45:32

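You need to scale sideways and out. Maybe something like a message queue could sit between the front end and the crunching as the back end.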
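To make the queue idea concrete, here is a small in-process stand-in built on `java.util.concurrent.BlockingQueue`. It is only a sketch of the decoupling: a real deployment would put a network message broker in this position so that producers and workers can live on different machines. `CrunchQueue` and `Reading` are hypothetical names, and the message shape (a device identifier plus a raw payload) is an assumption.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical bounded queue that decouples the producing side from the crunching workers. */
final class CrunchQueue {
    /** Assumed message shape: device identifier plus raw payload. */
    static final class Reading {
        final int deviceId;
        final byte[] payload;

        Reading(int deviceId, byte[] payload) {
            this.deviceId = deviceId;
            this.payload = payload;
        }
    }

    private final BlockingQueue<Reading> queue;

    CrunchQueue(int capacity) {
        this.queue = new LinkedBlockingQueue<>(capacity);
    }

    /** Producer side: returns false instead of blocking when the queue is full. */
    boolean publish(Reading reading) {
        return queue.offer(reading);
    }

    /** Worker side: waits up to timeoutMillis and returns null if nothing arrived. */
    Reading take(long timeoutMillis) {
        try {
            return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt status for the caller
            return null;
        }
    }
}
```

The bounded capacity is deliberate: when the workers fall behind, `publish` starts returning false, so overload shows up as explicit back-pressure at the producer rather than as unbounded memory growth.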


夜灵血窟げ 2024-07-11 03:45:32

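Andrew, in addition to modelling the system as a pipeline and so on, measuring things is important. Have you run a profiler over the code and got metrics showing where most of the time is spent?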

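For the database code, how often does the data change? Are you looking at caching at the moment? I assume you have looked at indexes etc. over the data to speed up the database?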

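What traffic are you getting on the front end? Are you caching web pages? It isn't too hard to use a JMS-type API to communicate between components. You could then put the web page component on one machine (or more) and the integration code (C++) on another; many JMS products have native C++ APIs (ActiveMQ comes to mind). Still, it really helps to know how much time is spent in the web tier (JSP?), in the C++ code, and in database operations.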

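Is the database storing business data, or is it also being used to pass data between Java and C++? You say you are using shared memory rather than JNI? What level of multithreading currently exists in the app? Would you describe the code as synchronous in nature or asynchronous?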

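Is there a physical relationship between the Solaris code and the devices that must be maintained (i.e. do all devices register with the C++ code, or can that be specified)? In other words, if you put a web load balancer on the front end and had just two machines behind it today, would the assignment of which devices are managed by which box be initialised up front?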
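If the assignment can be specified rather than being fixed by the devices, the simplest possible scheme is a static mapping from device identifier to box. The sketch below is an assumption-laden illustration: `DeviceAssignment` and the box names are hypothetical, and it presumes the identifier carried in each piece of device data is numeric.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical static partitioning: every device id maps to exactly one processing box. */
final class DeviceAssignment {
    private final List<String> boxes;

    DeviceAssignment(List<String> boxes) {
        if (boxes.isEmpty()) {
            throw new IllegalArgumentException("need at least one box");
        }
        this.boxes = new ArrayList<>(boxes); // defensive copy
    }

    /** floorMod keeps the index non-negative even for negative identifiers. */
    String boxFor(long deviceId) {
        int index = (int) Math.floorMod(deviceId, (long) boxes.size());
        return boxes.get(index);
    }
}
```

Because the mapping depends only on the identifier, the side that configures a device and the side that processes its data agree on the owning box without talking to each other, and each box needs only the context for its own devices. The cost of plain modulo is that changing the number of boxes moves most devices; consistent hashing or an explicit assignment table avoids that at the price of more bookkeeping.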

What are the HA requirements? I.e. just state information? Can the HA be done in the web tier alone by clustering session data?

Is the database running on another machine?

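How big is the database? Have you optimised your queries? For example, using explicit inner/outer joins sometimes helps compared with nested subqueries. (Again, look at the SQL stats.)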
