There are several:
Now I haven't used all of these but I've used or investigated the majority of them.
GridGain and GigaSpaces are more centred around grid computing than caching and (imho) are better suited to compute grids than data grids (see this explanation of compute vs data grids). I find GigaSpaces to be a really interesting technology and it has several licensing options, including a free version and a free full version for startups.
Coherence and Terracotta try to treat caches as Maps, which is a fairly natural abstraction. I've used Coherence a lot and it's an excellent high-performance product but not cheap. Terracotta I'm less familiar with. The documentation for Coherence I find a bit lacking at times but it really is a powerful product.
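To make the "caches as Maps" point concrete, here is a minimal sketch, assuming the product hands you something that implements `java.util.Map`. The class name, the `loadFromBackingStore` helper and the key format are made up for illustration, and a local `ConcurrentHashMap` stands in for the clustered map; none of this is taken from the Coherence or Terracotta APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example: application code that only knows it has a Map.
class MapCacheSketch {
    private final Map<String, String> cache;
    // Counts how often the slow path runs, to show the cache is working.
    final AtomicInteger loads = new AtomicInteger();

    MapCacheSketch(Map<String, String> cache) {
        this.cache = cache;
    }

    // Stand-in for an expensive lookup (database, remote service, ...).
    private String loadFromBackingStore(String key) {
        loads.incrementAndGet();
        return "value-for-" + key;
    }

    // Returns the cached value, loading and storing it on the first request.
    String get(String key) {
        return cache.computeIfAbsent(key, this::loadFromBackingStore);
    }

    public static void main(String[] args) {
        // Assumption: a clustered Map implementation could be passed in here instead.
        MapCacheSketch sketch = new MapCacheSketch(new ConcurrentHashMap<>());
        System.out.println(sketch.get("user:42"));
        System.out.println(sketch.get("user:42"));
        System.out.println("loads = " + sketch.loads.get());
    }
}
```

The attraction of the abstraction is that `get` never needs to know whether the map lives in one JVM or across a cluster.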
OSCache I've primarily used as a means of reducing memory usage and fragmentation in Java Web applications as it has a fairly neat JSP tag. If you've ever looked at compiled JSPs, you'll see they do a lot of String concatenations. This tag allows you to effectively cache the results of a segment of JSP code and HTML into a single String, which can hugely improve performance in some cases.
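The idea behind that tag can be sketched in plain Java. This is not the OSCache API, just a rough illustration of caching a rendered fragment as one String with a time limit; `FragmentCacheSketch`, `render` and the menu markup are all hypothetical names chosen for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical fragment cache: stores the finished HTML as a single String.
class FragmentCacheSketch {
    // A rendered fragment plus the time at which it stops being valid.
    private static final class Entry {
        final String html;
        final long expiresAtMillis;

        Entry(String html, long expiresAtMillis) {
            this.html = html;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> fragments = new ConcurrentHashMap<>();
    private final long ttlMillis;

    FragmentCacheSketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Runs the renderer only when the fragment is missing or expired.
    // Two threads may occasionally both render; the last write wins.
    String render(String key, Supplier<String> renderer) {
        long now = System.currentTimeMillis();
        Entry entry = fragments.get(key);
        if (entry == null || entry.expiresAtMillis <= now) {
            entry = new Entry(renderer.get(), now + ttlMillis);
            fragments.put(key, entry);
        }
        return entry.html;
    }

    public static void main(String[] args) {
        FragmentCacheSketch cache = new FragmentCacheSketch(60_000);
        // Stand-in for the many small concatenations a compiled JSP performs.
        Supplier<String> menu = () -> {
            StringBuilder sb = new StringBuilder("<ul>");
            for (String item : new String[] {"Home", "Products", "Contact"}) {
                sb.append("<li>").append(item).append("</li>");
            }
            return sb.append("</ul>").toString();
        };
        System.out.println(cache.render("menu", menu));
        System.out.println(cache.render("menu", menu)); // second call reuses the cached String
    }
}
```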
EHCache is an easy caching solution that I've also used in Web applications. I've never used it as a distributed cache, although it can do that. I tend to view it as a quick and dirty solution but that's perhaps my bias.
memcached is particularly prevalent in the PHP world (and used by such sites as Facebook). It's a really light and easy solution and has the advantage that it doesn't run in the same process as your application, so you'll arguably have better interoperability options with other technology stacks, if this is important to you.
You may want to check out Hazelcast also. Hazelcast is an open source transactional, distributed/partitioned implementation of queue, topic, map, set, list, lock and executor service. It is super easy to work with; just add hazelcast.jar into your classpath and start coding. Almost no configuration is required.
If you are interested in executing your Runnable or Callable tasks in a distributed fashion, then please check out the Distributed Executor Service documentation at http://code.google.com/docreader/#p=hazelcast
Hazelcast is released under the Apache license and enterprise grade support is also available.
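For a feel of what a distributable task looks like, here is a rough sketch using only the JDK. It assumes the distributed executor is used through the standard `java.util.concurrent.ExecutorService` interface and that tasks must be `Serializable` to travel between nodes; a local thread pool stands in for the cluster, and `WordCount` is a made-up task. How you obtain the distributed executor is covered in the documentation linked above, not here.

```java
import java.io.Serializable;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class DistributedTaskSketch {
    // Hypothetical task. Serializable so that it could be shipped to another JVM.
    static final class WordCount implements Callable<Integer>, Serializable {
        private static final long serialVersionUID = 1L;
        private final String text;

        WordCount(String text) {
            this.text = text;
        }

        @Override
        public Integer call() {
            String trimmed = text.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
    }

    // Submits the task and waits for the result; works with any ExecutorService.
    static int run(ExecutorService executor, String text) {
        Future<Integer> result = executor.submit(new WordCount(text));
        try {
            return result.get();
        } catch (Exception e) {
            throw new IllegalStateException("task failed", e);
        }
    }

    public static void main(String[] args) {
        // Assumption: a local pool stands in for the distributed executor.
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            System.out.println(run(executor, "just add the jar and start coding"));
        } finally {
            executor.shutdown();
        }
    }
}
```

Keeping the task free of references to non-serializable objects (open connections, servlet contexts and so on) is the main design constraint if it is ever going to run on another node.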
Have you considered Infinispan? It is an open source data grid platform from JBoss.org. For more details, I recommend you read this (old) blog post announcing the project, along with other interesting blog posts, including one on using Infinispan with Hibernate, one on using it as a standalone cache and, more recently, one on Red Hat's Enterprise Data Grid. There is a quick "getting started" guide and a DZone RefCard too, and even a YouTube video :)
I think @cletus's summary is pretty good. I did want to mention that Terracotta provides a lot more than just a distributed cache in the form of a map. It clusters Java heap and synchronization primitives, turning a concurrent Java program into a distributed Java program. You can do caching with it (including using distributed versions of open source cache libs) or a bunch of other stuff.
For work distribution, there are some extra libs written on top of Terracotta. In particular, tim-pipes (for messages) and tim-masterworker (for Master-Worker style distribution) are great abstractions. These libraries are on the Terracotta Forge:
This recently added page may add a bit of additional info in comparison to some other potential data technologies:
JPPF is also nice.
If you want to go a little lower-level, there is JGroups, which provides you with the very basics of clustering Java processes.
And also check ProActive
Another you can add to the list is Appistry CloudIQ. It is a distributed computing environment, available as a free download for up to 5 machines. It includes load distribution as well as automatic failover of work in the case of a hardware failure, among other features.
For grid computing, you could also consider Ice Grid or DataSynapse GridServer. These both provide very effective mechanisms for distributing tasks and provide fail over and redundancy.
I think your question has been interpreted in different ways. You asked about a library you can use to "cluster enable" your application.
While some of the libs named above can provide specific cluster functionality such as distributed caching, the more conventional way of enabling workload management is to use a J2EE container.
Setting up a clustered container instance lets you use its HA features and workload management, and clustering is almost transparent at the application level. I say almost because when writing applications that are going to be clustered you have to be careful how you manage state. For example, if you implemented some sort of cache, you would need to replicate the state of the cache across each machine.
A good starting place would be to download GlassFish and try to set up a clustered GlassFish instance.
Hope that helps.
Karl
Also check Fura
A very late answer -- but it depends partly on the way your application is configured. You might want to run an executable remotely instead of using one of the approaches above.
Apologies for the lack of links, but until my rep's up I can't post more than one. Products in italics should be easy to Google.
If you want to run an executable in a parametric search (say you want to spin up the same executable with a different set of options for each instance), then a traditional batch approach works well. This is a very traditional high performance computing approach that's still in wide use. Suitable infrastructures for handling this at enterprise scale are Platform LSF, DataSynapse GridServer, PBS or, as it matures, Windows HPC Server. You might also want to take a look at open source products like Globus and Condor. Depending on just how big your app is, you might also look at gLite, which is used for very large scale scientific projects like the LHC.
The trad HPC approach benefits from having your app code isolated from the processes comprising your compute infrastructure, but may take a performance hit. The other approaches above may show quicker throughput, but can be prone to memory leaks and other problems in long-uptime systems.
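The parametric search described above boils down to generating one command line per combination of options and handing each to the batch system. A minimal sketch of the generation step follows; the `./simulate` executable and its `--temperature` and `--seed` flags are invented for the example, and the submission step is left out because it depends entirely on which scheduler you use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ParameterSweepSketch {
    // Builds one command line per (temperature, seed) combination.
    // The flag names are hypothetical; use whatever your executable accepts.
    static List<List<String>> commandLines(String executable, double[] temperatures, int[] seeds) {
        List<List<String>> jobs = new ArrayList<>();
        for (double temperature : temperatures) {
            for (int seed : seeds) {
                jobs.add(Arrays.asList(
                        executable,
                        "--temperature", String.valueOf(temperature),
                        "--seed", String.valueOf(seed)));
            }
        }
        return jobs;
    }

    public static void main(String[] args) {
        List<List<String>> jobs =
                commandLines("./simulate", new double[] {0.5, 1.0}, new int[] {1, 2, 3});
        for (List<String> job : jobs) {
            // Each line would be passed to the scheduler's submit command; here it is only printed.
            System.out.println(String.join(" ", job));
        }
    }
}
```

Because every job is just an independent process with its own arguments, the application code stays isolated from the compute infrastructure, which is the benefit mentioned above.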