Is it better to build or buy a compute grid platform?
I am looking to do some quite processor-intensive brute-force processing for string matching. I have run my prototype in a multi-threaded environment and compared the performance to an implementation using GridGain with a couple of nodes (also multithreaded).
The performance I observed was that my GridGain implementation ran slower than my multithreaded implementation. It could be the case that there was a flaw in my GridGain implementation, but it was only a prototype, and I thought the results were indicative. So my question is this:
What are the advantages of having to learn and then build an implementation for a particular grid platform (Hadoop, GridGain, or EC2 if going hosted - other suggestions welcome), when one could fairly easily put together a lightweight compute grid platform with a much shallower learning curve? That is, what do we get for free with these cloud/grid platforms that is worth having or tricky to implement ourselves?
(Please note, I don't have any need for a data grid)
Cheers,
-James
(P.S. Happy to make this community wiki if need be.)
Comments (1)
What kind of grid are you dealing with? A grid of a dozen hosts running the same OS would be pretty straightforward to run - all you really have to deal with is sending work to each host, maybe a little load balancing, maybe what to do if a host goes down, and maybe distributing new service code to the hosts when you update your service. Even if you don't deal with any of those, it's not a big deal, since the grid is a manageable size. If you're dealing with 1000s of hosts, or with a service that should never be down or have errors due to single hosts going down, then you suddenly have to worry about:
That's a short list of things that most grid software should do for you if you need it. If you're working on something small or non-critical then by all means, roll your own. If you're working on something that has to work, or is big enough that having any manual steps in a deployment process would be a maintenance nightmare then you probably want to go with something that already exists.
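To give a feel for what "roll your own" means at the small end, here is a minimal sketch of the pieces mentioned above: sending work to each host, a little load balancing (plain round-robin), and doing something sensible when a host goes down. Everything in it is a made-up illustration and not GridGain's or Hadoop's API: `Host`, `HostDown`, `dispatch` and `count_matches` are hypothetical names, the "hosts" are in-process objects standing in for remote machines, and the dispatch loop is sequential to keep the logic readable (a real one would issue the calls concurrently over a socket or RPC layer).

```python
from collections import deque


class HostDown(Exception):
    """Raised by a (hypothetical) host that cannot accept work."""


def count_matches(needle, chunk):
    """Unit of work: count non-overlapping occurrences of needle in a chunk of strings."""
    return sum(s.count(needle) for s in chunk)


class Host:
    """In-process stand-in for a remote machine (assumption: real hosts sit behind RPC)."""

    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive
        self.completed = 0  # number of chunks this host has processed

    def run(self, needle, chunk):
        if not self.alive:
            raise HostDown(self.name)
        self.completed += 1
        return count_matches(needle, chunk)


def dispatch(hosts, needle, chunks):
    """Hand chunks to hosts round-robin; if a host is down, drop it and retry the chunk."""
    live = deque(hosts)
    total = 0
    for chunk in chunks:
        while True:
            if not live:
                raise RuntimeError("no live hosts left")
            host = live[0]
            live.rotate(-1)  # move this host to the back of the queue
            try:
                total += host.run(needle, chunk)
                break  # chunk done, move on to the next one
            except HostDown:
                live.remove(host)  # stop sending work to the dead host
    return total
```

Calling `dispatch([Host("a"), Host("b", alive=False), Host("c")], "abc", chunks)` skips `b` after its first failure and spreads the remaining chunks over `a` and `c`. Note what the sketch leaves out: detecting a host that hangs instead of failing fast, bringing recovered hosts back, and pushing new code to the hosts. Those are the parts that get painful as the grid grows.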