Alternatives to the Hadoop/MapReduce framework for the Win32 platform
I'm finding Hadoop on Windows somewhat frustrating: I want to know if there are any serious alternatives to Hadoop for Win32 users. The features I most value are:
- Ease of initial setup & deployment on a smallish network (I'd be astonished if we ever got more than 20 worker-PCs assigned to this project)
- Ease of management - the ideal framework should have web/GUI based administration system so that I do not have to write one myself.
- Something popular & stable. Bonuses depend on us getting this project delivered in time.
BACKGROUND:
The company I work for wants to build a new grid system to run some financial calculations.
The first framework I have been evaluating is Hadoop. This seemed to do exactly what was intended except that it's very UNIX oriented. I was able to get all of the tutorials up & running on an Ubuntu VirtualBox. Unfortunately nothing seems to run easily on Win32.
Yes... Win32: Our company has a policy that everything has to run on Windows. None of the server admins (or anybody outside of a select few developers) know anything about Linux. I'd probably get in trouble if they found my virtual Ubuntu environment! The sad fact is that our grid needs to be hosted on Win32 (since all the test PCs run Windows XP 32-bit), with an option to upgrade to Win64 at some time in the future.
To complicate matters, 95% of what we want to run are Python scripts with C++ Windows 32-bit DLL add-ons. Our calculation library is overwhelmingly written in Python, and it will not run on anything other than Windows... I do not really have a choice.
Comments (5)
For Python there is:
And you can find a bunch of Hadoop clients/integrations on PyPI.
You could try MPI. It is a standard for message-passing concurrent applications. We are running it on our Linux cluster, but it is cross-platform. The most popular implementation is MPICH2, written in C. There are Python bindings for MPI through the mpi4py library.
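To give a feel for what that looks like from Python, here is a minimal sketch using mpi4py. It assumes mpi4py and an MPI runtime are installed on every worker. `price` is a hypothetical placeholder for one of your calculations, and the round-robin split and single-process fallback are illustrative choices for this sketch, not something MPI prescribes.

```python
def slice_for_rank(items, rank, size):
    """Return the items this rank should process (round-robin split)."""
    return items[rank::size]


def price(notional):
    """Hypothetical stand-in for one independent calculation."""
    return notional * 2


def run(notionals):
    """Compute this rank's share and combine the partial sums on rank 0."""
    try:
        from mpi4py import MPI  # third-party; needs an MPI runtime such as MPICH2
    except ImportError:
        MPI = None  # assumption: fall back to one process so the sketch still runs

    if MPI is None:
        comm, rank, size = None, 0, 1
    else:
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

    partial = sum(price(n) for n in slice_for_rank(notionals, rank, size))
    if comm is None:
        return partial

    # gather() returns a list of partial sums on rank 0 and None elsewhere.
    parts = comm.gather(partial, root=0)
    return sum(parts) if rank == 0 else None


if __name__ == "__main__":
    total = run([1, 2, 3, 4])
    if total is not None:  # only rank 0 reports
        print("total:", total)
```

With an MPI runtime installed, a script like this is typically started with something like `mpiexec -n 4 python script.py` (the script name is made up), and each process then sees a different rank.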
IPython has some parallel computing features that are simple and work on Windows. It may be enough for your needs. Here's a good place to start:
http://showmedo.com/videotutorials/video?name=7200100&fromSeriesID=720
I've compiled a list of available MapReduce/Hadoop offerings in the cloud (hosted services, PaaS-level); this might be of help as well.
Many distributed computing frameworks can be used for many-task computing. If you don't need the MapReduce paradigm, but rather the ability to distribute the tasks of a job across separate computers, along with communication and resource management, then you could take a look at other platforms in this area like Condor, or even BOINC; both run on Windows.
You could also run Hadoop on Linux virtual machines.
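To illustrate the many-task pattern mentioned above (independent tasks that never talk to each other, handed to whichever worker is free), here is a rough single-machine sketch using only the Python standard library. It is a stand-in for the idea, not the Condor or BOINC API; `run_task` and `run_job` are hypothetical names, and a real grid would dispatch the tasks to separate PCs instead of threads.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(task_id):
    """Hypothetical independent task: needs no data from any other task."""
    return task_id, task_id ** 2


def run_job(task_ids, workers=4):
    """Farm the tasks out to a pool of workers and collect results by task id."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands each task to a free worker and yields results in input order.
        return dict(pool.map(run_task, task_ids))


if __name__ == "__main__":
    print(run_job(range(5)))
```

If your calculations fit this shape, a scheduler-style system is usually a closer match than MapReduce, since there is no shuffle or reduce phase to coordinate.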