Alternatives to the Hadoop/MapReduce framework for the Win32 platform

Posted on 2024-11-18 11:29:05

I'm finding Hadoop on Windows somewhat frustrating: I want to know if there are any serious alternatives to Hadoop for Win32 users. The features I most value are:

  • Ease of initial setup & deployment on a smallish network (I'd be astonished if we ever got more than 20 worker-PCs assigned to this project)
  • Ease of management - the ideal framework should have web/GUI based administration system so that I do not have to write one myself.
  • Something popular & stable. Bonuses depend on us getting this project delivered in time.

BACKGROUND:

The company I work for wants to build a new grid system to run some financial calculations.

The first framework I have been evaluating is Hadoop. This seemed to do exactly what was intended except that it's very UNIX oriented. I was able to get all of the tutorials up & running on an Ubuntu VirtualBox. Unfortunately nothing seems to run easily on Win32.

Yes... Win32: Our company has a policy that everything has to run on Windows. None of the server admins (or anybody outside of a select few developers) know anything about Linux. I'd probably get in trouble if they found my virtual Ubuntu environment! The sad fact is that our grid needs to be hosted on Win32 (since all the test PCs run Windows XP 32-bit), with an option to upgrade to Win64 at some time in the future.

To complicate matters, 95% of what we want to run are Python scripts with 32-bit Windows C++ DLL add-ons. Our calculation library is overwhelmingly written in Python, and it will not run on anything other than Windows... I do not really have a choice.

Comments (5)

阳光①夏 2024-11-25 11:29:05

For Python there are:

  • disco
  • bigtempo
  • celery - not really a map-reduce framework, but it's a good start if you want something very customized

You can also find a number of Hadoop clients/integrations on PyPI.
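
As a rough illustration of the "very customized" route, the map-reduce pattern itself can be sketched with nothing but the Python standard library. Everything in this sketch is hypothetical (the trade data and the names `map_chunk`, `merge` and `map_reduce`), and none of it is the API of disco, bigtempo or celery; it only shows the shape of the computation on a single machine.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(trades):
    """Map step: total notional per currency for one chunk of trades."""
    totals = {}
    for currency, notional in trades:
        totals[currency] = totals.get(currency, 0) + notional
    return totals


def merge(left, right):
    """Reduce step: combine two partial per-currency totals."""
    merged = dict(left)
    for currency, notional in right.items():
        merged[currency] = merged.get(currency, 0) + notional
    return merged


def map_reduce(executor, chunks):
    """Run map_chunk on every chunk via the executor, then fold the results."""
    partials = executor.map(map_chunk, chunks)
    return reduce(merge, partials, {})


# Hypothetical input: each chunk would go to one worker.
chunks = [
    [("USD", 100), ("EUR", 50)],
    [("USD", 25), ("GBP", 10)],
    [("EUR", 5)],
]

with ThreadPoolExecutor(max_workers=3) as pool:
    totals = map_reduce(pool, chunks)

print(totals)
```

Because `map_reduce` only relies on the executor's `map` method, a `ProcessPoolExecutor` could probably be swapped in to use several cores, provided the script is run as a file with the usual `if __name__ == "__main__":` guard on Windows. Spreading the work over several PCs is the part a framework such as celery would have to supply.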

伴随着你 2024-11-25 11:29:05

You could try MPI. It is a standard for message-passing concurrent applications. We are running it on our Linux cluster, but it is cross-platform. The most popular implementation is MPICH2, written in C. There are Python bindings for MPI through the mpi4py library.
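
A minimal sketch of what a scatter/gather job might look like with mpi4py, assuming mpi4py and an MPI runtime are installed on every worker. The `price` function is a hypothetical stand-in for one real calculation, and the `--mpi` flag and the helper names are made up for this example.

```python
import sys


def split_chunks(items, n_parts):
    """Split items into n_parts contiguous chunks of nearly equal size."""
    base, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def price(x):
    """Hypothetical stand-in for one financial calculation."""
    return x * x


def run_with_mpi(inputs):
    """Scatter inputs from rank 0, compute locally, gather the partial sums."""
    # Imported here so the helpers above can be used without MPI installed.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Only the root process prepares the work; scatter needs one chunk per rank.
    chunks = split_chunks(inputs, size) if rank == 0 else None
    my_chunk = comm.scatter(chunks, root=0)

    partial = sum(price(x) for x in my_chunk)
    partials = comm.gather(partial, root=0)

    # gather returns the list of partial sums on the root and None elsewhere.
    return sum(partials) if rank == 0 else None


# The --mpi flag is an assumption of this sketch, not an mpi4py convention.
if __name__ == "__main__" and "--mpi" in sys.argv:
    total = run_with_mpi(list(range(100)))
    if total is not None:
        print(total)
```

Saved under a hypothetical name such as `grid_sum.py`, it would typically be launched with something like `mpiexec -n 4 python grid_sum.py --mpi`; the exact launcher options depend on the MPI implementation in use.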

守望孤独 2024-11-25 11:29:05

IPython has some parallel computing features that are simple and work on Windows. They may be enough for your needs. Here's a good place to start:

http://showmedo.com/videotutorials/video?name=7200100&fromSeriesID=720

り繁华旳梦境 2024-11-25 11:29:05

I've compiled a list of available MapReduce/Hadoop offerings in the cloud (hosted services, PaaS-level). This might be of help as well.

眼泪淡了忧伤 2024-11-25 11:29:05

Many distributed computing frameworks can be used for many-task computing. If you don't need the MapReduce paradigm, but rather the ability to distribute the tasks of a job across separate computers, along with communication and resource management, then you could take a look at other platforms in this area like Condor, or even BOINC; both run on Windows.

You could also run Hadoop on Linux virtual machines.
