Is it possible to write map/reduce jobs for Amazon Elastic MapReduce using .NET?
Is it possible to write map/reduce jobs for Amazon Elastic MapReduce (http://aws.amazon.com/elasticmapreduce/) using .NET languages? In particular I would like to use C#.
Preliminary research suggests not. The above URL's marketing text suggests you have a "choice of Java, Ruby, Perl, Python, PHP, R, or C++", without mentioning .NET languages. This Amazon thread (http://developer.amazonwebservices.com/connect/thread.jspa?messageID=136051 -- "Support for C# / F# map/reducers") explicitly says that "currently Amazon Elastic MapReduce does not support Mono platform or languages such as C# or F#."
The above suggests that it can't be done. I'm wondering if there are any workarounds, though. For example, can I modify the Elastic MapReduce machine image for my account, and install Mono on there?
An alternative, suggested by Amazon FAQs "Using Other Software Required by Your Jar" (http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/index.html?CHAP_AdvancedTopics.html) and "How to Use Additional Files and Libraries With the Mapper or Reducer" (http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/index.html?addl_files.html), is to make the first step of the Map/Reduce job be to install Mono on the local instance. That sounds kind of inefficient, but maybe it could work?
Maybe a saner alternative would be to try to forgo the convenience of Elastic MapReduce, and manually set up my own Hadoop cluster on EC2. Then I assume I could install Mono without difficulty.
You should be able to use the VB.NET library from any .NET language, including C#.
A possible workaround would be to use Hadoop streaming and compile your C# code into native code with an Ahead-of-Time compiler (see: http://www.mono-project.com/AOT). The binary could then be run from S3 just as a C++ program could, I guess.
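With Hadoop streaming, the mapper is just an executable that reads input records as lines on stdin and writes `key<TAB>value` lines to stdout, so an AOT-compiled C# binary would only need to honor that contract. The sketch below shows the mapper side of it. It is written in Python purely for illustration (not C#), and it assumes a hypothetical word-count job:

```python
import sys
from typing import Iterable, Iterator


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' record per whitespace-separated token.

    Hypothetical word-count mapper; Hadoop streaming treats everything
    before the first tab as the key and the rest as the value.
    """
    for line in lines:
        for word in line.split():  # split() also drops the trailing newline
            yield f"{word}\t1"


if __name__ == "__main__":
    # Hadoop streaming feeds the input split on stdin, one record per line,
    # and collects whatever the mapper prints to stdout.
    for record in map_lines(sys.stdin):
        print(record)
```

A C# program would follow the same shape: read lines from `Console.In`, split each one, and `Console.WriteLine` one tab-separated record per word.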
The answer by Reed Copsey is not correct. The VB.NET library is for creating, starting and stopping jobs; it has nothing to do with the code that actually runs inside the Hadoop jobs.
Yes, it is possible using bootstrap actions, as previous answerers have suggested.
This blog post (http://atbrox.com/2011/02/07/an-example-of-using-f-and-c-netmono-with-amazons-elastic-mapreduce-hadoop/) describes running a C# mapper and an F# reducer with Mono.
Elastic MapReduce now has a "bootstrap actions" feature, which Amazon currently explains as follows:
(See http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/index.html?introduction.html)
One suggested use of this is to install software on your cluster machines. You could potentially use it to install a .NET runtime environment on your cluster machines (probably Mono rather than Microsoft's, because I imagine all the Elastic MapReduce machines run Linux). (Not sure how hard the unattended install would be. Any ideas?) Having done so, you can call out to your .NET mappers/reducers using Hadoop streaming, which Elastic MapReduce does seem to support.
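On the reducer side, Hadoop sorts the mapper output by key before it reaches the reducer, so a streaming reducer (whether it runs under Mono or anything else) can simply group consecutive lines that share a key. A minimal sketch, again in Python rather than .NET, assuming hypothetical word-count records of the form `word<TAB>count`:

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def reduce_lines(lines: Iterable[str]) -> Iterator[str]:
    """Sum the values of consecutive records that share a key.

    Assumes every non-blank line is 'key<TAB>integer' and that the input
    is already sorted by key, which Hadoop guarantees between the map
    and reduce phases.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


if __name__ == "__main__":
    # Hadoop streaming pipes the sorted map output into the reducer's stdin.
    for record in reduce_lines(sys.stdin):
        print(record)
```

Before submitting a job, you can approximate the streaming pipeline locally with something like `cat input.txt | ./mapper | sort | ./reducer`, where `./mapper` and `./reducer` stand for the commands you intend to pass to Hadoop streaming (e.g. `mono mapper.exe`).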