Calling a compiled binary on Amazon Elastic MapReduce

Posted 2025-01-03 06:03:49

I'm trying to do some data analysis on Amazon Elastic MapReduce. The mapper step is a Python script that calls a compiled C++ binary, "./formatData". For example:

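# myMapper.py
import sys
from subprocess import Popen, PIPE

inputData = sys.stdin.readline()
# ...
# universal_newlines=True makes the pipes carry text rather than bytes
p1 = Popen('./formatData', stdin=PIPE, stdout=PIPE, universal_newlines=True)
# communicate() returns a (stdout, stderr) tuple; only stdout is needed here
p1Output, _ = p1.communicate(input=inputData)
result = ... # manipulate the formatted data
print("%s\t%s" % (result, 1))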

Can I call a binary executable like this on Amazon EMR? If so, where should I store the binary (in S3?), which platform should I compile it for, and how do I ensure that my mapper script has access to it? Ideally it would be in the current working directory.

Thanks!
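
For illustration, here is a minimal sketch of how the mapper side of this could be structured so that it can be tested locally before being run on a cluster. It assumes the binary has already been placed in the task's working directory (for example through Hadoop streaming's `-files` option, which is an assumption about the job setup and not something this post confirms). The helper names `ensure_executable`, `run_filter`, `map_lines` and `main` are hypothetical, and the "manipulate the formatted data" step from the snippet above is reduced to emitting each output line as a key.

```python
import os
import stat
import subprocess
import sys


def ensure_executable(path):
    """Add the owner-execute bit if it is missing (it can be lost when a file is copied around)."""
    mode = os.stat(path).st_mode
    if not mode & stat.S_IXUSR:
        os.chmod(path, mode | stat.S_IXUSR)


def run_filter(command, text):
    """Pipe `text` through `command` and return the command's stdout as a string."""
    proc = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # exchange str rather than bytes
    )
    output, _ = proc.communicate(input=text)
    if proc.returncode != 0:
        raise RuntimeError("%r exited with status %d" % (command, proc.returncode))
    return output


def map_lines(command, lines):
    """Yield one 'key<TAB>1' record per non-empty line of the command's output."""
    formatted = run_filter(command, "".join(lines))
    for line in formatted.splitlines():
        if line:
            yield "%s\t%d" % (line, 1)


def main():
    # "./formatData" is the name used in the question; it is assumed to sit
    # in the current working directory of the streaming task.
    binary = "./formatData"
    ensure_executable(binary)
    for record in map_lines([binary], sys.stdin):
        print(record)
```

A mapper script would call `main()` at the bottom. Because `run_filter` accepts any command, the same functions can be exercised on a development machine with a stand-in program in place of the real binary. Raising on a non-zero exit status is a design choice that makes a broken or incompatible binary fail the task loudly instead of silently producing empty output.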
