Benchmarking Hadoop jobs at a low level
I have to record a couple of benchmark variables. Unfortunately, some of the variables require me to perform measurements within the Hadoop code itself (map(), reduce(), InputFormat, etc.). I was wondering what the "right" way to do this would be. I could use global variables to store my benchmark variables and dump them just before Tool.run() finishes, but I think there is a better way. Does anybody know how to do it, or have any ideas?
Update
Due to some constraints, the benchmark code has to be embedded within Hadoop. I have a "tester" application which runs many Hadoop jobs and collects the benchmark results. The idea is to run the jobs and collect benchmark data from their execution in a single "tester" run.
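For context on why the global-variable approach is awkward: map() and reduce() typically execute in separate task JVMs, so values recorded in static fields there are not visible to the driver's Tool.run(). Hadoop's built-in Counters are one standard mechanism for shipping per-task measurements back to the job driver. A minimal sketch of that idea follows; TimedMapper, the "benchmark" counter group, and reportBenchmark() are hypothetical names, not part of the question's code:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper that times its own map() body and reports the
// totals through a custom counter group named "benchmark".
public class TimedMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        long start = System.nanoTime();

        // ... the actual map logic being measured goes here ...

        long elapsed = System.nanoTime() - start;
        // Counters are aggregated across all tasks, so the driver sees
        // job-wide totals once the job completes.
        context.getCounter("benchmark", "map.nanos").increment(elapsed);
        context.getCounter("benchmark", "map.calls").increment(1);
    }

    // In the "tester", after job.waitForCompletion(true) returns:
    static void reportBenchmark(Job job) throws IOException {
        Counters counters = job.getCounters();
        long nanos = counters.findCounter("benchmark", "map.nanos").getValue();
        long calls = counters.findCounter("benchmark", "map.calls").getValue();
        System.out.printf("avg map() time: %.0f ns over %d calls%n",
                (double) nanos / calls, calls);
    }
}
```

Because counter values are summed across all tasks, the "tester" gets job-wide totals from a single findCounter() lookup per metric, with no shared global state involved.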
1 Answer
Nothing is stopping you from benchmarking those methods independently of MapReduce. M/R isn't magic; it's just a JVM running some code on the server for you.
We run JUnit tests against individual Map and Reduce functions all the time. There is nothing substantially different about profiling them.
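For example, a mapper's logic can be timed from a plain JUnit test with no cluster involved. A minimal sketch, assuming the old org.apache.hadoop.mapred API; the word-count-style map() body is a stand-in for whatever function is actually under test:

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.OutputCollector;
import org.junit.Test;

public class MapperBenchmarkTest {

    // Stand-in for the real mapper under test: a classic word-count map().
    static void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> out) throws IOException {
        StringTokenizer it = new StringTokenizer(value.toString());
        while (it.hasMoreTokens()) {
            out.collect(new Text(it.nextToken()), new IntWritable(1));
        }
    }

    @Test
    public void timeMapCalls() throws IOException {
        // Collector that discards output; only the timing matters here.
        OutputCollector<Text, IntWritable> out = (k, v) -> { };
        Text line = new Text("a representative input line for the benchmark");

        final int iterations = 100_000;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            map(new LongWritable(i), line, out);
        }
        long elapsed = System.nanoTime() - start;
        System.out.printf("map(): %.0f ns/call%n", (double) elapsed / iterations);
    }
}
```

Because OutputCollector is a single-method interface, a discard-everything lambda is enough to satisfy the signature; swap in a collector that records its input if output correctness should be checked alongside the timing.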