Is there a way to "set" a Hadoop counter instead of incrementing it?
The API only provides methods to increment a counter in a Mapper or Reducer. Is there a way to just set it, or to increment its value only once, irrespective of how many mappers and reducers run?
What are you trying to achieve? This is inherently tricky: what if multiple mappers try to set the counter? Who should win? The reason counters are typically only incremented is that incrementing can be done very, very quickly and efficiently by the architecture.
You can't set the counter because the counters are summed from each of the tasks and aggregated into a top-level counter.
I have used ZooKeeper within MapReduce jobs for small communication or coordination between tasks, or for flagging that certain things happened in a job or task.
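The summing behavior described above can be illustrated with a small plain-Java sketch (no Hadoop dependency; the array stands in for the per-task counter values the framework collects):

```java
public class CounterAggregation {
    public static void main(String[] args) {
        // Each task keeps its own local counter value, e.g. records
        // processed by three mapper attempts.
        long[] perTaskCounts = {3, 5, 2};

        // The framework aggregates by summing, which is order-independent
        // and needs no coordination between tasks.
        long total = 0;
        for (long c : perTaskCounts) {
            total += c;
        }
        System.out.println(total); // prints 10

        // A hypothetical global "set" would be ambiguous: if one task set
        // the counter to 3 and another set it to 5, there is no
        // well-defined winner, so the API only exposes increments.
    }
}
```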
This cannot be done through the Hadoop API, at least as @orangeoctupus pointed out as well.
The approach I used to achieve this was to set the value in the Job's Context properties. In the end, the properties can be read after the job has run. Not elegant, but a workaround!
The interface org.apache.hadoop.mapreduce.Counter defines a setValue method, but if it works globally, as the description seems to suggest, I would agree with the other answers that there aren't many use cases for it that are also good ideas...
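To see why a per-task setValue still does not behave like a global "set", here is a Hadoop-free simulation (the scenario and values are hypothetical): each task sets its local counter to 1, hoping the job-level counter ends up at 1, but the framework still sums the per-task values.

```java
public class SetValueSimulation {
    public static void main(String[] args) {
        // Suppose each of three tasks calls setValue(1) on the same
        // counter, intending the global value to be exactly 1.
        long[] taskLocalValues = {1, 1, 1};

        // The framework still aggregates by summing the per-task values...
        long aggregated = 0;
        for (long v : taskLocalValues) {
            aggregated += v;
        }

        // ...so the job-level counter reads 3, not 1.
        System.out.println(aggregated); // prints 3
    }
}
```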