How to pass objects to Mappers and Reducers



I have an application that runs on Hadoop. How can I pass objects to the mappers and reducers so that they can use them while processing the data? For example, I declare a FieldFilter object to filter the rows processed in the mappers. The filter contains many filter rules, which are specified by users. So I am wondering: how can I pass the filter and its rules to the mappers and reducers?
My idea is to serialize the object into a String, pass the string around via the Configuration, and then reconstruct the object from the string. But that does not seem like a good approach to me! Are there any other approaches?
Thanks!

import java.util.ArrayList;

// Collects the user-specified filter rules applied to rows in the mapper.
public class FieldFilter {

    private final ArrayList<FieldFilterRule> rules = new ArrayList<FieldFilterRule>();

    // Registers one or more rules and links each rule back to this filter.
    public FieldFilter addRule(FieldFilterRule... newRules) {
        for (FieldFilterRule rule : newRules) {
            this.rules.add(rule);
            rule.setFieldFilter(this);
        }
        return this;
    }
}
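For reference, the string-serialization idea described above can be made to work. Below is a minimal sketch, assuming FieldFilter and FieldFilterRule implement java.io.Serializable and that Java 8's java.util.Base64 is available; the configuration key "field.filter" and the helper class name are arbitrary choices for illustration, not part of any Hadoop API.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Base64;
import org.apache.hadoop.conf.Configuration;

public class FilterConfigUtil {

    // Driver side: serialize the filter and store it in the job
    // Configuration as a Base64 string under an arbitrary key.
    public static void storeFilter(Configuration conf, FieldFilter filter)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(filter);
        out.close();
        conf.set("field.filter", Base64.getEncoder().encodeToString(bytes.toByteArray()));
    }

    // Task side: rebuild the filter from the Configuration, e.g. called
    // once from Mapper.setup() with context.getConfiguration().
    public static FieldFilter loadFilter(Configuration conf)
            throws IOException, ClassNotFoundException {
        byte[] bytes = Base64.getDecoder().decode(conf.get("field.filter"));
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes));
        try {
            return (FieldFilter) in.readObject();
        } finally {
            in.close();
        }
    }
}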


2 Answers

冷清清 2024-12-18 06:29:25


You want to use setClass() on the Configuration, as you can see here. You can then instantiate your class via newInstance(). Remember to do the instantiation in the setup() method of the mapper/reducer, so that you don't instantiate the filter every time the map/reduce method is invoked. Good luck.

Edit: I should add that you have access to the Configuration through the context; that is how you will get the class you need. There is a getClass() method in the Configuration API.
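To make this concrete, here is a rough sketch of the setClass()/getClass() approach using the new (org.apache.hadoop.mapreduce) API; the key "fieldfilter.impl" and the mapper's type parameters are placeholders, and the filter class needs a no-argument constructor for reflective instantiation. Note that this distributes which class to instantiate, not the filter's rule state; the user-specified rules would still have to be read from the Configuration (or elsewhere) inside setup().

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.ReflectionUtils;

public class FilterMapper extends Mapper<LongWritable, Text, Text, Text> {

    private FieldFilter filter;

    @Override
    protected void setup(Context context) {
        Configuration conf = context.getConfiguration();
        // Driver side (sketch) would register a subclass with:
        //   conf.setClass("fieldfilter.impl", SomeFieldFilterSubclass.class, FieldFilter.class);
        Class<? extends FieldFilter> cls =
                conf.getClass("fieldfilter.impl", FieldFilter.class, FieldFilter.class);
        // Instantiate once per task rather than once per map() call.
        filter = ReflectionUtils.newInstance(cls, conf);
    }
}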

本王不退位尔等都是臣 2024-12-18 06:29:25


Serialize the FieldFilter, put it in HDFS, and later read it in the mapper/reducer functions using the HDFS API. If you have a large cluster, you might want to increase the replication factor (which defaults to 3) for the serialized FieldFilter file, since a large number of mapper and reducer tasks will be reading it.

If the new MapReduce API is used, the serialized FieldFilter file can be read in the Mapper.setup() function, which is called during the initialization of the map task. I could not find something similar for the old MapReduce API.

You can also consider using DistributedCache to distribute the serialized FieldFilter file to the different nodes.
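A rough sketch of the HDFS variant follows, again assuming FieldFilter implements java.io.Serializable; the key "fieldfilter.path", the mapper's type parameters, and the writeFilter() helper are illustrative rather than part of any Hadoop API.

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class HdfsFilterMapper extends Mapper<LongWritable, Text, Text, Text> {

    private FieldFilter filter;

    // Driver side: write the serialized filter to HDFS before submitting
    // the job, and record its path in the Configuration.
    public static void writeFilter(Configuration conf, FieldFilter f, Path path)
            throws IOException {
        FileSystem fs = FileSystem.get(conf);
        ObjectOutputStream out = new ObjectOutputStream(fs.create(path));
        out.writeObject(f);
        out.close();
        conf.set("fieldfilter.path", path.toString());
    }

    // Task side: read the filter back once, during task initialization.
    @Override
    protected void setup(Context context) throws IOException {
        Configuration conf = context.getConfiguration();
        Path path = new Path(conf.get("fieldfilter.path"));
        FileSystem fs = FileSystem.get(conf);
        ObjectInputStream in = new ObjectInputStream(fs.open(path));
        try {
            filter = (FieldFilter) in.readObject();
        } catch (ClassNotFoundException e) {
            throw new IOException(e);
        } finally {
            in.close();
        }
    }
}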
