Hadoop map output IOException when emitting a subclass of the class defined in the configuration

Posted 2024-12-22 01:03:52


I have 3 simple classes:

public abstract class Container implements WritableComparable<Container> {} //empty
public class WeightedEdge extends Container { ... }
public class NodeWeightContainer extends Container { ... }

The map phase was configured like this:

JobConf createGraphPConf = new JobConf(new Configuration());
Job job = new Job(createGraphPConf);
...
createGraphPConf.setMapOutputValueClass(Container.class);

However I am receiving this error:

java.io.IOException: Type mismatch in value from map: expected org.hadoop.test.data.util.Container, recieved org.hadoop.test.data.WeightedEdge
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1018)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:591)
at org.hadoop.test.map.CreateGPMap.map(CreateGPMap.java:33)
at org.hadoop.test.map.CreateGPMap.map(CreateGPMap.java:19)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)

Why can't I return a subclass of a class that was defined in the configuration? Is there a way around it? The problem is that my map phase has to emit two distinct object types.


Comments (2)

十二 2024-12-29 01:03:52


You cannot return a subclass of the class defined in the configuration because Hadoop explicitly checks that the class specified in setMapOutputValueClass matches the type it receives from the mapper.

It does so because it needs to serialize/deserialize the objects you emit from mappers. During deserialization it creates a new object of the exact type specified in the setMapOutputValueClass call and then uses the methods of the WritableComparable interface to fill the newly created object with data.

To emit different object types, you can define a non-abstract container class and place the actual object and a type identifier inside it:


    public enum ELEM_TYPE { WE, WECONTAINER }

    public class Container implements WritableComparable<Container>
    {
        ELEM_TYPE type;  // actual element type:
                         // WeightedEdge or NodeWeightContainer
        Object value;

        // WritableComparable implementation
        // that casts value to the appropriate type
    }
    public class WeightedEdge { ... }
    public class NodeWeightContainer { ... }
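The tag-then-payload round trip this sketch relies on can be demonstrated without any Hadoop dependencies. Below is a minimal stand-alone illustration using only java.io; the field names, tag values, and class names are illustrative assumptions, not part of the original answer:

```java
import java.io.*;

// A tagged container: write a type tag first, then the payload,
// so the reader knows which concrete shape to rebuild.
enum ElemType { WEIGHTED_EDGE, NODE_WEIGHT }

class TaggedContainer {
    ElemType type;
    long nodeId;     // payload fields (illustrative)
    double weight;

    void write(DataOutput out) throws IOException {
        out.writeByte(type.ordinal());  // the tag drives deserialization
        out.writeLong(nodeId);
        out.writeDouble(weight);
    }

    static TaggedContainer read(DataInput in) throws IOException {
        TaggedContainer c = new TaggedContainer();
        c.type = ElemType.values()[in.readByte()];
        c.nodeId = in.readLong();
        c.weight = in.readDouble();
        return c;
    }
}

public class TaggedContainerDemo {
    public static void main(String[] args) throws IOException {
        TaggedContainer original = new TaggedContainer();
        original.type = ElemType.WEIGHTED_EDGE;
        original.nodeId = 42L;
        original.weight = 3.5;

        // Round-trip through a byte buffer, much as Hadoop does
        // between the map output buffer and the reducer input.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        original.write(new DataOutputStream(buf));
        TaggedContainer copy = TaggedContainer.read(
                new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));

        System.out.println(copy.type + " " + copy.nodeId + " " + copy.weight);
        // prints: WEIGHTED_EDGE 42 3.5
    }
}
```

In a real job the `write`/`readFields` pair would live behind Hadoop's WritableComparable interface, but the dispatch logic is the same: the tag byte is what lets one declared value class carry several concrete payloads.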

北渚 2024-12-29 01:03:52


I faced the same problem today. There is a Writable class, org.apache.hadoop.io.GenericWritable, which can be used to address this problem. You need to extend the class and implement one abstract method:

public class Container extends GenericWritable {

    private static Class[] CLASSES = {
           WeightedEdge.class, 
           NodeWeightContainer.class,
    };

    protected Class[] getTypes() {
       return CLASSES;
    }

}

public class WeightedEdge implements Writable {...}
public class NodeWeightContainer implements Writable {...}

Now you can use the class Container as the output value type of your mapper.

Important: Your actual map output classes (WeightedEdge and NodeWeightContainer) must implement the Writable interface.
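For readers curious why this works: GenericWritable serializes a one-byte index into the `getTypes()` array ahead of the wrapped value, and on read it instantiates the recorded concrete class before delegating to its `readFields`. Here is a Hadoop-free sketch of that dispatch mechanism; `MiniWritable`, `Edge`, and `Node` are illustrative stand-ins, not Hadoop APIs:

```java
import java.io.*;

// A cut-down Writable contract, mirroring Hadoop's write/readFields pair.
interface MiniWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

class Edge implements MiniWritable {
    int weight;
    public void write(DataOutput out) throws IOException { out.writeInt(weight); }
    public void readFields(DataInput in) throws IOException { weight = in.readInt(); }
}

class Node implements MiniWritable {
    double score;
    public void write(DataOutput out) throws IOException { out.writeDouble(score); }
    public void readFields(DataInput in) throws IOException { score = in.readDouble(); }
}

// Minimal re-implementation of the GenericWritable idea:
// serialize the index of the concrete class, then delegate to it.
class MiniGenericWritable implements MiniWritable {
    private static final Class<?>[] TYPES = { Edge.class, Node.class };
    private MiniWritable instance;

    void set(MiniWritable obj) { instance = obj; }
    MiniWritable get() { return instance; }

    public void write(DataOutput out) throws IOException {
        for (int i = 0; i < TYPES.length; i++) {
            if (TYPES[i] == instance.getClass()) {
                out.writeByte(i);        // record which concrete type follows
                instance.write(out);
                return;
            }
        }
        throw new IOException("unregistered type: " + instance.getClass());
    }

    public void readFields(DataInput in) throws IOException {
        try {
            // instantiate the recorded concrete type, then let it read its fields
            instance = (MiniWritable) TYPES[in.readByte()]
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IOException(e);
        }
        instance.readFields(in);
    }
}

public class MiniGenericWritableDemo {
    public static void main(String[] args) throws IOException {
        Edge e = new Edge();
        e.weight = 9;
        MiniGenericWritable out = new MiniGenericWritable();
        out.set(e);

        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        out.write(new DataOutputStream(buf));

        MiniGenericWritable in = new MiniGenericWritable();
        in.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(((Edge) in.get()).weight);  // prints: 9
    }
}
```

This is also why the order of entries in `getTypes()` matters: the serialized index refers to a position in that array, so all tasks in the job must see the same array.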
