Hadoop Map output IOException when emitting a subclass of the class defined in the configuration
I have 3 simple classes:
public abstract class Container implements WritableComparable<Container> {} //empty
public class WeightedEdge extends Container { ... }
public class NodeWeightContainer extends Container { ... }
The Map phase was configured as follows:
JobConf createGraphPConf = new JobConf(new Configuration());
Job job = new Job(createGraphPConf);
...
createGraphPConf.setMapOutputValueClass(Container.class);
However I am receiving this error:
java.io.IOException: Type mismatch in value from map: expected org.hadoop.test.data.util.Container, recieved org.hadoop.test.data.WeightedEdge
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1018)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:591)
at org.hadoop.test.map.CreateGPMap.map(CreateGPMap.java:33)
at org.hadoop.test.map.CreateGPMap.map(CreateGPMap.java:19)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
Why can't I return a subclass of the class that was defined in the configuration? Is there a way around it? The problem is that my Map phase has to emit two distinct object types.
2 Answers
You cannot return a subclass of the class that was defined in the configuration because Hadoop explicitly checks the class type specified in setMapOutputValueClass against the type it receives from mappers. It does so because it needs to serialize/deserialize the objects you emit from mappers. When it performs deserialization, it creates a new object of the type specified in the setMapOutputValueClass call and then uses the methods of the WritableComparable interface to fill the newly created object with data. To be able to emit different object types, you can define a non-abstract container class and place the actual object and its type identifier inside it.
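A minimal sketch of that idea, assuming WeightedEdge and NodeWeightContainer keep their existing Writable serialization; the TaggedContainer name, the tag constants, and the compareTo ordering are illustrative and not part of the original code:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;

// Non-abstract wrapper that carries a type tag plus the actual value.
public class TaggedContainer implements WritableComparable<TaggedContainer> {
    public static final byte WEIGHTED_EDGE = 0;
    public static final byte NODE_WEIGHT   = 1;

    private byte type;      // which concrete class is wrapped
    private Writable value; // the actual WeightedEdge or NodeWeightContainer

    public TaggedContainer() {} // no-arg constructor required by Hadoop

    public TaggedContainer(byte type, Writable value) {
        this.type = type;
        this.value = value;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeByte(type); // serialize the tag first ...
        value.write(out);    // ... then delegate to the wrapped object
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        type = in.readByte();
        // instantiate the concrete class indicated by the tag, then fill it
        value = (type == WEIGHTED_EDGE) ? new WeightedEdge() : new NodeWeightContainer();
        value.readFields(in);
    }

    @Override
    public int compareTo(TaggedContainer other) {
        return type - other.type; // illustrative ordering by tag only
    }

    public byte getType() { return type; }
    public Writable get() { return value; }
}

With createGraphPConf.setMapOutputValueClass(TaggedContainer.class), the runtime class of every emitted value then matches the configured class exactly, and the reducer can branch on getType() to recover the wrapped object.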
I faced the same problem today. There is a Writable class, org.apache.hadoop.io.GenericWritable, which can be used to address this problem. You need to extend it and implement a single abstract method (see the sketch below). Now you can use the Container class as the output value type of your mapper. Important: your actual map output classes (WeightedEdge and NodeWeightContainer) must implement the Writable interface.
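A minimal sketch of such a GenericWritable subclass, assuming WeightedEdge and NodeWeightContainer already implement Writable; the unchecked array cast and the wrapping constructor are illustrative:

import org.apache.hadoop.io.GenericWritable;
import org.apache.hadoop.io.Writable;

// Container now extends GenericWritable and enumerates the concrete value classes.
public class Container extends GenericWritable {

    @SuppressWarnings("unchecked")
    private static final Class<? extends Writable>[] CLASSES =
            (Class<? extends Writable>[]) new Class[] {
                WeightedEdge.class,
                NodeWeightContainer.class
            };

    public Container() {} // no-arg constructor required by Hadoop

    public Container(Writable value) {
        set(value); // wrap the concrete value (GenericWritable.set)
    }

    @Override
    protected Class<? extends Writable>[] getTypes() {
        return CLASSES; // the single abstract method of GenericWritable
    }
}

In the mapper each value is wrapped before being emitted, for example output.collect(key, new Container(weightedEdge)); in the reducer, container.get() returns the wrapped Writable, whose runtime class can then be inspected.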