How do I store complex objects in Hadoop HBase?
I have complex objects with collection fields that need to be stored in HBase. I don't want to walk the whole object tree and store each field explicitly, so I'm considering serializing each complex field and storing it as one big blob, then deserializing it when reading the object back. What is the best way to do this? I could use some general-purpose serialization for that, but I'm hoping Hadoop has built-in means to handle this situation.
Sample class of an object to store:
class ComplexClass {
<simple fields>
List<AnotherComplexClassWithCollectionFields> collection;
}
HBase only deals with byte arrays, so you can serialize your object in any way you see fit.
The standard Hadoop way of serializing objects is to implement the org.apache.hadoop.io.Writable interface. You can then serialize your object into a byte array with org.apache.hadoop.io.WritableUtils.toByteArray(Writable ... writable).
Also, people in the Hadoop community use other serialization frameworks, such as Avro, Protocol Buffers, and Thrift. Each has its specific use cases, so do your research. If you're doing something simple, implementing Hadoop's Writable should be good enough.
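To make the Writable approach concrete, here is a minimal sketch of its contract using only java.io (Writable's write/readFields methods take exactly these java.io.DataOutput/DataInput arguments). The int field id and the List&lt;String&gt; stand-in for the nested complex type are illustrative assumptions, and toByteArray/fromByteArray are hand-rolled helpers; a real implementation would declare implements org.apache.hadoop.io.Writable and could use WritableUtils.toByteArray instead:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class ComplexClass {
    int id;                                          // illustrative simple field
    List<String> collection = new ArrayList<>();     // stand-in for the nested complex type

    // Writable's serialization hook: write every field in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeInt(collection.size());
        for (String item : collection) {
            out.writeUTF(item);
        }
    }

    // Writable's deserialization hook: read the fields back in the same order.
    public void readFields(DataInput in) throws IOException {
        id = in.readInt();
        collection.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            collection.add(in.readUTF());
        }
    }

    // One byte[] ready to be stored as an HBase cell value.
    public byte[] toByteArray() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e); // cannot happen with in-memory streams
        }
    }

    // Rebuild the object from the blob read back out of HBase.
    public static ComplexClass fromByteArray(byte[] bytes) {
        try {
            ComplexClass obj = new ComplexClass();
            obj.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return obj;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        ComplexClass original = new ComplexClass();
        original.id = 42;
        original.collection.add("nested-item");
        byte[] cellValue = original.toByteArray();
        ComplexClass restored = ComplexClass.fromByteArray(cellValue);
        System.out.println(restored.id + " " + restored.collection); // prints "42 [nested-item]"
    }
}
```

On write, the byte[] goes into the cell value of a Put; on read, the bytes from the Result cell are passed to fromByteArray.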