Is implementing RawComparator really that much faster?
Is implementing RawComparator that much faster than extending WritableComparator? Looking at Text, LongWritable, etc. and their built-in comparators, it seems they basically read the fields directly from the full byte array, instead of using a DataInput to fill the values into the key class.
In my case, I've got a custom key class with multiple fields of mixed types, including some Strings. Doing this with RawComparator sort of scares me, since it looks, at least on the surface, difficult to implement correctly.
Comments (1)
Yes, you're right that raw comparators are definitely good when you're 100% sure that a byte-by-byte comparison reflects the ordering and equivalence of the data.
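To make that caveat concrete, here is a minimal sketch (plain Java, no Hadoop dependency; the class and method names are made up for illustration) of a case where a naive byte-by-byte comparison does not match the data's order: a signed long written as 8 big-endian bytes, which is the layout `DataOutput.writeLong` produces.

```java
import java.nio.ByteBuffer;

public class ByteOrderPitfall {
    // Encode a long as 8 big-endian bytes (ByteBuffer's default byte order).
    static byte[] encode(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Naive lexicographic comparison, treating every byte as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Decode the field from the raw bytes first, then compare numerically.
    static int compareDecoded(byte[] a, byte[] b) {
        return Long.compare(ByteBuffer.wrap(a).getLong(), ByteBuffer.wrap(b).getLong());
    }

    public static void main(String[] args) {
        byte[] minusOne = encode(-1L);
        byte[] one = encode(1L);
        // -1 is all 0xff bytes, so the naive comparison sorts it after 1.
        System.out.println("bytewise: " + compareBytes(minusOne, one)); // prints "bytewise: 1"
        System.out.println("numeric:  " + compareDecoded(minusOne, one)); // prints "numeric:  -1"
    }
}
```

So a raw comparator does not have to be a blind byte comparison: it can still decode individual fields straight from the buffer, it just skips building the key object.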
You could use a library such as Apache Thrift or Avro to handle the binary serialization for you. In that case, you won't have to worry about your raw data being inconsistently encoded in binary.
Binary comparisons are always faster than object deserialization... But "that much" faster? Well, that depends on how you define "that much" :)
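For a key with several fields of mixed types, a raw comparison might look roughly like the sketch below. It is plain Java without Hadoop on the classpath, and the layout (8-byte big-endian long, 4-byte length, UTF-8 bytes) and the names `CompositeKeyRawCompare`, `serialize`, `readLong` and `readInt` are assumptions for illustration, not anything from the question. The `compare` method only mirrors the parameter shape of `RawComparator.compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2)`; in a real job your key's `write()` method defines the layout, and the comparator has to match it exactly.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CompositeKeyRawCompare {
    // Assumed layout: 8-byte big-endian long, 4-byte big-endian length, UTF-8 bytes.
    static byte[] serialize(long id, String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES + Integer.BYTES + utf8.length)
                .putLong(id).putInt(utf8.length).put(utf8).array();
    }

    // Read a big-endian long directly from the buffer at the given offset.
    static long readLong(byte[] b, int off) {
        long v = 0;
        for (int i = 0; i < Long.BYTES; i++) {
            v = (v << 8) | (b[off + i] & 0xff);
        }
        return v;
    }

    // Read a big-endian int directly from the buffer at the given offset.
    static int readInt(byte[] b, int off) {
        return ((b[off] & 0xff) << 24) | ((b[off + 1] & 0xff) << 16)
                | ((b[off + 2] & 0xff) << 8) | (b[off + 3] & 0xff);
    }

    // Compare two serialized keys in place: first by id, then by name bytes.
    // l1 and l2 (the record lengths) are unused because the layout is self-describing.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int c = Long.compare(readLong(b1, s1), readLong(b2, s2));
        if (c != 0) {
            return c;
        }
        int n1 = readInt(b1, s1 + Long.BYTES);
        int n2 = readInt(b2, s2 + Long.BYTES);
        int p1 = s1 + Long.BYTES + Integer.BYTES;
        int p2 = s2 + Long.BYTES + Integer.BYTES;
        for (int i = 0; i < Math.min(n1, n2); i++) {
            int x = b1[p1 + i] & 0xff;
            int y = b2[p2 + i] & 0xff;
            if (x != y) {
                return x < y ? -1 : 1; // unsigned byte order of the UTF-8 data
            }
        }
        return Integer.compare(n1, n2); // the shorter string sorts first on a tie
    }

    public static void main(String[] args) {
        byte[] a = serialize(-5L, "zeta");
        byte[] b = serialize(3L, "alpha");
        byte[] c = serialize(3L, "alpine");
        System.out.println(compare(a, 0, a.length, b, 0, b.length)); // negative: -5 < 3
        System.out.println(compare(b, 0, b.length, c, 0, c.length)); // negative: "alpha" < "alpine"
    }
}
```

One thing to watch with the string field: comparing UTF-8 bytes as unsigned values orders strings by code point, which can differ from `String.compareTo` for characters outside the Basic Multilingual Plane. Hadoop's `WritableComparator` also ships static helpers for reading primitives and comparing byte ranges in place, so in practice you would likely not hand-roll the readers.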