Random access to an HBase table in a MapReduce job
I have a MapReduce job in which each mapper needs random access to another HBase table many, many times. I am wondering how efficient such a large number of random accesses to HBase tables is (they happen concurrently, because the mappers run concurrently).
Thanks a lot!
Comments (1)
HBase is efficient at random access. However, depending on how large the table accessed from the map/reduce job is and how many times you perform that I/O, you may want to consider alternative options.
For example, if the random-access table is small enough, load it into memory in each mapper (override setup to do that). If the random-access table is large, consider running an additional map/reduce job to prepare it for the other map/reduce job (so that you go over both tables, or a unified table).
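To make the "load it into memory in each mapper" idea concrete, here is a minimal sketch in plain Java. It deliberately avoids the Hadoop and HBase libraries so it can run on its own: `SideTable` and `CachingMapper` are hypothetical stand-ins, not real API classes. In an actual job, the caching would presumably go in your mapper's overridden `setup` method and the data would come from a scan of the side HBase table.

```java
import java.util.HashMap;
import java.util.Map;

class MapSideLookupSketch {

    /** Hypothetical stand-in for the side HBase table; counts simulated round trips. */
    static class SideTable {
        private final Map<String, String> rows = new HashMap<>();
        int roundTrips = 0;

        void put(String rowKey, String value) {
            rows.put(rowKey, value);
        }

        // Models one random read (assumed to cost one round trip per call).
        String get(String rowKey) {
            roundTrips++;
            return rows.get(rowKey);
        }

        // Models one full scan of the table, counted here as a single pass.
        Map<String, String> scanAll() {
            roundTrips++;
            return new HashMap<>(rows);
        }
    }

    /** Hypothetical mapper that copies the side table into memory once. */
    static class CachingMapper {
        private Map<String, String> cache;

        // Plays the role of the mapper's setup step: runs once per task.
        void setup(SideTable table) {
            cache = table.scanAll();
        }

        // Plays the role of map(): every lookup is served from local memory.
        String map(String key) {
            String value = cache.get(key);
            return key + "=" + (value == null ? "MISSING" : value);
        }
    }

    public static void main(String[] args) {
        SideTable table = new SideTable();
        table.put("u1", "alice");
        table.put("u2", "bob");

        CachingMapper mapper = new CachingMapper();
        mapper.setup(table);

        String[] inputKeys = {"u1", "u2", "u1", "u3"};
        for (String key : inputKeys) {
            System.out.println(mapper.map(key));
        }
        System.out.println("round trips to side table: " + table.roundTrips);
    }
}
```

The point of the sketch is only the access pattern: the side table is read once per mapper task instead of once per input record. This only works if the table fits comfortably in each mapper's heap; for a large table, the extra map/reduce pass suggested above is the alternative.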