How does HBase guarantee row-level atomicity?
Given that HBase stores each column family in a separate HFile, and that a row can span many column families, how does HBase ensure that a put/delete operation on a row spanning multiple column families is actually atomic?
Answers (2)
All writes to a row, no matter how many column families the row spans, go to a single region server, because a row always lives in exactly one region. Under a row lock, that region server writes the edit for all affected column families as a single entry to the region's WAL (HLog). It then syncs the write and adds the data to the MemStore so it can be served. Once the MemStore hits its size limit, it is flushed to disk. If the region server crashes, dies, or has its plug pulled, the WAL can be replayed to bring everything back to a consistent state. For more gory details, see HBASE-2283 and HBase Architecture 101.
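The write path described in this answer can be modeled as a toy, single-process sketch. `ToyRegion`, `put`, `get`, and `crash_and_recover` are hypothetical names, not HBase classes. The real server also handles disk syncing, batching, and sequence IDs, which are reduced here to comments. The sketch shows why one region, one row lock, and one log entry per mutation make a multi-family put all-or-nothing after a crash.

```python
import threading


class ToyRegion:
    """Hypothetical, single-process model of one HBase region (not HBase code).

    Every column family of a row lives in the same region, so one row lock
    and one write-ahead log (WAL) cover a mutation across all families.
    """

    def __init__(self, families):
        self.families = tuple(families)
        self.wal = []  # "durable" log: one entry per mutation, spanning all families
        self.memstores = {f: {} for f in self.families}  # in-memory, lost on crash
        self._row_locks = {}
        self._locks_guard = threading.Lock()

    def _lock_for(self, row):
        with self._locks_guard:
            return self._row_locks.setdefault(row, threading.Lock())

    def put(self, row, cells):
        """cells maps family -> {qualifier: value} and may touch several families."""
        unknown = set(cells) - set(self.families)
        if unknown:
            raise ValueError(f"unknown column families: {sorted(unknown)}")
        with self._lock_for(row):
            # 1. Append ONE log entry holding the edits for every family.
            self.wal.append((row, {f: dict(q) for f, q in cells.items()}))
            # 2. A real server would sync the log to disk at this point.
            # 3. Apply the same edit to each family's memstore.
            self._apply(row, cells)

    def _apply(self, row, cells):
        for family, qualifiers in cells.items():
            self.memstores[family].setdefault(row, {}).update(qualifiers)

    def get(self, row):
        return {f: dict(ms[row]) for f, ms in self.memstores.items() if row in ms}

    def crash_and_recover(self):
        """Drop all memstores, then rebuild them by replaying the WAL in order."""
        self.memstores = {f: {} for f in self.families}
        for row, cells in self.wal:
            self._apply(row, cells)


region = ToyRegion(["info", "stats"])
region.put("row1", {"info": {"name": "a"}, "stats": {"visits": "1"}})
region.crash_and_recover()
print(region.get("row1"))  # {'info': {'name': 'a'}, 'stats': {'visits': '1'}}
```

Because the log entry is the unit of recovery, replay restores either every family touched by a put or none of them. In this toy, a put naming an unknown family is rejected before anything is logged.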
HBase currently achieves row-level atomicity, despite writing multiple HFiles, by flushing all column families at the same time. The flush is triggered when the largest column family reaches the configured flush size. There is an additional MemStore-level timestamp that enables multi-version concurrency control (MVCC) for MemStore reads, but that timestamp is not stored for key/values written to HFiles. Switching to per-column-family flushing (a desirable feature for improving efficiency) would therefore require a similar timestamp to be added to the file format as well.
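The flush-and-MVCC scheme in this answer can be sketched as a toy model. `ToyMvccRegion`, `begin`, `stage`, `complete`, and `read` are hypothetical names, not HBase classes. Following the answer, the flush is triggered by the largest family's MemStore. As a simplifying assumption, the toy flushes only when no write is in flight. MemStore cells carry a write number that readers compare against a read point, while the flushed "HFiles" drop that number.

```python
class ToyMvccRegion:
    """Hypothetical model of MemStore MVCC plus whole-region flushing (not HBase code)."""

    def __init__(self, families, flush_size):
        self.families = tuple(families)
        self.flush_size = flush_size  # cell count of the largest family that triggers a flush
        self.write_point = 0          # last write number handed out
        self.read_point = 0           # every write <= read_point is visible
        self._completed = set()
        # memstore cells carry a write number: (write_no, row, qualifier, value)
        self.memstores = {f: [] for f in self.families}
        # flushed "HFiles" are lists of (row, qualifier, value): no write number kept
        self.hfiles = {f: [] for f in self.families}

    def begin(self):
        self.write_point += 1
        return self.write_point

    def stage(self, write_no, row, cells):
        """Insert cells for all families; they stay hidden until complete()."""
        for family, qualifiers in cells.items():
            for qualifier, value in qualifiers.items():
                self.memstores[family].append((write_no, row, qualifier, value))

    def complete(self, write_no):
        # Advance the read point only past a contiguous run of finished writes.
        self._completed.add(write_no)
        while self.read_point + 1 in self._completed:
            self.read_point += 1
            self._completed.remove(self.read_point)
        self._maybe_flush()

    def _maybe_flush(self):
        largest = max(len(cells) for cells in self.memstores.values())
        # Flush only fully committed state, and flush every family in one step.
        if largest >= self.flush_size and self.read_point == self.write_point:
            for family in self.families:
                hfile = [(r, q, v) for _, r, q, v in self.memstores[family]]
                self.hfiles[family].append(hfile)
                self.memstores[family] = []

    def read(self, row):
        view = {}
        for family in self.families:
            for hfile in self.hfiles[family]:  # flushed data: no write number to check
                for r, q, v in hfile:
                    if r == row:
                        view.setdefault(family, {})[q] = v
            for write_no, r, q, v in self.memstores[family]:
                if r == row and write_no <= self.read_point:
                    view.setdefault(family, {})[q] = v
        return view


region = ToyMvccRegion(["info", "stats"], flush_size=2)
w = region.begin()
region.stage(w, "row1", {"info": {"name": "a"}, "stats": {"visits": "1"}})
print(region.read("row1"))  # {}
region.complete(w)
print(region.read("row1"))  # {'info': {'name': 'a'}, 'stats': {'visits': '1'}}
```

In the toy, flushed lists keep no write number, so reads of flushed data cannot be filtered by read point. The model therefore flushes only committed state, and does so for all families together.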