Moving data from a multi-node Cassandra cluster to a single-node instance
I currently have a script that invokes bin/sstable2json
on all files of the pattern /var/lib/cassandra/data/fake-keyspace/*-Data.db
and saves the output from stdout to disk. However, the exported files are starting to take 10x the space of all the files in /var/lib/cassandra.
I took this approach after reading the following section http://wiki.apache.org/cassandra/Operations#Import_.2BAC8_export
What are some of the best practices for getting data out of one cluster and into another? Just to be clear, I am not trying to add additional nodes to a ring, but rather to move data from one ring to another in a repeatable process.
Any help or nudge in the right direction would be much appreciated.
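For reference, the export loop described above might look something like the sketch below. The keyspace path follows the question; the output directory and the guard against an empty glob are assumptions.

```shell
#!/bin/sh
# Sketch of the sstable2json export loop described in the question.
# OUT_DIR is an assumption; bin/sstable2json is the tool shipped with
# the Cassandra distribution.
SSTABLE2JSON=${SSTABLE2JSON:-bin/sstable2json}
DATA_DIR=${DATA_DIR:-/var/lib/cassandra/data/fake-keyspace}
OUT_DIR=${OUT_DIR:-/tmp/fake-keyspace-json}

mkdir -p "$OUT_DIR"
for db in "$DATA_DIR"/*-Data.db; do
  [ -e "$db" ] || continue   # skip if the glob matched nothing
  # dump each sstable to JSON, one output file per Data component
  "$SSTABLE2JSON" "$db" > "$OUT_DIR/$(basename "$db" -Data.db).json"
done
```

As the answer below notes, the JSON round-trip is usually unnecessary for a straight cluster-to-cluster move, which also explains the 10x disk blow-up: JSON is a far less compact encoding than the sstable format itself.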
Just copy the sstable files. The only reason to use json is for (1) debugging or (2) you want to do some kind of processing in the json form before re-loading.
So, just rename all the sstable files (from a snapshot, if you're running live in the first cluster) to unique numbers (order doesn't matter, as long as they're unique), and copy them all to the data directory on the target machine.
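The rename step can be sketched as below, run here against a fabricated snapshot so it can be tried locally. The directory, the column-family name "Standard1", and the older <CF>-<version>-<generation>-<Component>.db filename layout are assumptions; adapt them to your actual snapshot files, and use a distinct generation range for each source node.

```shell
#!/bin/sh
# Sketch: renumber sstable generations so snapshots taken from several
# source nodes can be merged into one target data directory without
# filename collisions. Names and paths are illustrative.

SRC=/tmp/sstable-renumber-demo   # stand-in for a copied snapshot dir
GEN=1000                         # starting generation; pick a unique
                                 # range per source node

# fabricate one node's snapshot: two sstables, two components each
rm -rf "$SRC" && mkdir -p "$SRC"
for g in 1 2; do
  for comp in Data Index; do
    touch "$SRC/Standard1-hc-$g-$comp.db"
  done
done

# renumber each sstable (all of its components together) to a fresh
# generation; order doesn't matter as long as the numbers are unique
for data in "$SRC"/*-Data.db; do
  base=${data%-*-Data.db}                 # e.g. .../Standard1-hc
  oldgen=${data%-Data.db}; oldgen=${oldgen##*-}
  for comp in Data Index Filter Statistics; do
    old="$base-$oldgen-$comp.db"
    [ -e "$old" ] && mv "$old" "$base-$GEN-$comp.db"
  done
  GEN=$((GEN + 1))
done

ls "$SRC"   # Standard1-hc-1000-*.db and Standard1-hc-1001-*.db
```

On a live cluster the snapshot itself would come from `nodetool snapshot`, and after copying the renumbered files into the target's data directory the new node needs to pick them up (e.g. via a restart).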