How to sort a LevelDB database by value
I'm using LevelDB to store key-value records, where the key is a 64-bit hash and the value is a double. To make an analogy: think of the 64-bit hash as a customer's unique ID and the double as an account balance (i.e. how much money they have in their account). I want to sort the database by account balance and list the customers with the highest balance first. However, the database does not fit into memory, so I need some other method to sort it by account balance.
I'm considering using STXXL, but it requires that I first copy the database into a single flat file. I can then use STXXL to do an external sort (which would make a bunch of smaller files, sort them, and then merge them back into another single flat file). Is there a better approach to sorting the data, or should I go with the STXXL sort?
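For reference, the external sort described above (split into sorted chunks, then merge) can be sketched by hand in plain C++. This is not STXXL's API; it is only a minimal illustration of the same technique. The `Record` layout, the function names `make_sorted_runs` and `merge_runs`, the `.runN` file naming, and the assumption that the flat file holds raw `Record` structs back to back are all hypothetical choices made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Assumed on-disk layout: raw Record structs written back to back.
struct Record {
    std::uint64_t id;   // 64-bit hash key
    double balance;     // value to sort by
};

// Phase 1: read the flat file in chunks that fit in memory, sort each chunk
// by balance (highest first), and write it out as its own run file.
std::vector<std::string> make_sorted_runs(const std::string& input,
                                          std::size_t chunk_records) {
    assert(chunk_records > 0);
    std::ifstream in(input, std::ios::binary);
    std::vector<std::string> runs;
    std::vector<Record> chunk(chunk_records);
    while (in) {
        in.read(reinterpret_cast<char*>(chunk.data()),
                static_cast<std::streamsize>(chunk_records * sizeof(Record)));
        std::size_t n = static_cast<std::size_t>(in.gcount()) / sizeof(Record);
        if (n == 0) break;
        std::sort(chunk.begin(), chunk.begin() + static_cast<std::ptrdiff_t>(n),
                  [](const Record& a, const Record& b) { return a.balance > b.balance; });
        runs.push_back(input + ".run" + std::to_string(runs.size()));
        std::ofstream out(runs.back(), std::ios::binary);
        out.write(reinterpret_cast<const char*>(chunk.data()),
                  static_cast<std::streamsize>(n * sizeof(Record)));
    }
    return runs;
}

// Phase 2: k-way merge. A max-heap holds the next unread record of every run,
// so the record with the highest remaining balance is always written next.
void merge_runs(const std::vector<std::string>& runs, const std::string& output) {
    std::vector<std::ifstream> ins;
    for (const auto& r : runs) ins.emplace_back(r, std::ios::binary);

    using Item = std::pair<Record, std::size_t>;  // record + index of its run
    auto lower = [](const Item& a, const Item& b) {
        return a.first.balance < b.first.balance;
    };
    std::priority_queue<Item, std::vector<Item>, decltype(lower)> heap(lower);

    auto refill = [&](std::size_t i) {
        Record r;
        if (ins[i].read(reinterpret_cast<char*>(&r), sizeof r)) heap.push({r, i});
    };
    for (std::size_t i = 0; i < ins.size(); ++i) refill(i);

    std::ofstream out(output, std::ios::binary);
    while (!heap.empty()) {
        Item top = heap.top();
        heap.pop();
        out.write(reinterpret_cast<const char*>(&top.first), sizeof(Record));
        refill(top.second);
    }
}
```

Only one chunk is held in memory during the first phase and one record per run during the merge, so memory use is bounded by `chunk_records` rather than by the size of the database.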
Comments (1)
How many entries do you have? Could an unsigned 32-bit integer be used as an index (that would allow 4,294,967,296 indexes) to identify how to sort the original array?
I.e., create pairs of 32-bit indexes and account balances, sort on the balances, then use the 32-bit indexes to work out what order the original data should be in?
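A minimal sketch of that idea, assuming the balances can be read into a vector in their original order and that the pairs themselves fit in memory. The `IndexedBalance` struct and the `order_by_balance` function are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical pair: a 32-bit position in the original data plus its balance.
struct IndexedBalance {
    std::uint32_t index;
    double balance;
};

// Returns the original positions ordered by balance, highest first.
std::vector<std::uint32_t> order_by_balance(const std::vector<double>& balances) {
    // A 32-bit index can only address up to 4,294,967,296 entries.
    assert(balances.size() <= std::numeric_limits<std::uint32_t>::max());

    std::vector<IndexedBalance> pairs;
    pairs.reserve(balances.size());
    for (std::size_t i = 0; i < balances.size(); ++i)
        pairs.push_back({static_cast<std::uint32_t>(i), balances[i]});

    // stable_sort keeps equal balances in their original relative order.
    std::stable_sort(pairs.begin(), pairs.end(),
                     [](const IndexedBalance& a, const IndexedBalance& b) {
                         return a.balance > b.balance;
                     });

    std::vector<std::uint32_t> order;
    order.reserve(pairs.size());
    for (const auto& p : pairs) order.push_back(p.index);
    return order;
}
```

The returned vector lists which original entry comes first, second, and so on, so the full records can then be fetched in that order.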