类似数组的数据结构在插入和插入方面高效找时间
我需要一个用 Ruby 语言实现的数据结构来容纳大量不同的 url(例如 10**10 段),因此性能是我关心的问题。 我使用 Array
并保持其元素有序,因此我可以执行二分搜索
来查找是否已存在或在何处快速插入 url。 我是这样写的:
class UrlStore
def initialize(*args)
@urls = []
args.each { |e| add(e) unless e.class != String }
end
def add(url)
p = find(url)
@urls.insert(p, url) unless p.class == String
end
def find(url)
l, m, h = 0, 0, @urls.size - 1
while l <= h do
m = l + (h - l) / 2
b = url <=> @urls[m]
if b == 0
return m.to_s
elsif b == 1
l = m + 1
else
h = m - 1
end
end
return l
end
end
如果找到的话,find
方法将返回托管数组中 url 的位置,但以 String
形式返回,以便与发现的那些位置区分开来。插入;否则,返回一个整数 (Fixnum
),告诉 url 应位于何处以保持数组有序。
但请注意,我使用 Array#insert 来在指定位置添加元素。我的直觉告诉我,这种方法会将插入点后的所有元素向后移动一步,这可能会导致严重的性能下降。 Array
模块不在标准库中,它是 Ruby 的固有数据类型,所以我不知道我是否正确。
也许用于托管此类任务的数据结构太天真了。 那么任何人都可以分享一件很棒的事情吗?
I need a data structure, implemented in Ruby
language, to house a huge amount of distinct urls (e.g. 10**10
pieces), so the performance is my concern.
I am using an Array
and keep its elements ordered, so I can perform binary search
to find if one already exists or where to insert a url rapidly.
I wrote this:
class UrlStore
def initialize(*args)
@urls = []
args.each { |e| add(e) unless e.class != String }
end
def add(url)
p = find(url)
@urls.insert(p, url) unless p.class == String
end
def find(url)
l, m, h = 0, 0, @urls.size - 1
while l <= h do
m = l + (h - l) / 2
b = url <=> @urls[m]
if b == 0
return m.to_s
elsif b == 1
l = m + 1
else
h = m - 1
end
end
return l
end
end
The find
method will, if found, return the position of the url in hosting array, but in String
form in order to distinguish from those positions found to be inserted; Otherwise, return an integer(Fixnum
) telling where the url should go to keep the array ordered.
But note that, I use Array#insert
to add an element at a specified position. My intuition tells me that this method will move all elements after insert-point a step backward, which may cause severe performance deduction. The Array
module is not in the standard library, its Ruby's intrinsic data type, so I don't know if I am right.
May be it's so naive a data structure for hosting such a task.
So can any one share an awesome one.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
如果,正如 Phrogz 建议的那样,您确实设法获得 370GB 内存,或者您意识到实际上不需要存储那么多 URL,那么您可能需要考虑使用 SortedSet。
If, as Phrogz has suggested you do manage to get 370GB of memory, or you realise you don't actually need to store that many URLs, you might want to look into using a SortedSet.
您可能想看看越来越多的开源 NoSQL 解决方案,包括 MongoDB、Cassandra、Kyoto Cabinet 或 Redis。
MongoHQ 为 MongoDB 提供免费托管服务。 RedisToGo 为 Redis 提供免费托管服务。两者都有非常易于使用的 Ruby 绑定。我都用过并且推荐它们。
我听说过有关 Cassandra 和京都橱柜的好消息,但尚未在任何生产应用程序中使用它们。
You might want to look at the growing number of Open source NoSQL solutions including MongoDB, Cassandra, Kyoto Cabinet or Redis.
MongoHQ provides a free hosting service for MongoDB. RedisToGo provides a free hosting service for Redis. Both have very easy to use Ruby bindings. I have used both and recommend them.
I have heard good things about Cassandra and Kyoto Cabinet but have not used them in any production app.