Is there a database engine that implements random access?

Posted on 2024-12-01 02:30:15

By random access I do not mean selecting a random record.
Random access is the ability to fetch any record in the same amount of time,
the same way values are fetched from an array.
From Wikipedia: http://en.wikipedia.org/wiki/Random_access

My intention is to store a very large array of strings, one that is too big for memory,
but still have the benefit of random access to the array.

I usually use MySQL, but it seems to have only B-Tree and Hash index types.

I don't see a reason why it wouldn't be possible to implement such a thing.
The indexes would be like in an array, starting from zero and incrementing by 1.

I want to simply fetch a string by its index, not get the index according to the string.
The goal is to improve performance. I also cannot control the order in which the strings
will be accessed: it'll be a remote DB server which will constantly receive indexes from
clients and return the string for that index.

Is there a solution for this?

P.S. I don't think this is a duplicate of "Random-access container that does not fit in memory?",
because that question has other requirements besides random access.

Comments (2)

遇见了你 2024-12-08 02:30:15

Given your definition, if you just use an SSD for storing your data, it will allow for what you call random access (i.e. uniform access speed across the data set). Sequential access is considered cheaper than random access because sequential access to disk is much faster than random access (and any database tries its best to make up for this, btw).

That said, even RAM access is not uniform, since sequential access is faster due to caching and NUMA. So uniform access is an illusion anyway, which raises the question of why you insist on having it in the first place. I.e., what do you think will go wrong with slower random access? It might still be fast enough for your use case.

月下凄凉 2024-12-08 02:30:15

You are talking about constant time, but you mention a unique incrementing primary key.

Unless such a key is gapless, you cannot use it as an offset, so you still need some kind of structure to look up the actual offset.
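
For the gapless case with variable-length strings, one possible lookup structure is a flat table of byte offsets stored next to the data. The following is only a minimal sketch under that assumption: the names `build` and `fetch`, the two-file layout, and the 8-byte little-endian offset format are all hypothetical choices made for illustration, not something proposed in this thread.

```python
import struct

# Assumption: each offset is stored as one 8-byte little-endian unsigned integer.
OFFSET = struct.Struct("<Q")


def build(data_path, index_path, strings):
    """Write the strings back to back, plus a table of n + 1 byte offsets."""
    with open(data_path, "wb") as data, open(index_path, "wb") as index:
        position = 0
        for s in strings:
            index.write(OFFSET.pack(position))
            encoded = s.encode("utf-8")
            data.write(encoded)
            position += len(encoded)
        # Sentinel entry: marks where the last string ends.
        index.write(OFFSET.pack(position))


def fetch(data_path, index_path, i):
    """Return string i using two seeks: one in the index, one in the data."""
    if i < 0:
        raise IndexError(i)
    with open(index_path, "rb") as index:
        index.seek(i * OFFSET.size)
        raw = index.read(2 * OFFSET.size)
    if len(raw) < 2 * OFFSET.size:
        raise IndexError(i)
    start, end = struct.unpack("<QQ", raw)
    with open(data_path, "rb") as data:
        data.seek(start)
        return data.read(end - start).decode("utf-8")
```

Because the key is gapless, entry `i` of the table sits at byte `i * 8`, so no tree or hash is involved. If the keys had gaps, this arithmetic would no longer hold and a real index would be needed.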

Finding a record by offset isn't usually particularly useful, since you will usually want to find it by some more friendly method, which will invariably involve an index. Searching a B-Tree index is worst case O(log n), which is pretty good.
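
To get a feel for what O(log n) means here, the helper below counts how many levels an idealized tree needs for a given number of keys. It is a rough sketch only: `btree_levels` is a hypothetical helper, it assumes every node is completely full, and the fan-out values are made-up examples (real fan-out depends on page size and key size).

```python
def btree_levels(n_keys, fanout):
    """Smallest number of levels h such that fanout ** h >= n_keys.

    Idealized model: every node holds exactly `fanout` entries.
    """
    if n_keys < 1 or fanout < 2:
        raise ValueError("need n_keys >= 1 and fanout >= 2")
    levels, capacity = 1, fanout
    while capacity < n_keys:
        capacity *= fanout
        levels += 1
    return levels


# Hypothetical fan-outs for one billion keys.
print(btree_levels(1_000_000_000, 100))   # 5
print(btree_levels(1_000_000_000, 1000))  # 3
```

Under these assumptions a lookup touches only a handful of nodes even for a billion rows, and the upper levels of the tree are typically small enough to stay cached.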

Assuming you just have an array of strings, store it in a disk file of fixed-length records and use the file system to seek to your desired offset.

Then benchmark against a database lookup.
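
The fixed-length-record file described above could look roughly like this. It is a minimal sketch, not a tuned implementation: `RECORD_SIZE`, `write_records`, and `read_record` are hypothetical names, and it assumes every string fits in the record width once UTF-8 encoded and never ends in a NUL character (NUL bytes are used as padding).

```python
RECORD_SIZE = 64  # hypothetical fixed record width, in bytes


def write_records(path, strings, record_size=RECORD_SIZE):
    """Write each string as one fixed-width, NUL-padded UTF-8 record."""
    with open(path, "wb") as f:
        for s in strings:
            data = s.encode("utf-8")
            if len(data) > record_size:
                raise ValueError(f"string too long for a {record_size}-byte record")
            f.write(data.ljust(record_size, b"\x00"))


def read_record(path, index, record_size=RECORD_SIZE):
    """Fetch record `index` with a single seek to index * record_size."""
    if index < 0:
        raise IndexError(index)
    with open(path, "rb") as f:
        f.seek(index * record_size)
        data = f.read(record_size)
    if len(data) < record_size:
        raise IndexError(index)
    # Strip the padding added by write_records.
    return data.rstrip(b"\x00").decode("utf-8")
```

The trade-off is wasted space: every record costs `record_size` bytes regardless of the string's length, so this suits data whose lengths are similar. A server would normally keep the file open rather than reopening it per request, as this sketch does for simplicity.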
