Ideal string length for a DBM database?
When using a DBM database (e.g. Berkeley or GDBM), is it better to store data using fewer long strings or more short strings? I can easily structure my data either way. I'm looking for 'better' in the performance sense, but I'm interested in other implications as well.
Comments (3)
Berkeley DB, or any other DBM implementation, will incur overhead for each key/value pair. If you're dealing with millions of k/v pairs, the overhead will matter; otherwise it's noise, and you should choose whatever is easiest for you, the programmer, and let the database deal with the data. Overhead and access time will also depend on the access method. Hash tables and B-Trees are totally different algorithmic animals. If your data has any degree of key ordering, or access patterns that depend on keys, then 99% of the time B-Trees are the way to go.
I think you're asking a great design question, but for anyone to give you a perfect answer we'd have to know a lot more about the amount of data you're dealing with, your access patterns, and many other factors.
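To make the per-pair overhead concrete, here is a rough sketch that writes the same payload as many short values and as one long value, then compares the on-disk size. It uses Python's `dbm.dumb` only because it ships with the standard library; it is not Berkeley DB or GDBM, and its overhead figures will differ from theirs. The helper names (`on_disk_size`, `size_with_many_short`, `size_with_one_long`) are made up for this illustration.

```python
import dbm.dumb
import os
import tempfile


def on_disk_size(directory):
    """Total size in bytes of every file the database created in `directory`."""
    return sum(
        os.path.getsize(os.path.join(directory, name))
        for name in os.listdir(directory)
    )


def size_with_many_short(n_pairs, value_len):
    """Store n_pairs separate short values and report the on-disk size."""
    with tempfile.TemporaryDirectory() as tmp:
        with dbm.dumb.open(os.path.join(tmp, "many"), "c") as db:
            for i in range(n_pairs):
                db[f"key{i}"] = b"x" * value_len
        # The database is closed (and flushed) before we measure.
        return on_disk_size(tmp)


def size_with_one_long(n_pairs, value_len):
    """Store the same number of payload bytes under a single key."""
    with tempfile.TemporaryDirectory() as tmp:
        with dbm.dumb.open(os.path.join(tmp, "one"), "c") as db:
            db["all"] = b"x" * (n_pairs * value_len)
        return on_disk_size(tmp)


if __name__ == "__main__":
    print("many short:", size_with_many_short(200, 10), "bytes")
    print("one long:  ", size_with_one_long(200, 10), "bytes")
```

Running the same kind of measurement against the actual library and access method you plan to use, with realistic data volumes, is the only way to know whether the overhead matters in your case.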
If you will be frequently searching or modifying the data, a greater number of short strings will provide better performance.
That is, you don't want to be searching for a substring of one of those long strings, or frequently modifying some value in the middle of a string.
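A minimal sketch of that difference, again using Python's `dbm.dumb` as a stand-in for a real DBM library. The record layout (`field=value` pairs joined by `;` in the long form, `record:field` keys in the short form) and the function names are assumptions made up for this example, not anything a DBM prescribes.

```python
import dbm.dumb
import os
import tempfile


def update_in_long_string(db, record_key, field, new_value):
    """Long-string layout: fetch, parse, change, and rewrite the whole value."""
    raw = db[record_key].decode()
    fields = dict(item.split("=", 1) for item in raw.split(";"))
    fields[field] = new_value
    db[record_key] = ";".join(f"{k}={v}" for k, v in fields.items())


def update_short_string(db, record_key, field, new_value):
    """Short-string layout: one direct write to the key that holds the field."""
    db[f"{record_key}:{field}"] = new_value


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        with dbm.dumb.open(os.path.join(tmp, "demo"), "c") as db:
            # One long string holding a whole record.
            db["user1"] = "name=Ann;city=Oslo;plan=free"
            # The same kind of record split into short strings.
            db["user2:name"] = "Bob"
            db["user2:city"] = "Oslo"
            db["user2:plan"] = "free"

            update_in_long_string(db, "user1", "city", "Rome")
            update_short_string(db, "user2", "city", "Rome")
            return db["user1"], db["user2:city"]


if __name__ == "__main__":
    print(demo())
```

The long-string update touches every byte of the record to change one field, while the short-string update writes only the field that changed. The flip side is that reading a whole record takes one lookup in the long layout and several in the short one.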
I think this question is really hard to answer in a completely generic way. There are so many variables here that you would really need to test some common scenarios to determine the answer that is best for you.
Some factors to consider:
In the end, it's generally better to go with the approach that yields the most normalized schema. Optimization can start from there, and depending on your database, there are probably better alternatives than restructuring the underlying schema purely for performance.