How best to store large JSON documents (2+ MB) in a database?
What's the best way to store large JSON files in a database? I know about CouchDB, but I'm pretty sure that won't support files of the size I'll be using.
I'm reluctant to just read them off of disk, because of the time required to read and then update them. The file is an array of ~30,000 elements, so I think storing each element separately in a traditional database would kill me when I try to select them all.
3 Answers
I have lots of documents in CouchDB that exceed 2 MB and it handles them fine. Those limits are outdated.
The only caveat is that the default JavaScript view server has a pretty slow JSON parser, so view generation can take a while with large documents. You can use my Python view server with a C-based JSON library (jsonlib2, simplejson, yajl), or use the built-in Erlang views, which don't even hit JSON serialization, so view generation will be plenty fast.
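As a point of reference (not part of the answer itself), storing and re-reading a document like this through CouchDB's plain HTTP API is just a PUT and a GET. The server URL, credentials, database name, and document id below are placeholders:

```python
# Rough sketch only: pushing the ~30,000-element array into CouchDB as a single
# document over its HTTP API. Server URL, credentials, database name, and
# document id are made up for illustration.
import requests

COUCH = "http://localhost:5984"
AUTH = ("admin", "secret")  # placeholder credentials
DB = "big_docs"

# Create the database; CouchDB just returns an error body if it already exists.
requests.put(f"{COUCH}/{DB}", auth=AUTH)

# A stand-in for the large array from the question.
doc = {"elements": [{"id": i, "value": i * 2} for i in range(30_000)]}

# Store the whole array as one document (updating an existing doc would also
# need its current _rev, omitted here for brevity)...
resp = requests.put(f"{COUCH}/{DB}/my_large_doc", json=doc, auth=AUTH)
resp.raise_for_status()

# ...and read it back in one shot.
fetched = requests.get(f"{COUCH}/{DB}/my_large_doc", auth=AUTH).json()
print(len(fetched["elements"]))  # 30000
```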
If you intend to access specific elements one (or several) at a time, there's no way around breaking the big JSON into traditional DB rows and columns.
If you'd like to access it in one shot, you can convert it to XML and store that in the DB (maybe even compressed, since XML is highly compressible). Most DB engines support storing an XML object. You can then read it in one shot and, if needed, translate it back to JSON using forward-read approaches like SAX or any other efficient XML-reading technique.
But as @therefromhere commented, you could always save it as one big string (I would again check if compressing it enhances anything).
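A minimal sketch of that "one big (compressed) string" idea, using SQLite purely as a stand-in for whatever database you actually run; the table and column names are invented for the example, and JSON text compresses about as well as XML does:

```python
# Sketch: serialize the whole document, compress it, and store it as a single
# blob; read it back and decompress in one shot.
import json
import sqlite3
import zlib

conn = sqlite3.connect("docs.sqlite")
conn.execute("CREATE TABLE IF NOT EXISTS json_docs (id TEXT PRIMARY KEY, body BLOB)")

# A stand-in for the ~30,000-element array from the question.
doc = {"elements": [{"id": i, "value": i * 2} for i in range(30_000)]}

blob = zlib.compress(json.dumps(doc).encode("utf-8"))
conn.execute(
    "INSERT OR REPLACE INTO json_docs (id, body) VALUES (?, ?)",
    ("my_large_doc", blob),
)
conn.commit()

# Read it back and restore the original structure.
row = conn.execute(
    "SELECT body FROM json_docs WHERE id = ?", ("my_large_doc",)
).fetchone()
restored = json.loads(zlib.decompress(row[0]).decode("utf-8"))
print(len(restored["elements"]))  # 30000
```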
You don't really have a variety of choices here: you can cache them in RAM using something like memcached, or push them to disk, reading and writing them with a database (an RDBMS like PostgreSQL/MySQL, or a document-oriented database like CouchDB). The only real alternative to these is a hybrid system that caches the most frequently accessed documents in memcached for reading, which is how a lot of sites operate.

2+ MB isn't a massive deal for a database, and provided you have plenty of RAM it will do an intelligent enough job of caching and using that RAM effectively. Do you have a frequency pattern for when and how often these documents are accessed, and how many users you have to serve?
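If it helps, here's a rough sketch of that hybrid read-through pattern, assuming `pymemcache` for the cache client and reusing the hypothetical `json_docs` SQLite table from the sketch above; the key name and TTL are arbitrary:

```python
# Sketch of the hybrid approach: serve hot documents from memcached and fall
# back to the database on a cache miss.
import json
import sqlite3
import zlib
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))
conn = sqlite3.connect("docs.sqlite")

def load_document(doc_id: str) -> dict:
    """Return the document from memcached if present, otherwise from the DB."""
    cached = cache.get(doc_id)
    if cached is not None:
        return json.loads(cached)

    row = conn.execute(
        "SELECT body FROM json_docs WHERE id = ?", (doc_id,)
    ).fetchone()
    doc_json = zlib.decompress(row[0]).decode("utf-8")

    # Keep the hot copy in RAM for subsequent readers (5-minute TTL here).
    cache.set(doc_id, doc_json.encode("utf-8"), expire=300)
    return json.loads(doc_json)

print(len(load_document("my_large_doc")["elements"]))
```

One caveat with this route: memcached's default maximum item size is 1 MB, so caching 2+ MB documents means raising that limit (the server's `-I` option) or compressing/splitting the value.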