Speed of parsing JSON structures
I want to make a simple database system, and possibly use JSON as the main data format for importing and exporting (including full database backups). So my question is: how fast is it to parse JSON, even from big JSON structures (think gigabytes), compared to importing from other formats (like (faster) binary files, or (slower) XML)?
EDIT: to clarify, I am wondering how fast it is to parse JSON (into some internal database format), but not how fast it would be as an internal storage mechanism.
So this JSON data would not be queried etc., but just parsed into another format.
Also, my main intent in asking this question is that I am wondering whether JSON is any easier to parse than XML because of its smaller delimiters (']' or '}' instead of XML's named closing tags), and whether it is maybe even similar in speed to binary formats because the delimiters are so simple.
(For example, maybe JSON can be parsed something like this: record delimiter = ASCII code xx (xx being a brace or bracket) except where preceded by ASCII xx (xx being some escape char).)
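The delimiter idea above can be tried out in a few lines. The following is only a rough sketch, not a full JSON parser: the function name split_top_level_objects is made up for illustration, and it assumes the input is a well-formed array (or stream) of objects. It suggests that a brace scanner also has to track whether it is inside a string literal and whether the previous character was a backslash, since braces inside strings must not count as delimiters.

```python
import json


def split_top_level_objects(text):
    """Yield each top-level {...} record in text using only delimiter scanning.

    Hypothetical helper for illustration; it assumes well-formed input.
    """
    depth = 0          # current nesting level of braces
    in_string = False  # True while scanning inside a "..." literal
    escaped = False    # True if the previous character was a backslash
    start = None
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False      # this character is escaped, skip it
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i            # a new top-level record begins here
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                yield text[start:i + 1]


sample = '[{"a": "x}y"}, {"b": {"c": "q\\"{"}}]'
for record in split_top_level_objects(sample):
    print(json.loads(record))
```

Even this simple splitter has to look at every byte of the input, which is one reason text formats tend to be compared unfavourably with length-prefixed binary formats.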
Comments (4)
It's definitely much, much slower than MySQL (for a server) or SQLite (for a client), which are preferable.
Also, JSON speed depends almost solely on the implementation. For instance, you could eval() it, but not only is that very risky, it's also slower than a real parser. At any rate, there are probably much better optimized XML parsers than JSON parsers, just because XML is a more widely used format. (So grab a GB-sized XML file and imagine the same results but slower.)
Seriously, JSON was never meant for big things. Use a real database if possible.
Edit: why is JSON much slower than a database?
Many reasons. I'll try to list a few.
JSON relies on matching delimiters such as {}s (much like XML's <>s). This means a parser has to find where each object block ends. There are others like this, such as []s and ""s. In a conventional database there's no "ending tag" or "ending bracket", so it's easier to read.
So before you can even read some of the JSON you have to read the whole file. This means waiting a few minutes at best for the sizes you mentioned, whereas a database is ready to be queried in less than a second (because the hierarchy is stored at the beginning).
In a database, size is traded for performance. You can declare a VARCHAR(512) and all strings will be null-padded to occupy 512 bytes. Why? Because that way you know, for example, that the 4th value starts at offset 1536 (3 × 512). You can't do that with JSON, hence performance suffers.
...Because it's a web format. This may look like a pro but it's a con from a performance perspective.
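To illustrate the fixed-width point made above, here is a minimal sketch of null-padded 512-byte fields where the value at zero-based index i starts at byte i × 512. The names FIELD_SIZE, pack_values and read_value are made up for this example, and real database engines lay out their pages quite differently; this only shows why fixed widths make direct seeking possible.

```python
import io
import struct

FIELD_SIZE = 512  # bytes per value, mimicking a null-padded 512-byte column


def pack_values(values):
    """Encode each string as UTF-8 and null-pad it to FIELD_SIZE bytes."""
    fmt = f"{FIELD_SIZE}s"  # struct pads short byte strings with b"\x00"
    return b"".join(struct.pack(fmt, v.encode("utf-8")) for v in values)


def read_value(f, index):
    """Seek straight to the value at zero-based index, skipping all others."""
    f.seek(index * FIELD_SIZE)
    return f.read(FIELD_SIZE).rstrip(b"\x00").decode("utf-8")


blob = io.BytesIO(pack_values(["alpha", "beta", "gamma", "delta", "epsilon"]))
print(read_value(blob, 3))  # the 4th value, read from byte offset 1536
```

With JSON the same lookup would require scanning past the first three values, because their lengths are not known in advance.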
So some parsers might allow unnecessary data to be present and considered, such as comments. Chrome's native JSON used to allow comments for example (not anymore).
No database engine uses eval(), right?
People might put anything into a JSON file, so parsers are defensive and sometimes try to read invalid files. Databases aren't supposed to repair a broken file silently.
You might hand-code a JSON but not a database!
There are bugs in some native parsers (like IE8's), and support in most browsers is very preliminary and slower than, say, the fastest XML parser out there. That's simply because XML has been in use for ages, and Steve Ballmer has an XML fetish, so companies please him by making almost anything under the sun XML-compatible, while JSON is one of Crockford's successful weekend pastimes.
If you pick one random open-source JSON parser for your favourite language, what are the chances that it's the best possible parser under the sun? Well, for XML you do have awesome parsers like this. But what is there for JSON?
Need more reasons why JSON should be relegated to its intended use case?
If you consider JSON as an intermediate format for data transfer, you might want to consider binary alternatives as well, because they need less disk space and network bandwidth (both compressed and uncompressed), so you may get faster parsing because the input to parse is shorter.
If you run your own benchmark, make sure to benchmark multiple parsers for the same language, e.g. a JSON parser implemented in pure Python is expected to be much slower than a JSON parser written in C -- but you may find a significant speed difference (up to a factor of 2, but maybe 5) between different implementations in the same programming language as well.
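A benchmark along these lines could start from something like the sketch below. It is only a rough harness using the Python standard library: the benchmark function and the sample records are made up for illustration, and pickle stands in for "a binary alternative" purely because it ships with Python (it should not be used on untrusted input). Other parsers can be added to the parsers dictionary in the same way.

```python
import json
import pickle
import timeit


def benchmark(parsers, repeat=5):
    """Return the best-of-repeat wall time in seconds for each parser.

    parsers maps a label to a (parse_function, serialized_payload) pair.
    """
    results = {}
    for name, (parse, payload) in parsers.items():
        timings = timeit.repeat(lambda: parse(payload), number=1, repeat=repeat)
        results[name] = min(timings)
    return results


# Made-up sample data; real measurements should use representative data.
records = [{"id": i, "name": f"user{i}", "score": i * 0.5} for i in range(20_000)]
parsers = {
    "json": (json.loads, json.dumps(records)),
    # pickle is only a stand-in for a binary format here.
    "pickle": (pickle.loads, pickle.dumps(records, protocol=pickle.HIGHEST_PROTOCOL)),
}

for name, seconds in benchmark(parsers).items():
    print(f"{name}: {seconds:.4f} s")
```

Taking the minimum of several runs reduces noise from other processes; the absolute numbers will vary by machine and data shape.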
Benchmarks of JSON, XML, and lots of other things can be found in the JVM Serializers project. The results are too complicated to reproduce here, but the best JSON results (comparing both manual and databound classes) are quite a bit better than the best XML results. That comparison isn't complete, but it's a starting point.
EDIT: as of right now (2012-10-30), there are no published results, because the benchmark is being revised. However, there are some preliminary results available.
A database is a file system with a faster seeking facility. If you can achieve the same with JSON, then things are easy. You would have to build a system that lets you seek to things quickly in a JSON file.
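One possible way to get faster seeking, sketched here under the assumption that the data is stored as one JSON document per line (a layout not mentioned in the question), is to scan the file once and remember the byte offset of every record. The names build_offset_index and read_record are hypothetical.

```python
import io
import json


def build_offset_index(f):
    """Scan the file once and record the byte offset where each record starts."""
    offsets = []
    f.seek(0)
    while True:
        pos = f.tell()
        line = f.readline()
        if not line:          # end of file
            break
        if line.strip():      # ignore blank lines
            offsets.append(pos)
    return offsets


def read_record(f, offsets, n):
    """Jump directly to record n and parse only that line."""
    f.seek(offsets[n])
    return json.loads(f.readline())


# An in-memory stand-in for a file opened in binary mode.
data = io.BytesIO(b"".join(json.dumps({"id": i}).encode() + b"\n" for i in range(1000)))
index = build_offset_index(data)
print(read_record(data, index, 742))
```

The index itself still has to be built by reading the whole file once, so this helps repeated lookups rather than a one-off import.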