Speed of parsing JSON structures

Posted on 2024-10-07 23:07:14

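I want to build a simple database system, and I may use JSON as the main data format for imports and exports (including full database backups). So my question is: how fast is it to parse JSON, even from large JSON structures (think gigabytes), compared with importing from other formats, such as (faster) binary files or (slower) XML?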

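Edit: to clarify, I want to know how fast it is to parse JSON (into some internal database format), not how fast it would be as an internal storage mechanism. So this JSON data would not be queried and so on, only parsed into another format.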

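Also, the main intent of my question is to find out whether JSON is easier to parse than XML because its delimiters are smaller ("]" or "}" instead of XML's longer closing tags), and whether it even comes close to binary formats in speed because the delimiters are so simple. (For example, maybe JSON can be parsed like this: record delimiter = ASCII code xx (xx being a brace or bracket), except where preceded by ASCII xx (xx being some escape character).)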


千と千尋 2024-10-14 23:07:14

It's definitely much, much slower than MySQL (for a server) or SQLite (for a client), which are preferable.

Also, JSON speed depends almost entirely on the implementation. For instance, you could eval() it, but that is not only very risky, it is also slower than a real parser. At any rate, there are probably much better optimized XML parsers than JSON parsers, simply because XML is the more widely used format. (So grab a GB-sized XML file and imagine the same results, but slower.)

Seriously, JSON was never meant for big things. Use a real database if possible.

Edit: why is JSON much slower than a database?

Many reasons. I'll try to list a few.

JSON relies on matching sections such as {}s (much like XML's <>s)

This means a parser has to find where each object block ends. There are other paired delimiters of this kind, such as []s and ""s. In a conventional database there's no "ending tag" or "ending bracket", so it's easier to read.
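To make the "find the ending" step concrete, here is a minimal sketch of the bookkeeping a scanner needs just to locate the end of one object. The function name `find_object_end` is made up for illustration, and the sketch only counts nesting depth and skips string contents; it does not validate the JSON or check that each bracket is closed by the matching kind.

```python
def find_object_end(text, start=0):
    """Return the index just past the '}' that closes the '{' at text[start]."""
    if text[start] != "{":
        raise ValueError("expected '{' at start")
    depth = 0          # how many {...} / [...] blocks are currently open
    in_string = False  # braces inside "..." must not be counted
    escaped = False    # previous character was a backslash inside a string
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
            if depth == 0:
                return i + 1
    raise ValueError("unterminated object")
```

Even this stripped-down version has to look at every character between the opening and closing brace, which is the point the paragraph above is making.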

JSON parsers need to read each and every character before being able to understand the whole object structure.

So before you can even read some of the JSON, you have to read the whole file. For the sizes you mentioned, this means waiting a few minutes at best, whereas a database is ready to be queried in less than a second (because the hierarchy is stored at the beginning).
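A small illustration of this with Python's standard `json` module, which parses a whole document at a time: if only the final closing brace is missing, nothing is returned at all. The helper `try_parse` is a hypothetical name for this sketch, and the behaviour shown is that of `json.loads` specifically, not of every JSON parser.

```python
import json

complete = '{"users": [{"id": 1}, {"id": 2}], "count": 2}'
truncated = complete[:-1]  # identical, except the last '}' is missing

def try_parse(text):
    """Return the parsed value, or None if json.loads rejects the text."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

parsed_complete = try_parse(complete)    # a dict with "users" and "count"
parsed_truncated = try_parse(truncated)  # None: no partial result is produced
```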

In JSON you can't precalculate offsets.

In a database, size is traded for performance. You can make a VARCHAR(512) and all strings will be null-padded to occupy 512 bytes. Why? Because that way you know, for example, that the value at index 4 (counting from zero) starts at offset 2048. You can't do that with JSON, hence performance suffers.
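The offset arithmetic can be sketched with a toy fixed-width layout. This only illustrates the idea and is not how any particular database engine stores VARCHAR columns; `FIELD_SIZE`, `write_fixed` and `read_field` are names invented for the example.

```python
import io

FIELD_SIZE = 512  # bytes reserved per value, as in the VARCHAR(512) example

def write_fixed(values):
    """Encode each string as UTF-8 and null-pad it to FIELD_SIZE bytes."""
    buf = io.BytesIO()
    for value in values:
        raw = value.encode("utf-8")
        if len(raw) > FIELD_SIZE:
            raise ValueError("value too long for field")
        buf.write(raw.ljust(FIELD_SIZE, b"\x00"))
    return buf

def read_field(buf, index):
    """Seek straight to the value at `index` without reading earlier ones."""
    buf.seek(index * FIELD_SIZE)  # index 4 -> offset 2048
    return buf.read(FIELD_SIZE).rstrip(b"\x00").decode("utf-8")

store = write_fixed(["alpha", "beta", "gamma", "delta", "epsilon"])
fifth = read_field(store, 4)
```

The cost is visible too: five short strings occupy 2560 bytes, which is the size-for-speed trade described above.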

JSON is optimized for small file sizes.

...because it's a web format.
This may look like a pro, but it's a con from a performance perspective.

JSON is a JavaScript subset.

So some parsers might allow unnecessary data, such as comments, to be present and taken into account. Chrome's native JSON used to allow comments, for example (not anymore).
No database engine uses eval(), right?

JSON is meant to have some error resilience.

People might put anything into a JSON file, so parsers are defensive and sometimes try to read invalid files. Databases aren't supposed to repair a broken file silently.
You might hand-write a JSON file, but not a database!
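One concrete case of a parser reading input that is not strictly valid: Python's standard `json` module accepts `NaN` and `Infinity` by default, although the JSON grammar (RFC 8259) does not permit them. The sketch below shows the lenient default next to a stricter call that uses the documented `parse_constant` hook; `reject_constant`, `parse_lenient` and `parse_strict` are names made up for the example.

```python
import json
import math

def reject_constant(name):
    """Hook called by json for 'NaN', 'Infinity' and '-Infinity'."""
    raise ValueError(f"invalid JSON constant: {name}")

def parse_lenient(text):
    # Default behaviour: non-standard constants are silently accepted.
    return json.loads(text)

def parse_strict(text):
    # Same parser, but non-standard constants now raise ValueError.
    return json.loads(text, parse_constant=reject_constant)

lenient = parse_lenient("[1, NaN]")
accepted_nan = math.isnan(lenient[1])

try:
    parse_strict("[1, NaN]")
    strict_rejected = False
except ValueError:
    strict_rejected = True
```

Comments, on the other hand, are rejected by this parser: `json.loads('{"a": 1} // note')` raises `json.JSONDecodeError`.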

JSON is a new, unsupported and badly tested format

There are bugs in some native parsers (like IE8's), and support in most browsers is very preliminary and slower than, say, the fastest XML parser out there. That is simply because XML has been in use for ages and Steve Ballmer has an XML fetish, so companies please him by making almost anything under the sun XML-compatible. JSON, meanwhile, is one of Crockford's successful weekend pastimes.

The best JSON parsers are in browsers

If you pick one random open-source JSON parser for your favourite language, what are the chances that it's the best possible parser under the sun? Well, for XML you do have awesome parsers like this. But what is there for JSON?

Need more reasons why JSON should be relegated to its intended use case?


_畞蕅 2024-10-14 23:07:14

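If you are considering JSON as an intermediate format for data transfer, you may also want to look at binary alternatives. They need less disk space and network bandwidth (both compressed and uncompressed), so you may get faster parsing simply because the input to parse is shorter.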

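MessagePack
BSON (Binary JSON)
Google Protocol Buffers
Apache & Facebook Thrift
Python Pickle, as implemented in the `cPickle` module, using the highest protocol version
Python Marshal (very fast, but the format depends on the Python version)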
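As a rough way to compare encoded sizes yourself, here is a sketch that uses only the formats from this list that ship with Python. In Python 3 the `cPickle` accelerator is used automatically by the `pickle` module, so `pickle` stands in for it here. The sample `record` is invented, and the resulting sizes depend heavily on the shape of your data, so treat the numbers as something to measure, not a given.

```python
import json
import marshal
import pickle

# Hypothetical sample data; substitute a representative slice of your own.
record = {"id": 12345, "name": "example", "scores": [0.5, 1.25, 3.0] * 100}

encoded = {
    "json": json.dumps(record).encode("utf-8"),
    "pickle": pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL),
    "marshal": marshal.dumps(record),
}

# Encoded size in bytes per format.
sizes = {name: len(blob) for name, blob in encoded.items()}

# Round-trip each blob to confirm nothing was lost in encoding.
decoded = {
    "json": json.loads(encoded["json"]),
    "pickle": pickle.loads(encoded["pickle"]),
    "marshal": marshal.loads(encoded["marshal"]),
}
```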

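If you run your own benchmarks, make sure to benchmark several parsers for the same language. For example, a JSON parser implemented in pure Python can be expected to be much slower than a JSON parser written in C, but you may also find significant speed differences between different implementations for the same programming language (often around 2x, but possibly as much as 5x).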
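A minimal benchmarking harness along these lines, using `timeit` from the standard library. The `benchmark` function and the sample `data` are assumptions for illustration, and `json.loads` and `pickle.loads` are only stand-ins: to compare several JSON parsers for the same language, add each one as another entry with the same JSON payload.

```python
import json
import pickle
import timeit

def benchmark(candidates, repeat=3, number=20):
    """Return the best per-call time in seconds for each named parser."""
    results = {}
    for name, (parse, payload) in candidates.items():
        times = timeit.repeat(lambda: parse(payload), repeat=repeat, number=number)
        results[name] = min(times) / number  # best run, averaged per call
    return results

# Hypothetical test data; use something shaped like your real exports.
data = [{"id": i, "name": f"row{i}", "tags": ["a", "b"]} for i in range(1000)]

candidates = {
    "json.loads": (json.loads, json.dumps(data)),
    "pickle.loads": (
        pickle.loads,
        pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL),
    ),
}

timings = benchmark(candidates)
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds * 1e3:.3f} ms per parse")
```

Taking the minimum over several repeats reduces noise from other processes. Timings on a small sample like this will not necessarily scale to gigabyte inputs, so it is worth repeating the run with larger payloads.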