DOM parsing: traversing structured documents under the hood

Posted 2024-10-30 00:01:25

As a developer, and I'm certain I'm far from alone here, I'm always curious to understand what's "under the hood". DOM parsers are one of the list-toppers of this curiosity for me. We all know the famous post. I have even hacked together a bit of an "O RLY?", out of both temporary necessity and curiosity.

However, my need to meet the man behind the curtain remains unmet. How do DOM parsers, or any structured document parsers for that matter, parse documents? As far as my intermediate web application developer understanding can muster, it's a combination of recursive string parsing and state-keeping logic, not unlike my own hackish attempt.

Magicians should never reveal their secrets, but seriously, where is he hiding the rabbit?

Comments (1)

清风不识月 2024-11-06 00:01:25

There's a well-developed theory of parsing, and untold numbers of tools to support it. In general, you look at each character, one at a time, and decide when the characters you've read so far constitute a token. Then you look at the series of tokens and decide when a sequence of tokens constitutes a higher-level grammatical construct (in this case, an HTML element). As you recognize constructs, you build a tree of nodes to represent them (in this case, the DOM tree).
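
To make those two stages concrete, here is a minimal sketch in Python. It assumes a tiny, well-formed subset of HTML (no attributes, comments, entities, or void elements), and the names `Node`, `tokenize`, and `build_tree` are made up for illustration. Real HTML parsers also recover from malformed markup, which this does not attempt.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str                                   # tag name, or "#text" for text nodes
    text: str = ""
    children: list = field(default_factory=list)


def tokenize(source):
    """Stage 1: scan one character at a time and emit tokens."""
    tokens = []
    buffer = ""
    in_tag = False
    for ch in source:
        if ch == "<":
            if in_tag:
                raise ValueError("'<' inside a tag")
            if buffer:                          # text collected so far is a token
                tokens.append(("text", buffer))
            buffer = ""
            in_tag = True
        elif ch == ">" and in_tag:
            if buffer.startswith("/"):
                tokens.append(("close", buffer[1:]))
            else:
                tokens.append(("open", buffer))
            buffer = ""
            in_tag = False
        else:
            buffer += ch
    if in_tag:
        raise ValueError("unterminated tag")
    if buffer:
        tokens.append(("text", buffer))
    return tokens


def build_tree(tokens):
    """Stage 2: turn the token sequence into a tree, tracking open elements on a stack."""
    root = Node("#document")
    stack = [root]
    for kind, value in tokens:
        if kind == "open":
            node = Node(value)
            stack[-1].children.append(node)
            stack.append(node)
        elif kind == "close":
            if len(stack) == 1 or stack[-1].name != value:
                raise ValueError(f"unexpected closing tag </{value}>")
            stack.pop()
        else:
            stack[-1].children.append(Node("#text", text=value))
    if len(stack) != 1:
        raise ValueError(f"unclosed tag <{stack[-1].name}>")
    return root


tree = build_tree(tokenize("<p>Hi <b>there</b></p>"))
```

The stack is the "state-keeping logic" the question guesses at: its top is always the element currently being filled, so nesting falls out without any string recursion.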

So are you familiar with context-free grammars, and compiler-compilers like yacc, bison, and their more modern counterparts? If you understand those, a DOM parser shouldn't be a mystery.
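
If grammars are new to you, a rough sketch may help. The grammar in the comments below is a made-up one for a tiny well-formed subset, and the parser is a hand-written recursive descent parser with one method per grammar rule. Generators like yacc and bison instead produce a table-driven parser from a similar grammar description, so treat this as an illustration of the idea rather than of what those tools emit.

```python
import re

# Hypothetical grammar for illustration:
#   element := OPEN content* CLOSE     (OPEN and CLOSE names must match)
#   content := element | TEXT
TOKEN_RE = re.compile(r"<(/?)(\w+)>|([^<]+)")


def lex(source):
    """Split the input into (kind, value) tokens.

    Input that matches no pattern is silently skipped here;
    a real lexer would report it as an error.
    """
    tokens = []
    for closing, name, text in TOKEN_RE.findall(source):
        if text:
            tokens.append(("TEXT", text))
        else:
            tokens.append(("CLOSE" if closing else "OPEN", name))
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", "")

    def expect(self, kind):
        tok_kind, value = self.peek()
        if tok_kind != kind:
            raise SyntaxError(f"expected {kind}, got {tok_kind}")
        self.pos += 1
        return value

    def element(self):
        # element := OPEN content* CLOSE
        name = self.expect("OPEN")
        children = []
        while self.peek()[0] in ("OPEN", "TEXT"):
            children.append(self.content())
        closing = self.expect("CLOSE")
        if closing != name:
            raise SyntaxError(f"<{name}> closed by </{closing}>")
        return (name, children)

    def content(self):
        # content := element | TEXT
        if self.peek()[0] == "OPEN":
            return self.element()
        return self.expect("TEXT")


def parse(source):
    parser = Parser(lex(source))
    tree = parser.element()
    parser.expect("EOF")            # nothing may follow the root element
    return tree


tree = parse("<ul><li>a</li><li>b</li></ul>")
# tree == ("ul", [("li", ["a"]), ("li", ["b"])])
```

The recursion in `element` calling `content` calling `element` mirrors the recursion in the grammar itself, which is where the nesting of the resulting tree comes from.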
