Determining the character index in the HTML source for a DOMRange from a WebKit selection
I'm attempting to synchronize a DOMRange (representing a user selection from a Cocoa WebView) to the original HTML source currently rendered in that view, as a kind of Dreamweaver-style split editor.
My first idea was to get the DOMRange object's startContainer and offset, then walk up the DOM tree from there, accumulating the overall character offset up to the body tag.
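A minimal sketch of that walk, assuming a toy node model rather than WebKit's DOMNode API. Node, element, textNode, serialize, and sourceOffset are all hypothetical names, and the model ignores attributes, entity escaping, and comments. It yields an offset into the serialized tree, which matches the original source only when the two are identical.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DOM node; this is not WebKit's DOMNode.
struct Node {
    std::string tag;   // element name; empty for a text node
    std::string text;  // character data; used only by text nodes
    Node* parent = nullptr;
    std::vector<std::unique_ptr<Node>> children;

    bool isText() const { return tag.empty(); }

    // Attach a child, set its parent pointer, and return a raw handle to it.
    Node* append(std::unique_ptr<Node> child) {
        child->parent = this;
        children.push_back(std::move(child));
        return children.back().get();
    }
};

std::unique_ptr<Node> element(const std::string& tag) {
    auto n = std::make_unique<Node>();
    n->tag = tag;
    return n;
}

std::unique_ptr<Node> textNode(const std::string& text) {
    auto n = std::make_unique<Node>();
    n->text = text;
    return n;
}

// Serialize a node roughly the way outerHTML would for this toy model.
std::string serialize(const Node& n) {
    if (n.isText()) return n.text;
    std::string out = "<" + n.tag + ">";
    for (const auto& c : n.children) out += serialize(*c);
    return out + "</" + n.tag + ">";
}

// Character offset of (container, offset) inside serialize(root).
// Assumes container is a text node located somewhere in root's subtree.
std::size_t sourceOffset(const Node& root, const Node& container, std::size_t offset) {
    std::size_t total = offset;
    const Node* node = &container;
    while (node != &root) {
        const Node* parent = node->parent;
        total += parent->tag.size() + 2;  // length of the parent's "<tag>"
        for (const auto& sibling : parent->children) {
            if (sibling.get() == node) break;
            total += serialize(*sibling).size();  // markup of each preceding sibling
        }
        node = parent;
    }
    return total;
}
```

For a tree modelled as <p>some<div>target</div>text</p>, the text node target starts at offset 12: 3 characters for <p>, 4 for some, and 5 for <div>.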
Unfortunately this task presents some problems:
- Clearly the document's outerHTML will differ from the original HTML source if the DOM was manipulated via JavaScript or the parser needed to clean up malformed tags.
- I can't figure out how to get the offset of a node within its parent's text (e.g., the 4 characters preceding target in <p>some<div>target</div>text</p>), and normalize doesn't seem to make this any easier.
- Trying to account for some of the problems in #1, or just going from HTML source to WebView, will probably require separately parsing the HTML and then correlating the two DOM trees.
One ray of hope is that HTML5 specifies a standard parsing algorithm for dealing with invalid HTML (which WebKit has since adopted), so in theory it should be possible to use an off-the-shelf HTML5 parser to generate the same tree as WebKit — right?
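One way the correlation step might look, as a rough sketch only: it assumes some HTML5 parser has already produced a tree annotated with source offsets (ParsedNode and sourceStartForPath are hypothetical names, not part of any real parser's API), and that the live DOM node is identified by its child-index path from a common root such as body.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical node from a separately parsed tree that records source positions.
struct ParsedNode {
    std::string name;         // tag name, or "#text" for a text node
    std::size_t sourceStart;  // offset of the node's first character in the source
    std::vector<ParsedNode> children;
};

// Follow a child-index path (root to target) taken from the live DOM through
// the separately parsed tree. Returns std::nullopt if the parsed tree has no
// child at some index along the path, meaning the two trees have diverged.
std::optional<std::size_t> sourceStartForPath(const ParsedNode& root,
                                              const std::vector<std::size_t>& path) {
    const ParsedNode* node = &root;
    for (std::size_t index : path) {
        if (index >= node->children.size()) return std::nullopt;
        node = &node->children[index];
    }
    return node->sourceStart;
}
```

Note that for the example markup above, the HTML5 tree-construction rules close the open p element when the div start tag is seen, so target ends up at path {1, 0} under body rather than inside the p. A lookup like this only works if both trees come from the same parsing rules and the DOM has not been modified since loading.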
This is the most similar existing question I could find, but it's for a slightly different problem:
Getting source HTML from a WebView in Cocoa
Comments (1)
Your problem #1 is actually not so bad; you can just turn off JS interpretation.
Look at QWebSettings::JavascriptEnabled, or just drop this in before you load any HTML:
QWebSettings::globalSettings()->setAttribute(QWebSettings::JavascriptEnabled, false);
That should leave your DOM un-mangled by JS. Good luck!