Firefox extension and XUL: getting the page source code
I am developing my first Firefox extension and for that I need to get the complete source code of the current page. How can I do that with XUL?
You will need a XUL browser object to load the content into.
Load the "view-source:" version of your page into the browser object, in the same way as the "View Page Source" menu does. See the function viewSource() in chrome://global/content/viewSource.js. That function can load from the cache, or not.
Once the content is loaded, the original source is given by:
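The snippet that belongs here is missing from this copy. As a minimal sketch only, assuming `browser` is a XUL browser element that has finished loading the "view-source:" URL, and assuming the listing sits under an element with id viewsource (as a later answer states), reading the text might look like this. The helper name `getViewSourceText` is hypothetical.

```javascript
// Hypothetical helper: `browser` is assumed to be a XUL <browser> element
// that has already finished loading "view-source:" + url.
function getViewSourceText(browser) {
  const doc = browser.contentDocument;
  // Assumption: the view-source page keeps its listing under id "viewsource".
  const container = doc ? doc.getElementById('viewsource') : null;
  // textContent returns the text without the highlighting markup,
  // with character entities already decoded.
  return container ? container.textContent : null;
}
```

The function returns null instead of throwing when the document or the element is not there yet, so a caller can retry after the load event.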
Serialize a DOM Document
This method will not get the original source, but may be useful to some readers.
You can serialize the document object to a string. See Serializing DOM trees to strings in the MDC. You may need to use the alternate method of instantiation in your extension.
That article talks about XML documents, but it also works on any HTML DOMDocument.
This even works in a web page or the Firebug console.
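A minimal sketch of the serialization approach, using the standard XMLSerializer interface. The helper name `serializeDocument` and its second parameter are hypothetical; the constructor is passed in because extension code may have to obtain the serializer differently than a web page does.

```javascript
// Serialize a parsed document back to markup. The result reflects the
// current DOM, not the bytes the server originally sent.
function serializeDocument(doc, Serializer = globalThis.XMLSerializer) {
  // In a web page XMLSerializer is a global; in an extension the
  // constructor may need to come from elsewhere, hence the parameter.
  return new Serializer().serializeToString(doc);
}

// In a web page or console one would call: serializeDocument(document)
```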
It really looks like there is no way to get "all the source code". You may use
to get the innerHTML of the top element (usually html). If you have a PHP error message like
the innerHTML would be
but the error message would still retain
Edit: documentElement is described here:
https://developer.mozilla.org/en/DOM/document.documentElement
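The snippets in this answer did not survive in this copy. Going only by its mention of innerHTML and documentElement, a minimal sketch of the idea might look like the following; the helper name `getRootInnerHTML` is hypothetical.

```javascript
// Returns the markup inside the root element of `doc` (usually <html>).
// This is re-serialized from the parsed DOM, so it can differ from the
// original source, and it leaves out the doctype and the root tag itself.
function getRootInnerHTML(doc) {
  return doc.documentElement.innerHTML;
}

// In a web page or console one would call: getRootInnerHTML(document)
```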
You can get the URL with
var URL = document.location.href
and navigate to "view-source:" + URL.
Now you can fetch the whole source code (viewsource is the id of the body):
The problem is that the source code is formatted, so you have to run strip_tags() and htmlspecialchars_decode() to fix it.
For example, line 1 should be the doctype and line 2 should look like:
So after strip_tags() it becomes:
And after htmlspecialchars_decode() we finally get the expected result:
The code is not passed through the DOM parser, so you can view invalid HTML too.
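strip_tags() and htmlspecialchars_decode() are PHP function names, so JavaScript code needs its own equivalents. The sketch below shows rough stand-ins for the two steps. The function names and the sample markup are made up for illustration, and the real view-source markup may differ.

```javascript
// Rough stand-in for strip_tags(): removes anything that looks like a tag.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, '');
}

// Rough stand-in for htmlspecialchars_decode(): decodes the four basic
// entities. &amp; is decoded last so "&amp;lt;" becomes "&lt;", not "<".
function decodeSpecialChars(text) {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, '&');
}

// Made-up sample of highlighted markup for one line of a listing.
const highlighted = '<span class="tag">&lt;html lang=&quot;en&quot;&gt;</span>';
const restored = decodeSpecialChars(stripTags(highlighted));
console.log(restored); // <html lang="en">
```

Reading textContent from the element, as another answer suggests, gives the same result without manual stripping and decoding.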
Maybe you can get it via the DOM, using
and fetch the source using DOMParser.
The first part of Sagi's answer, but use
document.getElementById('viewsource').textContent
instead.
More in line with Lachlan's answer, but there is a discussion of the internals here that gets quite in depth, going into the C++ code.
http://www.mail-archive.com/[email protected]/msg05391.html
and then follow the replies at the bottom.