Firefox extension and XUL: getting the page source code

Posted on 2024-08-22 20:01:12

I am developing my first Firefox extension, and for that I need to get the complete source code of the current page. How can I do that with XUL?

叹沉浮 2024-08-29 20:01:12

You will need a XUL browser object to load the content into.

Load the "view-source:" version of your page into the browser object, in the same way as the "View Page Source" menu does. See the function viewSource() in chrome://global/content/viewSource.js. That function can load from the cache or bypass it.

Once the content is loaded, the original source is given by:

var source = browser.contentDocument.getElementById('viewsource').textContent;
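The two steps above (build the "view-source:" URL, then read the text once it has loaded) can be sketched roughly as follows. This assumes `browser` is a XUL browser element that has already finished loading the view-source URL. The helper names `toViewSourceUrl` and `readViewSource` are made up for illustration and are not part of any Mozilla API. How the load is triggered and awaited is deliberately left out.

```javascript
// Hypothetical helper: prefix a page URL with the view-source: scheme,
// leaving it alone if the prefix is already there.
function toViewSourceUrl(pageUrl) {
  return pageUrl.startsWith("view-source:") ? pageUrl : "view-source:" + pageUrl;
}

// Hypothetical helper: read the raw source out of a loaded view-source page.
// Assumption: `browser.contentDocument` is the view-source document and its
// body carries the id "viewsource", as described in the answer.
function readViewSource(browser) {
  var body = browser.contentDocument.getElementById("viewsource");
  // textContent drops the highlighting markup and yields the plain source text.
  return body ? body.textContent : null;
}
```

Returning null when the element is missing lets the caller tell "not loaded yet, or not a view-source page" apart from an empty document.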

Serialize a DOM Document
This method will not get the original source, but may be useful to some readers.

You can serialize the document object to a string. See "Serializing DOM trees to strings" in the MDC. You may need to use the alternate method of instantiation in your extension.

That article talks about XML documents, but it also works on any HTML DOMDocument.

var serializer = new XMLSerializer();
var source = serializer.serializeToString(document);

This even works in a web page or the Firebug console.

彼岸花似海 2024-08-29 20:01:12

It really looks like there is no way to get "all the source code". You may use

document.documentElement.innerHTML

to get the innerHTML of the top element (usually html). If you have a PHP error message in front of the document, like

<h3>fatal error</h3>
segfault

<html>
    <head>
        <title>bla</title>
        <script type="text/javascript">
            alert(document.documentElement.innerHTML);
        </script>
    </head>
    <body>
    </body>
</html>

the innerHTML would be

<head>
<title>bla</title></head><body><h3>fatal error</h3>
segfault    
        <script type="text/javascript">
            alert(document.documentElement.innerHTML);
        </script></body>

so the error message would still be retained, although the parser has moved it into the body.

edit: documentElement is described here:
https://developer.mozilla.org/en/DOM/document.documentElement
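Since innerHTML leaves out the doctype and the html tag itself, a slightly fuller approximation can be rebuilt from the parsed document. The sketch below is only that: it assumes an engine that exposes `document.doctype` and `outerHTML`, and the function names `doctypeToString` and `approximateSource` are invented for this example. Like innerHTML, it reflects the parsed DOM, not the bytes the server sent.

```javascript
// Hypothetical helper: turn a DocumentType node (name, publicId, systemId)
// back into a <!DOCTYPE ...> declaration. Returns "" when there is none.
function doctypeToString(doctype) {
  if (!doctype) {
    return "";
  }
  var out = "<!DOCTYPE " + doctype.name;
  if (doctype.publicId) {
    out += ' PUBLIC "' + doctype.publicId + '"';
    if (doctype.systemId) {
      out += ' "' + doctype.systemId + '"';
    }
  } else if (doctype.systemId) {
    out += ' SYSTEM "' + doctype.systemId + '"';
  }
  return out + ">\n";
}

// Hypothetical helper: doctype plus the serialized root element.
// This is the parser's view of the page, so misplaced markup (such as an
// error message printed before <html>) shows up wherever the parser put it.
function approximateSource(doc) {
  return doctypeToString(doc.doctype) + doc.documentElement.outerHTML;
}
```

In a page or console this would be called as approximateSource(document).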

逆光飞翔i 2024-08-29 20:01:12

You can get the URL with var URL = document.location.href and navigate to "view-source:" + URL.

Now you can fetch the whole source code (viewsource is the id of the body):

var code = document.getElementById('viewsource').innerHTML;

The problem is that the source code is formatted: it is wrapped in syntax-highlighting markup and its special characters are escaped. So you have to run the equivalent of PHP's strip_tags() and htmlspecialchars_decode() to fix it.

For example, line 1 should be the doctype and line 2 should look like:

&lt;<span class="start-tag">HTML</span>&gt;

So after strip_tags() it becomes:

&lt;HTML&gt;

And after htmlspecialchars_decode() we finally get the expected result:

<HTML>

The code is not passed through a DOM parser, so you can view invalid HTML too.
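strip_tags() and htmlspecialchars_decode() are PHP functions, so in an extension the same two steps would have to be done in JavaScript. The following is a minimal sketch of that idea, not a full HTML entity decoder: the function names are invented for this example, and only numeric entities plus the five basic named entities are handled, on the assumption that these are the ones the view-source markup escapes.

```javascript
// Step 1 (strip_tags equivalent): drop the highlighting tags such as
// <span class="start-tag">. Escaped source characters are still entities here.
function stripTags(markup) {
  return markup.replace(/<[^>]*>/g, "");
}

// Step 2 (htmlspecialchars_decode equivalent): decode entities in a single
// pass, so "&amp;lt;" becomes "&lt;" and is not decoded a second time.
function decodeEntities(text) {
  var named = { lt: "<", gt: ">", quot: '"', apos: "'", amp: "&" };
  return text.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, function (match, body) {
    if (body.charAt(0) === "#") {
      var isHex = body.charAt(1) === "x" || body.charAt(1) === "X";
      var code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      // Leave out-of-range references untouched instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Unknown named entities are left as they are.
    return Object.prototype.hasOwnProperty.call(named, body) ? named[body] : match;
  });
}

// Both steps together, in the order described above.
function viewSourceMarkupToText(markup) {
  return decodeEntities(stripTags(markup));
}
```

The order matters: tags are stripped first, while the angle brackets of the original source are still escaped, so the decoded text is never mistaken for highlighting markup.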

惜醉颜 2024-08-29 20:01:12

Maybe you can get it through the DOM, using

var source = document.getElementsByTagName("html");

and use DOMParser to get the source:

https://developer.mozilla.org/En/DOMParser

追风人 2024-08-29 20:01:12

The first part of Sagi's answer, but use document.getElementById('viewsource').textContent instead.

傾城如夢未必闌珊 2024-08-29 20:01:12

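This is more along the lines of Lachlan's answer, but there is a fairly in-depth discussion of the internals here, going down into the C++ code:

http://www.mail-archive.com/[email protected]/msg05391.html

Follow the replies at the bottom of that page.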


About the author: 丑丑阿
