
How to get the entire document HTML as a string?

Posted 2024-07-18 21:52:49


Is there a way in JS to get the entire HTML within the html tags, as a string?

document.documentElement.??


べ繥欢鉨o。 2024-07-25 21:52:50

Use document.documentElement.

The same question is answered here:
https://stackoverflow.com/a/7289396/2164160


鸠魁 2024-07-25 21:52:50

The correct way is actually:

webBrowser1.DocumentText


深陷 2024-07-25 21:52:50

You have to iterate through the document's childNodes and get each node's outerHTML.

In VBA it looks like this:

For Each e In document.ChildNodes
    Put ff, , e.outerHTML & vbCrLf
Next e

Using this, you get all elements of the web page, including the <!DOCTYPE> node if it exists.
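For comparison, the same loop in JavaScript might look like the minimal sketch below. It assumes a browser-like `doc` whose `childNodes` can contain a DocumentType node (nodeType 10) and comment nodes (nodeType 8); in current browsers those nodes have no `outerHTML`, so the sketch rebuilds them by hand. The name `childNodesToHtml` is hypothetical, and the doctype branch only uses `name`, which is enough for `<!DOCTYPE html>` but drops legacy public/system identifiers.

```javascript
// Hypothetical helper mirroring the VBA loop: join every top-level node of
// the document. In a browser, call childNodesToHtml(document).
function childNodesToHtml(doc, sep = "\r\n") {
  const parts = [];
  for (const node of doc.childNodes) {
    if (typeof node.outerHTML === "string") {
      // Element nodes (normally just <html>) expose outerHTML directly.
      parts.push(node.outerHTML);
    } else if (node.nodeType === 10) {
      // DocumentType: no outerHTML, so rebuild a simple declaration from its name.
      parts.push(`<!DOCTYPE ${node.name}>`);
    } else if (node.nodeType === 8) {
      // Comment nodes that sit outside <html>.
      parts.push(`<!--${node.data}-->`);
    }
    // Any other node types are skipped.
  }
  return parts.join(sep);
}
```

The default separator `"\r\n"` matches the `vbCrLf` used in the VBA version; pass `""` to concatenate without line breaks.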


红衣飘飘貌似仙 2024-07-25 21:52:50

I only needed the HTML5 doctype, and it had to work in IE11, Edge and Chrome. I used the code below, and it works fine.

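function downloadPage(element, event) {
    var isChrome = /Chrome/.test(navigator.userAgent) && /Google Inc/.test(navigator.vendor);

    // IE: "MSIE" in the user agent (IE10 and older) or document.documentMode (defined only in IE).
    if (navigator.userAgent.indexOf("MSIE") !== -1 || !!document.documentMode) {
        // Legacy IE-only command that opens the "Save As" dialog.
        document.execCommand('SaveAs', '1', 'page.html');
        event.preventDefault();
    } else {
        if (isChrome) {
            // Chrome: point the link at a data: URL holding the current DOM plus an HTML5 doctype.
            element.setAttribute('href', 'data:text/html;charset=UTF-8,' +
                encodeURIComponent('<!doctype html>' + document.documentElement.outerHTML));
        }
        // Ask the browser to save the link target as page.html.
        element.setAttribute('download', 'page.html');
    }
}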

Then use it in your anchor tag like this:

<a href="#" onclick="downloadPage(this,event);" download>Download entire page.</a>
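In the Chrome branch, downloadPage() simply builds a `data:` URL from a hard-coded HTML5 doctype plus `document.documentElement.outerHTML`. The sketch below pulls that step into a hypothetical pure function, `htmlToDataUrl`, so it can be checked outside a browser; like the answer, it assumes the page is HTML5.

```javascript
// Hypothetical helper isolating the Chrome branch of downloadPage():
// prepend a doctype and percent-encode the markup into a data: URL.
function htmlToDataUrl(outerHTML, doctype = "<!doctype html>") {
  return "data:text/html;charset=UTF-8," + encodeURIComponent(doctype + outerHTML);
}

// In a browser:
// element.setAttribute("href", htmlToDataUrl(document.documentElement.outerHTML));
```

The encodeURIComponent call is what keeps characters such as `#`, `%` and non-ASCII text intact; an unencoded `#` would be read as the start of a URL fragment and could truncate the saved page.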


会傲 2024-07-25 21:52:49

You can do

new XMLSerializer().serializeToString(document)

in browsers newer than IE 9

See https://caniuse.com/xml-serializer
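If the same code might also run somewhere without XMLSerializer (for example IE 8, or a non-browser runtime), a minimal sketch could feature-detect it and fall back to `documentElement.outerHTML`, at the cost of losing the doctype. The name `serializeWholeDocument` is hypothetical. Note that XMLSerializer produces an XML serialization, so its output may differ cosmetically from the HTML source (for example, an `xmlns` attribute on `<html>`).

```javascript
// Hypothetical helper: prefer XMLSerializer (output includes the doctype),
// otherwise fall back to the <html> element's outerHTML (no doctype).
function serializeWholeDocument(doc) {
  if (typeof XMLSerializer !== "undefined") {
    return new XMLSerializer().serializeToString(doc);
  }
  return doc.documentElement.outerHTML;
}

// In a browser: serializeWholeDocument(document)
```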


倒带 2024-07-25 21:52:49

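I tried the various answers to see what they return. I'm using the latest version of Chrome.

The suggestion document.documentElement.innerHTML; returned <head> ... </body>

Gaby's suggestion document.getElementsByTagName('html')[0].innerHTML; returned the same.

The suggestion document.documentElement.outerHTML; returned <html><head> ... </body></html>
which is everything apart from the doctype.

You can retrieve the doctype object with document.doctype; this returns an object, not a string, so if you need to extract the details as a string for all doctypes up to and including HTML5, it is described here: Get DocType of an HTML as string with Javascript

I only needed HTML5, so the following was enough for me to create the whole document:

alert('<!DOCTYPE html>' + '\n' + document.documentElement.outerHTML);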
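The one-liner above always prepends `<!DOCTYPE html>`, which is only correct for HTML5 pages. A minimal sketch, assuming you would rather omit the doctype than emit a wrong one, could check the `document.doctype` object first; an HTML5 doctype has the name `html` and empty `publicId`/`systemId`. The name `html5DocumentString` is hypothetical, and the legacy-compat variant (`SYSTEM "about:legacy-compat"`) is deliberately not treated as HTML5 here.

```javascript
// Hypothetical helper: prepend "<!DOCTYPE html>" only when the page uses the
// plain HTML5 doctype (no public or system identifier); otherwise return the
// markup without any doctype.
function html5DocumentString(doctype, outerHTML) {
  const isHtml5 = doctype !== null &&
    doctype.name.toLowerCase() === "html" &&
    doctype.publicId === "" &&
    doctype.systemId === "";
  return (isHtml5 ? "<!DOCTYPE html>\n" : "") + outerHTML;
}

// In a browser:
// alert(html5DocumentString(document.doctype, document.documentElement.outerHTML));
```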


怎言笑 2024-07-25 21:52:49

Get the root <html> element with document.documentElement then get its .innerHTML:

const txt = document.documentElement.innerHTML;
alert(txt);

or its .outerHTML to get the <html> tag as well

const txt = document.documentElement.outerHTML;
alert(txt);


我偏爱纯白色 2024-07-25 21:52:49

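I believe document.documentElement.outerHTML should return that for you.

According to MDN, outerHTML is supported in Firefox 11, Chrome 0.2, Internet Explorer 4.0, Opera 7, Safari 1.3, Android, Firefox Mobile 11, IE Mobile, Opera Mobile and Safari Mobile. outerHTML is defined in the DOM Parsing and Serialization specification.

The MSDN page on the outerHTML property states that it is supported in IE 5+. Colin's answer links to the W3C quirksmode page, which offers a good comparison of cross-browser compatibility (for other DOM features too).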


提赋 2024-07-25 21:52:49

You can also do:

document.getElementsByTagName('html')[0].innerHTML

You will not get the Doctype or html tag, but everything else...


獨角戲 2024-07-25 21:52:49

document.documentElement.innerHTML


高冷爸爸 2024-07-25 21:52:49

To also get things outside the <html>...</html>, most importantly the <!DOCTYPE ...> declaration, you could walk through document.childNodes, turning each into a string:

const html = [...document.childNodes]
    .map(node => nodeToString(node))
    .join('\n') // could use '' instead, but whitespace should not matter.

function nodeToString(node) {
    switch (node.nodeType) {
        case node.ELEMENT_NODE:
            return node.outerHTML
        case node.TEXT_NODE:
            // Text nodes should probably never be encountered, but handling them anyway.
            return node.textContent
        case node.COMMENT_NODE:
            return `<!--${node.textContent}-->`
        case node.DOCUMENT_TYPE_NODE:
            return doctypeToString(node)
        default:
            throw new TypeError(`Unexpected node type: ${node.nodeType}`)
    }
}

I published this code as document-outerhtml on npm.

Edit: note that the code above depends on a function doctypeToString; its implementation could be as follows (the code below is published on npm as doctype-to-string):

function doctypeToString(doctype) {
    if (doctype === null) {
        return ''
    }
    // Checking with instanceof DocumentType might be neater, but how to get a
    // reference to DocumentType without assuming it to be available globally?
    // To play nice with custom DOM implementations, we resort to duck-typing.
    if (!doctype
        || doctype.nodeType !== doctype.DOCUMENT_TYPE_NODE
        || typeof doctype.name !== 'string'
        || typeof doctype.publicId !== 'string'
        || typeof doctype.systemId !== 'string'
    ) {
        throw new TypeError('Expected a DocumentType')
    }
    const doctypeString = `<!DOCTYPE ${doctype.name}`
        + (doctype.publicId ? ` PUBLIC "${doctype.publicId}"` : '')
        + (doctype.systemId
            ? (doctype.publicId ? `` : ` SYSTEM`) + ` "${doctype.systemId}"`
            : ``)
        + `>`
    return doctypeString
}


混吃等死 2024-07-25 21:52:49

Probably only in IE:

webBrowser1.DocumentText

For Firefox 1.0 and later:

// Serialize the current DOM tree, including changes/edits, into the variable ss
var ns = new XMLSerializer();
var ss = ns.serializeToString(document);
alert(ss.slice(0, 300));

This may work in Firefox. (It shows the first 300 characters of the serialized source, which are mostly the doctype definition.)

But be aware that Firefox's normal "Save As" dialog might not save the current state of the page, but rather the originally loaded (X)HTML source text!
(POSTing ss to some temporary file and redirecting to it might give you a saveable source text that includes the changes/edits made beforehand.)

That said, Firefox is surprisingly good at restoring state when going "Back", and "Save (As)..." nicely includes the current states/values of input-like fields, textareas, etc., though not of elements in contenteditable/designMode.

If the document is not an XHTML or XML file (by MIME type, not just by file extension!), you can use document.open/write/close to set the appropriate content at the source level; that content is then what gets saved when the user saves via Firefox's File > Save menu.
See:
http://www.w3.org/MarkUp/2004/xhtml-faq#docwrite and

https://developer.mozilla.org/en-US/docs/Web/API/document.write

Independently of the X(HT)ML question, try using "view-source:http://..." as the value of the src attribute of a (script-created?) iframe, and then access the iframe's document in Firefox via:

<iframe-elementnode>.contentDocument (search for "mdn contentDocument" for the relevant members, such as textContent).
I worked this out years ago and would rather not dig through it all again. If you still urgently need it, let me know and I will dive back in...


客…行舟 2024-07-25 21:52:49

document.documentElement.outerHTML


护你周全 2024-07-25 21:52:49

I am using outerHTML for elements (the main <html> container), and XMLSerializer for anything else including <!DOCTYPE>, random comments outside the <html> container, or whatever else might be there. It seems that whitespace isn't preserved outside the <html> element, so I'm adding newlines by default with sep="\n".

function get_document_html(sep="\n") {
    let html = "";
    let xml = new XMLSerializer();
    for (let n of document.childNodes) {
        if (n.nodeType == Node.ELEMENT_NODE)
            html += n.outerHTML + sep;
        else
            html += xml.serializeToString(n) + sep;
    }
    return html;
}

console.log(get_document_html().slice(0, 200));


日暮斜阳 2024-07-25 21:52:49

I always use

document.getElementsByTagName('html')[0].innerHTML

Probably not the right way but I can understand it when I see it.


哑剧 2024-07-25 21:52:49

Using querySelector

const html = document.querySelector("html").outerHTML;
console.log(html)


神回复 2024-07-25 21:52:49

This works if you want everything except the DOCTYPE:

document.getElementsByTagName('html')[0].outerHTML;

or this if you want the doctype too:

new XMLSerializer().serializeToString(document.doctype) + document.getElementsByTagName('html')[0].outerHTML;
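One caveat with the second line: `document.doctype` is `null` on pages without a `<!DOCTYPE>`, and `serializeToString(null)` throws a TypeError. A minimal guarded sketch, with the hypothetical name `doctypeAndHtml`, could look like this; the `serializer` parameter exists only so the function can be exercised without a DOM and defaults to a real XMLSerializer in the browser.

```javascript
// Hypothetical helper: the doctype (if any) followed by the <html> element's outerHTML.
function doctypeAndHtml(doc, serializer = new XMLSerializer()) {
  // Guard against pages that have no doctype (doc.doctype === null).
  const doctype = doc.doctype ? serializer.serializeToString(doc.doctype) : "";
  return doctype + doc.documentElement.outerHTML;
}

// In a browser: doctypeAndHtml(document)
```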



About the author: 饮惑
