How to create a Document object with JavaScript

Posted 2024-12-17 16:31:17

Basically, that's the question: how is one supposed to dynamically construct a Document object from a string of HTML in JavaScript?

Comments (5)

像你 2024-12-24 16:31:17

There are two methods defined in specifications: createDocument from DOM Core Level 2 and createHTMLDocument from HTML5. The former creates an XML document (including XHTML), the latter creates an HTML document. Both reside, as functions, on the DOMImplementation interface.

var impl    = document.implementation,
    xmlDoc  = impl.createDocument(namespaceURI, qualifiedNameStr, documentType),
    htmlDoc = impl.createHTMLDocument(title);

In reality, these methods are rather young and only implemented in recent browser releases. According to http://quirksmode.org and MDN, the following browsers support createHTMLDocument:

  • Chrome 4
  • Opera 10
  • Firefox 4
  • Internet Explorer 9
  • Safari 4

Interestingly enough, you can (kind of) create an HTML document in older versions of Internet Explorer, using ActiveXObject:

var htmlDoc = new ActiveXObject("htmlfile");

The resulting object will be a new document, which can be manipulated just like any other document.
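
Since support depends on the browser, it can be safer to feature-detect than to rely on a version list. The following is only a sketch: createBlankHtmlDocument and its hostDocument parameter are names invented here, and the host document is passed in as an argument (normally the page's global document) so the function can also be exercised with a stand-in object.

```javascript
// Hypothetical helper, not part of any library.
// hostDocument is normally the global `document` of the page.
function createBlankHtmlDocument(hostDocument, title) {
  var impl = hostDocument && hostDocument.implementation;

  // Bail out when DOMImplementation.createHTMLDocument is not available.
  if (!impl || typeof impl.createHTMLDocument !== 'function') {
    return null;
  }

  // Older implementations required the title argument, so always pass one.
  return impl.createHTMLDocument(title === undefined ? '' : title);
}
```

In a browser this would be called as createBlankHtmlDocument(document, 'My title'); a null result means another route (such as the ActiveXObject one above) is needed.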

雨巷深深 2024-12-24 16:31:17

Assuming you are trying to create a fully parsed Document object from a string of markup and a content-type you also happen to know (maybe because you got the html from an xmlhttprequest, and thus got the content-type in its Content-Type http header; probably usually text/html) – it should be this easy:

var doc = (new DOMParser).parseFromString(markup, mime_type);

in an ideal future world where browser DOMParser implementations are as strong and competent as their document rendering is – maybe that's a good pipe dream requirement for future HTML6 standards efforts. It turns out no current browsers do, though.

You probably have the easier (but still messy) problem of having a string of HTML that you want a fully parsed Document object for. Here is another take on how to do that, which also ought to work in all browsers – first you make an HTML Document object:

var doc = document.implementation.createHTMLDocument('');

and then populate it with your html fragment:

doc.open();
doc.write(html);
doc.close();

Now you should have a fully parsed DOM in doc, which you can run alert(doc.title) on, slice with css selectors like doc.querySelectorAll('p') or ditto XPath using doc.evaluate.

This actually works in modern WebKit browsers like Chrome and Safari (I just tested in Chrome 22 and Safari 6 respectively) – here is an example that takes the current page's source code, recreates it in a new document variable src, reads out its title, overwrites it with an HTML-quoted version of the same source code and shows the result in an iframe: http://codepen.io/johan/full/KLIeE

Sadly, I don't think any other contemporary browsers have quite as solid implementations yet.
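
The create-then-write steps above can be wrapped in one helper. This is a sketch only: htmlToDocument and its hostDocument parameter are names made up for illustration, and the host document is passed in (rather than read from the global document) purely so the function can be tried with a stand-in object outside a browser.

```javascript
// Hypothetical helper, not part of any library.
// hostDocument is normally the page's global `document`.
function htmlToDocument(html, hostDocument) {
  // Create an empty HTML document that is not attached to any window.
  var doc = hostDocument.implementation.createHTMLDocument('');

  // Replace its contents by feeding the markup through the HTML parser.
  doc.open();
  doc.write(html);
  doc.close();

  return doc;
}
```

In a browser the call would look like htmlToDocument(markup, document), after which the result can be queried with doc.title or doc.querySelectorAll('p') as described above.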

凉墨 2024-12-24 16:31:17

Per the spec (doc), one may use the createHTMLDocument method of DOMImplementation, accessible via document.implementation as follows:

var doc = document.implementation.createHTMLDocument('My title');  
var body = document.createElementNS('http://www.w3.org/1999/xhtml', 'body'); 
doc.documentElement.appendChild(body);
// and so on

再可℃爱ぅ一点好了 2024-12-24 16:31:17

The following works in most common browsers, but not some. This is how simple it should be (but isn't):

// Fails if UA doesn't support parseFromString for text/html (e.g. IE)
function htmlToDoc(markup) {
  var parser = new DOMParser();
  return parser.parseFromString(markup, "text/html");
}

var htmlString = "<title>foo bar</title><div>a div</div>";
alert(htmlToDoc(htmlString).title);

To account for user agent vagaries, the following may be better (please note attribution):

/*
 * DOMParser HTML extension
 * 2012-02-02
 *
 * By Eli Grey, http://eligrey.com
 * Public domain.
 * NO WARRANTY EXPRESSED OR IMPLIED. USE AT YOUR OWN RISK.
 *
 * Modified to work with IE 9 by RobG
 * 2012-08-29
 *
 * Notes:
 *
 *  1. Supplied markup should be a valid HTML document with or without HTML tags and
 *     no DOCTYPE (DOCTYPE support can be added, I just didn't do it)
 *
 *  2. Host method used where host supports text/html
 */

/*! @source https://gist.github.com/1129031 */
/*! @source https://developer.mozilla.org/en-US/docs/DOM/DOMParser */

/*global document, DOMParser*/

(function(DOMParser) {
    "use strict";

    var DOMParser_proto;
    var real_parseFromString;
    var textHTML;         // Flag for text/html support
    var textXML;          // Flag for text/xml support
    var htmlElInnerHTML;  // Flag for support for setting html element's innerHTML

    // Stop here if DOMParser not defined
    if (!DOMParser) return;

    // Firefox, Opera and IE throw errors on unsupported types
    try {
        // WebKit returns null on unsupported types
        textHTML = !!(new DOMParser).parseFromString('', 'text/html');

    } catch (er) {
      textHTML = false;
    }

    // If text/html supported, don't need to do anything.
    if (textHTML) return;

    // Next try setting innerHTML of a created document
    // IE 9 and lower will throw an error (can't set innerHTML of its HTML element)
    try {
      var doc = document.implementation.createHTMLDocument('');
      doc.documentElement.innerHTML = '<title></title><div></div>';
      htmlElInnerHTML = true;

    } catch (er) {
      htmlElInnerHTML = false;
    }

    // If that failed, try text/xml
    if (!htmlElInnerHTML) {

        try {
            textXML = !!(new DOMParser).parseFromString('', 'text/xml');

        } catch (er) {
            textXML = false;
        }
    }

    // Mess with DOMParser.prototype (less than optimal...) if one of the above worked
    // Assume can write to the prototype, if not, make this a stand alone function
    if (DOMParser.prototype && (htmlElInnerHTML || textXML)) { 
        DOMParser_proto = DOMParser.prototype;
        real_parseFromString = DOMParser_proto.parseFromString;

        DOMParser_proto.parseFromString = function (markup, type) {

            // Only do this if type is text/html
            if (/^\s*text\/html\s*(?:;|$)/i.test(type)) {
                var doc, doc_el, first_el;

                // Use innerHTML if supported
                if (htmlElInnerHTML) {
                    doc = document.implementation.createHTMLDocument("");
                    doc_el = doc.documentElement;
                    doc_el.innerHTML = markup;
                    first_el = doc_el.firstElementChild;

                // Otherwise use XML method
                } else if (textXML) {

                    // Make sure markup is wrapped in HTML tags
                    // Should probably allow for a DOCTYPE
                    if (!(/^<html.*html>$/i.test(markup))) {
                        markup = '<html>' + markup + '<\/html>'; 
                    }
                    doc = (new DOMParser).parseFromString(markup, 'text/xml');
                    doc_el = doc.documentElement;
                    first_el = doc_el.firstElementChild;
                }

                // RG: I don't understand the point of this, I'll leave it here though 
                //     In IE, doc_el is the HTML element and first_el is the HEAD.
                //
                // Is this an entire document or a fragment?
                if (doc_el.childElementCount == 1 && first_el.localName.toLowerCase() == 'html') {
                    doc.replaceChild(first_el, doc_el);
                }

                return doc;

            // If not text/html, send as-is to host method
            } else {
                return real_parseFromString.apply(this, arguments);
            }
        };
    }
}(DOMParser));

// Now some test code
var htmlString = '<html><head><title>foo bar</title></head><body><div>a div</div></body></html>';
var dp = new DOMParser();
var doc = dp.parseFromString(htmlString, 'text/html');

// Treat as an XML document and only use DOM Core methods
alert(doc.documentElement.getElementsByTagName('title')[0].childNodes[0].data);

Don't be put off by the amount of code: much of it is comments. It can be shortened quite a bit, but it becomes less readable.
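
One detail worth isolating is how the shim decides whether a call is asking for HTML at all. The regular expression below is copied from the code above; the wrapper name isHtmlMimeType is made up here for illustration.

```javascript
// Hypothetical wrapper around the type check used by the shim above.
function isHtmlMimeType(type) {
  // Accepts "text/html", optionally padded with whitespace and optionally
  // followed by parameters such as "; charset=utf-8". Case-insensitive.
  return /^\s*text\/html\s*(?:;|$)/i.test(type);
}

isHtmlMimeType('text/html');                // true
isHtmlMimeType('text/html; charset=utf-8'); // true
isHtmlMimeType(' TEXT/HTML ');              // true
isHtmlMimeType('text/xml');                 // false
isHtmlMimeType('text/htmlx');               // false
```

Anything that does not match is handed to the host's original parseFromString unchanged.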

Oh, and if the markup is valid XML, life is much simpler:

var stringToXMLDoc = (function(global) {

  // W3C DOMParser support
  if (global.DOMParser) {
    return function (text) {
      var parser = new global.DOMParser();
      return parser.parseFromString(text,"application/xml");
    }

  // MS ActiveXObject support
  } else {
    return function (text) {
      var xmlDoc;

      // Can't assume support and can't test, so try..catch
      try {
        xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
        xmlDoc.async="false";
        xmlDoc.loadXML(text);
      } catch (e){}
      return xmlDoc;
    }
  }
}(this));


var doc = stringToXMLDoc('<books><book title="foo"/><book title="bar"/><book title="baz"/></books>');
alert(
  doc.getElementsByTagName('book')[2].getAttribute('title')
);
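
The XML route above does not show what happens with malformed input. With DOMParser, a parse failure typically does not throw; the returned document instead contains a parsererror element. A minimal check, sketched under that assumption (hasParserError is a made-up name, and the ActiveX branch reports errors differently, so it is only covered insofar as it returns undefined on failure):

```javascript
// Hypothetical helper: reports whether an XML parse appears to have failed.
function hasParserError(xmlDoc) {
  // stringToXMLDoc returns undefined if the ActiveX branch threw.
  if (!xmlDoc) {
    return true;
  }
  // DOMParser signals malformed XML with a <parsererror> element in the result.
  return xmlDoc.getElementsByTagName('parsererror').length > 0;
}
```

A caller would run hasParserError(doc) right after stringToXMLDoc(...) and skip any further DOM access when it returns true.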

攀登最高峰 2024-12-24 16:31:17

An updated answer for 2014, as DOMParser has evolved. This works in all current browsers I can find, and should also work in earlier versions of IE, using ecManaut's document.implementation.createHTMLDocument('') approach above.

Essentially, IE, Opera, Firefox can all parse as "text/html". Safari parses as "text/xml".

Beware of intolerant XML parsing, though. The Safari parse will break down at non-breaking spaces and other HTML characters (French/German accents) designated with ampersands. Rather than handle each character separately, the code below replaces all ampersands with meaningless character string "j!J!". This string can subsequently be re-rendered as an ampersand when displaying the results in a browser (simpler, I have found, than trying to handle ampersands in "false" XML parsing).

function parseHTML(sText) {
    var parser, doc;

    try {
        console.log("DOMParser: " + typeof window.DOMParser);

        // typeof yields a string, so compare against "undefined" rather than null
        if (typeof window.DOMParser !== "undefined") {
            // modern IE, Firefox, Opera parse text/html
            parser = new DOMParser();
            doc = parser.parseFromString(sText, "text/html");
            if (doc != null) {
                console.log("parsed as HTML");
                return doc;
            }

            // replace ampersands with a harmless character string to avoid XML parsing issues
            sText = sText.replace(/&/g, "j!J!");
            // Safari parses as text/xml
            doc = parser.parseFromString(sText, "text/xml");
            console.log("parsed as XML");
            return doc;
        }

        // older IE
        doc = document.implementation.createHTMLDocument('');
        doc.write(sText);
        doc.close(); // close is a method and has to be called
        return doc;
    } catch (err) {
        alert("Error parsing html:\n" + err.message);
    }
}
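
The ampersand workaround can be pulled out into a pair of small string helpers. This is a sketch: the function names are invented here, and the placeholder is the "j!J!" string from the answer. Note the assumption that the input never contains that placeholder already, otherwise restoring it would introduce ampersands that were not in the original text.

```javascript
// Placeholder string taken from the answer above.
var AMP_PLACEHOLDER = 'j!J!';

// Hypothetical helper: hide every ampersand before strict XML parsing.
function maskAmpersands(text) {
  return text.replace(/&/g, AMP_PLACEHOLDER);
}

// Hypothetical helper: put the ampersands back when displaying results.
function unmaskAmpersands(text) {
  return text.split(AMP_PLACEHOLDER).join('&');
}

maskAmpersands('a&nbsp;b');      // "aj!J!nbsp;b"
unmaskAmpersands('aj!J!nbsp;b'); // "a&nbsp;b"
```

Text extracted from the XML-parsed document (for example a node's textContent) would be passed through unmaskAmpersands before being shown.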