How do I create a DOM object from an HTML page received via XMLHttpRequest?

Posted 2024-09-28 08:44:17

I'm developing a chromium extension so I have cross-host permissions for XMLHttpRequests for the domains I'm asking permissions for.

I have used XMLHttpRequest and got an HTML webpage (text/html). I want to use XPath (document.evaluate) to extract relevant bits from it. Unfortunately, I'm failing to construct a DOM object from the returned HTML string.

var xhr = new XMLHttpRequest();
var name = escape("Sticks N Stones Cap");
xhr.open("GET", "http://items.jellyneo.net/?go=show_items&name="+name+"&name_type=exact", true);
xhr.onreadystatechange = function () {
    if (xhr.readyState == 4) {
        var parser = new DOMParser();
        var xmlDoc = parser.parseFromString(xhr.responseText, "text/xml");
        console.log(xmlDoc);
    }
}

xhr.send();

The console.log call is there to show debug output in the Chromium JS console.

In that console, I get this:

Document
<html>
<body>
<parsererror style="display: block; white-space: pre; border: 2px solid #c77; padding: 0 1em 0 1em; margin: 1em; background-color: #fdd; color: black">
<h3>This page contains the following errors:</h3>
<div style="font-family:monospace;font-size:12px">error on line 1 at column 60: Space required after the Public Identifier
</div>
<h3>Below is a rendering of the page up to the first error.</h3>
</parsererror>
</body>
</html>

So how am I supposed to do XMLHttpRequest -> receive HTML -> convert to DOM -> traverse with XPath?

Should I be using the "hidden" iframe hack to load the page and get a DOM object?

黒涩兲箜 2024-10-05 08:44:17

The DOMParser is choking on the DOCTYPE declaration: with "text/xml" it uses the strict XML parser, so it would also error on any other non-XHTML markup, such as a <link> without a closing /. Do you have control over the document being sent? If not, your best bet is to parse it as a string and use regular expressions to find what you are looking for.
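A rough sketch of the "parse it as a string" route, assuming the values you want sit in <span> elements with a known class. The helper name findSpanTexts and the "fu" class are hypothetical. Regular expressions cannot cope with nested or malformed markup in general, so this only suits simple, predictable pages:

```javascript
// Hypothetical helper: return the inner markup of every <span> whose class
// attribute contains `className` as a whole word.
// Assumes `className` holds no regex metacharacters and spans are not nested.
function findSpanTexts(html, className) {
  const pattern = new RegExp(
    '<span[^>]*\\bclass="[^"]*\\b' + className + '\\b[^"]*"[^>]*>([\\s\\S]*?)</span>',
    "gi"
  );
  const results = [];
  for (const match of html.matchAll(pattern)) {
    results.push(match[1]); // captured group = content between the tags
  }
  return results;
}

// Usage sketch inside the XHR callback:
// var values = findSpanTexts(xhr.responseText, "fu");
```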

Edit: You can get the browser to parse the contents of the body for you by injecting it into a hidden div:

var hidden = document.body.appendChild(document.createElement("div"));
hidden.style.display = "none";
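// Copy everything between <body ...> and </body> into the hidden div;
// the browser's HTML parser builds the DOM nodes. Use exec(): calling a
// RegExp like a function is non-standard and no longer works.
hidden.innerHTML = /<body[^>]*>([\s\S]+)<\/body>/i.exec(xhr.responseText)[1];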
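The [1] at the end throws a TypeError if the response has no <body> tag at all (for example an empty response or a bare error message). A slightly more defensive sketch, using a hypothetical helper named extractBodyHtml, returns null in that case so the caller can bail out:

```javascript
// Hypothetical helper: return the markup between <body ...> and the last
// </body>, or null when the response contains no body element.
function extractBodyHtml(html) {
  const match = /<body[^>]*>([\s\S]*)<\/body>/i.exec(html);
  return match ? match[1] : null;
}

// Usage sketch inside the XHR callback (browser only):
// var bodyHtml = extractBodyHtml(xhr.responseText);
// if (bodyHtml !== null) {
//   hidden.innerHTML = bodyHtml;
// }
```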

Now search inside hidden to find what you're looking for:

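// The HTML parser wraps bare <tr> rows in an implicit <tbody>,
// so a direct-child (>) selector has to include it.
var myEl = hidden.querySelector("table.foo > tbody > tr > td.bar > span.fu");
var myVal = myEl ? myEl.innerHTML : null;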
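The question asks for XPath rather than CSS selectors. document.evaluate accepts any node as its context node, so the hidden div can be searched with XPath as well. A minimal sketch; the helper name xpathFirst and the expression in the usage comment are hypothetical and simply mirror the selector above. Note the leading "." in the expression: without it, "//" searches the whole extension page instead of just the injected markup.

```javascript
// Hypothetical helper: evaluate `expression` with `contextNode` (for example
// the hidden div) as the context node and return the first matching node,
// or null if nothing matches.
function xpathFirst(contextNode, expression) {
  // A Document's ownerDocument is null, so fall back to the node itself.
  const doc = contextNode.ownerDocument || contextNode;
  // 9 is the value of XPathResult.FIRST_ORDERED_NODE_TYPE; the literal avoids
  // depending on the XPathResult global.
  const FIRST_ORDERED_NODE_TYPE = 9;
  const result = doc.evaluate(expression, contextNode, null, FIRST_ORDERED_NODE_TYPE, null);
  return result.singleNodeValue;
}

// Usage sketch in the extension page (browser only). @class='foo' only matches
// when the attribute is exactly "foo".
// var span = xpathFirst(hidden, ".//table[@class='foo']//td[@class='bar']/span[@class='fu']");
// var myVal = span ? span.textContent : null;
```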