JavaScript: problem finding HTML elements in another website's source code

Posted on 2025-01-15 01:57:19

I am having trouble finding individual HTML elements in the downloaded source code of a selected page. When I call $(data).find('p').length it returns 2, which is correct, but $(data).find('img').length returns 0 when it should be 1.

async function getErrors() {
    await $.ajax({
            url: 'http://example.com',
            method: 'get'
        })
        .done(async (siteText) => {
            var data = $.parseHTML(siteText);
            console.log(data);
            console.log($(data).find('p').length);
            console.log($(data).find('img').length);
             await axios.get('http://anothersite.com')
            .then((response) => {
                //do something...
            });
        });
}

Live example:

var siteText = `<!DOCTYPE html>
<html lang="pl">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Test Site</title>
    <style>
        .black{
            background-color: black;
            color: #333131;
        }
    </style>
</head>
<body>
    <h1>Strona Testowa</h1>
    <div>
        <h2>Lorem Ipsum</h2>
        <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Convallis aenean et tortor at risus. Pellentesque habitant morbi tristique senectus. Nisi est sit amet facilisis. Vel elit scelerisque mauris pellentesque pulvinar. Quisque egestas diam in arcu. Elit at imperdiet dui accumsan sit amet nulla. Urna porttitor rhoncus dolor purus non enim praesent elementum. Velit dignissim sodales ut eu sem integer vitae justo eget. Lacus suspendisse faucibus interdum posuere lorem. Et ultrices neque ornare aenean euismod. Porttitor eget dolor morbi non. Sit amet consectetur adipiscing elit. Amet nisl suscipit adipiscing bibendum est. Eu non diam phasellus vestibulum. Neque convallis a cras semper auctor. Risus at ultrices mi tempus imperdiet nulla malesuada pellentesque elit. Et molestie ac feugiat sed lectus vestibulum. Adipiscing diam donec adipiscing tristique risus nec. Imperdiet proin fermentum leo vel. Nibh mauris cursus mattis molestie a iaculis at erat pellentesque. Elementum integer enim neque volutpat ac tincidunt vitae semper. Nam libero justo laoreet sit. Nibh tortor id aliquet lectus proin nibh nisl condimentum id. Et sollicitudin ac orci phasellus egestas tellus. Nunc sed augue lacus viverra vitae congue eu. Dui vivamus arcu felis bibendum ut. Mattis nunc sed blandit libero volutpat sed. Commodo sed egestas egestas fringilla phasellus faucibus scelerisque eleifend. Velit aliquet sagittis id consectetur purus ut faucibus pulvinar elementum. Quam vulputate dignissim suspendisse in est ante in nibh. Accumsan sit amet nulla facilisi morbi. Ac ut consequat semper viverra. Viverra tellus in hac habitasse platea dictumst. Donec ultrices tincidunt arcu non sodales neque. In est ante in nibh mauris. Mattis enim ut tellus elementum sagittis. Consectetur adipiscing elit pellentesque habitant morbi tristique senectus et netus. Sed id semper risus in. 
Vestibulum lectus mauris ultrices eros in cursus turpis massa. Vitae tempus quam pellentesque nec nam aliquam sem et tortor. In arcu cursus euismod quis viverra nibh cras. Sit amet consectetur adipiscing elit duis tristique. Augue ut lectus arcu bibendum at varius vel pharetra vel. Pharetra magna ac placerat vestibulum lectus mauris ultrices eros in. Libero nunc consequat interdum varius sit amet mattis vulputate. Netus et malesuada fames ac. In pellentesque massa placerat duis ultricies lacus sed turpis tincidunt. Tellus in hac habitasse platea dictumst vestibulum rhoncus est pellentesque. Duis convallis convallis tellus id interdum velit laoreet. Et tortor consequat id porta nibh venenatis cras. Laoreet sit amet cursus sit amet dictum sit amet justo.</p>
    </div>
    <img src="https://png.pngtree.com/png-clipart/20190108/ourmid/pngtree-tree-green-plant-photography-png-png-image_305004.jpg" >
    <iframe width="560" height="315" src="https://www.youtube.com/embed/gK8s4LUJ7NE" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
    <div class="black">
        <p class="black">Lorem Ipsum</p>
    </div>
</body>
</html>`;

var data = $.parseHTML(siteText);
console.log(data);
console.log($(data).find('p').length);
console.log($(data).find('img').length);
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>

Comments (2)

梦行七里 2025-01-22 01:57:19

As an alternative you could use the html() function on a newly created element to parse your HTML. This way the find() function works because it looks for child elements of the new element.

Detailed explanation:

HTML parsed by parseHTML() and html() will ignore <html>, <head> and <body> tags.

So parsing returns a flat array of the nodes that were inside <head> and <body>. When that array is wrapped directly in a jQuery object, find() runs on each element in the array and searches only its descendants, not the element itself. That's why find() can't find the direct children of <body>, such as the <img>, while the <p> elements, which are nested inside <div>s, are found. The filter() function works for those top-level elements because it filters the array itself.

By wrapping the result in a new element, find() works correctly on the full <body> content, since those nodes are now children of the new element.

var siteText = `<!DOCTYPE html>
<html lang="pl">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Test Site</title>
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.1.0/css/all.min.css">
    <style>
        .black{
            background-color: black;
            color: #333131;
        }
    </style>
</head>
<body>
    <h1>Strona Testowa<span class="fa fa-bath"></span></h1>
    <div>
        <h2>Lorem Ipsum</h2>
        <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Convallis aenean et tortor at risus. Pellentesque habitant morbi tristique senectus. Nisi est sit amet facilisis. Vel elit scelerisque mauris pellentesque pulvinar. Quisque egestas diam in arcu. Elit at imperdiet dui accumsan sit amet nulla. Urna porttitor rhoncus dolor purus non enim praesent elementum. Velit dignissim sodales ut eu sem integer vitae justo eget. Lacus suspendisse faucibus interdum posuere lorem. Et ultrices neque ornare aenean euismod. Porttitor eget dolor morbi non. Sit amet consectetur adipiscing elit. Amet nisl suscipit adipiscing bibendum est. Eu non diam phasellus vestibulum. Neque convallis a cras semper auctor. Risus at ultrices mi tempus imperdiet nulla malesuada pellentesque elit. Et molestie ac feugiat sed lectus vestibulum. Adipiscing diam donec adipiscing tristique risus nec. Imperdiet proin fermentum leo vel. Nibh mauris cursus mattis molestie a iaculis at erat pellentesque. Elementum integer enim neque volutpat ac tincidunt vitae semper. Nam libero justo laoreet sit. Nibh tortor id aliquet lectus proin nibh nisl condimentum id. Et sollicitudin ac orci phasellus egestas tellus. Nunc sed augue lacus viverra vitae congue eu. Dui vivamus arcu felis bibendum ut. Mattis nunc sed blandit libero volutpat sed. Commodo sed egestas egestas fringilla phasellus faucibus scelerisque eleifend. Velit aliquet sagittis id consectetur purus ut faucibus pulvinar elementum. Quam vulputate dignissim suspendisse in est ante in nibh. Accumsan sit amet nulla facilisi morbi. Ac ut consequat semper viverra. Viverra tellus in hac habitasse platea dictumst. Donec ultrices tincidunt arcu non sodales neque. In est ante in nibh mauris. Mattis enim ut tellus elementum sagittis. Consectetur adipiscing elit pellentesque habitant morbi tristique senectus et netus. Sed id semper risus in. 
Vestibulum lectus mauris ultrices eros in cursus turpis massa. Vitae tempus quam pellentesque nec nam aliquam sem et tortor. In arcu cursus euismod quis viverra nibh cras. Sit amet consectetur adipiscing elit duis tristique. Augue ut lectus arcu bibendum at varius vel pharetra vel. Pharetra magna ac placerat vestibulum lectus mauris ultrices eros in. Libero nunc consequat interdum varius sit amet mattis vulputate. Netus et malesuada fames ac. In pellentesque massa placerat duis ultricies lacus sed turpis tincidunt. Tellus in hac habitasse platea dictumst vestibulum rhoncus est pellentesque. Duis convallis convallis tellus id interdum velit laoreet. Et tortor consequat id porta nibh venenatis cras. Laoreet sit amet cursus sit amet dictum sit amet justo.</p>
    </div>
    <img src="https://png.pngtree.com/png-clipart/20190108/ourmid/pngtree-tree-green-plant-photography-png-png-image_305004.jpg" >
    <iframe width="560" height="315" src="https://www.youtube.com/embed/gK8s4LUJ7NE" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
    <div class="black">
        <p class="black">Lorem Ipsum</p>
    </div>
</body>
</html>`;

var data = $('<div></div>').html(siteText);

$('#target').append(data);

$('#target').find('p').each((i, el) => {
  console.log($(el).css('color'));
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div id="target" style="display: block; width: 100%; height: 100%"></div>
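
The difference between find() and filter() described above can be illustrated without jQuery. The following is only a toy model in plain JavaScript, not jQuery's actual implementation: the node() helper, findDescendants() and filterSet() are hypothetical names made up for this sketch, and the node list is a simplified stand-in for what $.parseHTML returns for the sample page (text nodes and <meta> elements are left out).

```javascript
// Toy model: <html>, <head> and <body> are dropped by the parser,
// so their children end up as top-level entries in a flat array.
// node(), findDescendants() and filterSet() are hypothetical helpers, not jQuery APIs.
const node = (tag, children = []) => ({ tag, children });

const topLevel = [
  node('title'),
  node('style'),
  node('h1'),
  node('div', [node('h2'), node('p')]),
  node('img'),                      // direct child of <body> in the original page
  node('iframe'),
  node('div', [node('p')]),
];

// Behaves like .find(): looks only at descendants of each node in the set.
function findDescendants(nodes, tag) {
  const hits = [];
  const walk = (n) => {
    for (const child of n.children) {
      if (child.tag === tag) hits.push(child);
      walk(child);
    }
  };
  nodes.forEach(walk);
  return hits;
}

// Behaves like .filter(): tests only the nodes in the set itself.
function filterSet(nodes, tag) {
  return nodes.filter((n) => n.tag === tag);
}

// Equivalent of putting everything inside a new <div> before searching.
const wrapper = node('div', topLevel);

console.log(findDescendants(topLevel, 'p').length);    // 2
console.log(findDescendants(topLevel, 'img').length);  // 0
console.log(filterSet(topLevel, 'img').length);        // 1
console.log(findDescendants([wrapper], 'img').length); // 1
```

In this model the top-level img is invisible to the descendant search until it becomes a child of the wrapper, which mirrors why wrapping the parsed HTML in a new element makes find() return the expected count.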

日久见人心 2025-01-22 01:57:19

I tried your code with another site and it works fine. I modified your JS to temporarily get rid of async/await:

$.ajax({
    url: 'http://jsfiddle.net/2AaFk/1',
    method: 'get'
})
.done((siteText) => {
    console.log(siteText);
    var data = $.parseHTML(siteText);
    //console.log(data);
    console.log($(data).find('h3').length);
});