Finding the value inside a "hidden" tag with urllib and BeautifulSoup

Posted 2024-10-22 06:26:05

I want to know whether it is possible to read the value of a hidden tag. I'm using urllib and BeautifulSoup, but I can't seem to get what I want.

The HTML I'm using is below (saved as hiddentry.html):

<html>

<head>
    <script type="text/javascript">
        //change hidden elem value
        function changeValue()
        {
            document.getElementById('hiddenElem').value = 'hello matey!';
        }

        //this will verify if i have successfully changed the hiddenElem's value
        function printHidden()
        {
            document.getElementById('displayHere').innerHTML = document.getElementById('hiddenElem').value;
        }
    </script>
</head>

<body>

    <div id="hiddenDiv" style="position: absolute; left: -1500px">
        <!--i want to find the value of this element right here-->
        <span id="hiddenElem"></span>
    </div>

    <span id="displayHere"></span>

    <script type="text/javascript">
        changeValue();
        printHidden();
    </script>

</body>

</html>

What I want to print is the value of the element with id hiddenElem.
To do this I tried combining urllib and BeautifulSoup. The code I used is:

from BeautifulSoup import BeautifulSoup
import urllib2
import urllib

mysite = urllib.urlopen("http://localhost/hiddentry.html")
soup = BeautifulSoup(mysite)
print soup.prettify()
print '\n\n'

areUthere = soup.find(id="hiddenElem").find(text=True)
print areUthere

What I get as output, though, is None.
Any ideas? Is what I'm trying to accomplish even possible?


Comments (1)

半葬歌 2024-10-29 06:26:05

BeautifulSoup parses the HTML exactly as it arrives from the server; it does not run scripts. If you want to see values generated by JavaScript, you need to execute the page's embedded JavaScript before passing the string to BeautifulSoup. Once the JavaScript has run, you pass the modified DOM HTML to BeautifulSoup.
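To illustrate why the lookup comes back empty, here is a minimal sketch that parses a trimmed copy of the question's markup without executing any script. It uses the standard library's `html.parser` only so that it runs without extra packages; BeautifulSoup receives the same served markup. The names `SERVED_HTML`, `TextOfElement` and `static_text` are made up for this example.

```python
from html.parser import HTMLParser

# Trimmed copy of the question's page, as a server would deliver it.
SERVED_HTML = """
<html><body>
  <div id="hiddenDiv" style="position: absolute; left: -1500px">
    <span id="hiddenElem"></span>
  </div>
  <span id="displayHere"></span>
  <script type="text/javascript">
    document.getElementById('hiddenElem').value = 'hello matey!';
  </script>
</body></html>
"""


class TextOfElement(HTMLParser):
    """Collects all text nested inside the element with a given id.

    Simplification: void tags such as <br> inside the target are not handled.
    """

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0      # > 0 while the parser is inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.element_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def static_text(html, element_id):
    """Return the text inside element_id as found in the raw, unexecuted HTML."""
    parser = TextOfElement(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# The script is never run, so the span is still empty.
print(repr(static_text(SERVED_HTML, "hiddenElem")))
```

The parser treats the `<script>` body as inert text, so nothing it assigns ever reaches the span.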

As for browser emulation: you should be able to pull down the base HTML, let the emulated browser execute the JavaScript, and then take the modified DOM HTML and feed it to BeautifulSoup.
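The fetch, render, parse sequence described here could look roughly like the sketch below. It rests on assumptions the answer does not state: Selenium 4 driving headless Chrome stands in for the emulated browser, and the `bs4` package (BeautifulSoup 4) does the parsing. The function names `render_page`, `element_text` and `text_after_javascript` are hypothetical. Third-party imports sit inside the functions so the definitions load even where those packages are missing.

One detail of the question's page is worth noting: `changeValue()` sets `.value` on a `<span>`, which is an ordinary JavaScript property and not markup, so it never appears in the serialized HTML. The text becomes part of the DOM only when `printHidden()` writes it into `displayHere` through `innerHTML`, so that is the element to look up after rendering.

```python
def render_page(url):
    """Load url in headless Chrome and return the HTML after scripts have run."""
    from selenium import webdriver  # assumption: Selenium 4 and Chrome are installed

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # page_source is the browser's serialization of the current DOM.
        return driver.page_source
    finally:
        driver.quit()


def element_text(html, element_id):
    """Return the text inside the element with the given id, or None if absent."""
    from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 (bs4) is installed

    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find(id=element_id)
    return tag.get_text() if tag is not None else None


def text_after_javascript(url, element_id, render=render_page):
    """Render the page first, then parse the rendered HTML."""
    return element_text(render(url), element_id)
```

For the question's page, the call to try would be `text_after_javascript("http://localhost/hiddentry.html", "displayHere")`. The `render` parameter is there so a different browser driver can be swapped in without touching the parsing step.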
