Why is the output repeated when I parse a string with PyQuery?

Posted 2025-01-18 11:03:28 · 2397 characters · 4 views · 0 comments

Why is the output repeated when I parse a string using PyQuery in Spyder?

Here is my code:

from pyquery import PyQuery as pq
html = """

    <ul>
        <li>first-item</li>
        <li><a href="link2.html">second item</a></li>
        <li><a href="link3.html">third item</a></li>
        <li><a href="link4.html">fourth item</a></li>
        <li><a href="link5.html">fifth item</a></li>        
    </ul>

"""
doc = pq(html)
print(type(doc))
print(doc('li'))

Here is the output:

<class 'pyquery.pyquery.PyQuery'>
<a href="link2.html">second item</a></li>
        <li class="item=-0 active"><a href="link3.html"><span class="" bold="">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>        
    </ul>
</div>
</body></html><a href="link3.html"><span class="" bold="">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>        
    </ul>
</div>
</body></html><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>        
    </ul>
</div>
</body></html><a href="link5.html">fifth item</a></li>        
    </ul>
</div>
</body></html>

However, according to my textbook the output should be

<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>

I have searched hard for an answer online, but I could not find a similar problem on any forum or on GitHub. Any help would be greatly appreciated.


Comments (1)

冷弦 2025-01-25 11:03:28

You are not searching for the right tag.
You want all the <li> elements, so you should search for li, not for a.

So you would have:

    from pyquery import PyQuery as pq
    html = """
        <ul>
            <li>first-item</li>
            <li><a href="link2.html">second item</a></li>
            <li><a href="link3.html">third item</a></li>
            <li><a href="link4.html">fourth item</a></li>
            <li><a href="link5.html">fifth item</a></li>        
        </ul>
    """
    doc = pq(html)
    print(type(doc))
    print(doc('li'))
    

This gives me:

    <class 'pyquery.pyquery.PyQuery'>
    <li>first-item</li>
    <li><a href="link2.html">second item</a></li>
    <li><a href="link3.html">third item</a></li>
    <li><a href="link4.html">fourth item</a></li>
    <li><a href="link5.html">fifth item</a></li> 
    

I tested this independently of any context, using only the snippet you gave. If something still goes wrong when you apply this, the error must come from elsewhere in your code.
