Python lxml: wrapping elements

Posted on 2024-11-08 02:22:53

I was wondering what the easiest way is to wrap an element in another element using lxml and Python. For example, if I have an HTML snippet:

<h1>The cool title</h1>
<p>Something Neat</p>
<table>
<tr>
<td>aaa</td>
<td>bbb</td>
</tr>
</table>
<p>The end of the snippet</p>

And I want to wrap the table element with a section element like this:

<h1>The cool title</h1>
<p>Something Neat</p>
<section>
<table>
<tr>
<td>aaa</td>
<td>bbb</td>
</tr>
</table>
</section>
<p>The end of the snippet</p>

Another thing I would like to do is scour the XML document for h1 elements with a certain attribute, and then wrap each one, together with all of the elements up to the next h1 tag, in a new element. For example:

<h1 class='neat'>Subject 1</h1>
<p>Here is a bunch of boring text</p>
<h2>Minor Heading</h2>
<p>Here is some more</p>
<h1 class='neat'>Subject 2</h1>
<p>And Even More</p>

Converted to:

<section>
<h1 class='neat'>Subject 1</h1>
<p>Here is a bunch of boring text</p>
<h2>Minor Heading</h2>
<p>Here is some more</p>
</section>
<section>
<h1 class='neat'>Subject 2</h1>
<p>And Even More</p>
</section>

Thanks for all the help,
Chris

Comments (2)

东走西顾 2024-11-15 02:22:53

lxml's awesome for parsing well-formed XML, but it's not so good if you've got non-XHTML HTML. If that's the case, then go for BeautifulSoup as suggested by systemizer.

With lxml, this is a fairly easy way to insert a section around all tables in the document:

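import lxml.etree as ET

# Placeholder input: this elided string will not parse as written,
# so substitute a well-formed XML/XHTML document here.
TEST = "<html><h1>...</html>"

def insert_section(root):
    # Wrap every <table> in the tree in a new <section> element
    for table in root.findall(".//table"):
        section = ET.Element("section")
        table.addprevious(section)   # put the section where the table is
        section.insert(0, table)     # this moves the table into the section

root = ET.fromstring(TEST)
insert_section(root)
print(ET.tostring(root, encoding="unicode"))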
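
As a concrete illustration of the approach above, here is a minimal sketch applied to the snippet from the question. The helper name `wrap` and the enclosing `<div>` are assumptions made for this example (the fragment needs a single root element to parse as XML). The helper also keeps any text that follows the table outside the new section, because lxml moves an element's tail text along with the element.

```python
import lxml.etree as ET

def wrap(element, tag):
    """Wrap `element` in a new element named `tag` and return the wrapper.

    `wrap` is a hypothetical helper for this sketch, not part of lxml.
    `element` must already have a parent.
    """
    wrapper = ET.Element(tag)
    element.addprevious(wrapper)
    # Hand the tail text over to the wrapper so it stays outside of it
    wrapper.tail = element.tail
    element.tail = None
    wrapper.append(element)          # append moves the element
    return wrapper

# The question's fragment, inside an assumed <div> root
doc = ET.fromstring(
    "<div><h1>The cool title</h1><p>Something Neat</p>"
    "<table><tr><td>aaa</td><td>bbb</td></tr></table>"
    "<p>The end of the snippet</p></div>"
)

# Collect the matches first, since the tree is modified while wrapping
for table in doc.findall(".//table"):
    wrap(table, "section")

result = ET.tostring(doc, encoding="unicode")
print(result)
```

The printed tree should show the table nested inside a section, with the closing paragraph still a sibling of that section.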

You could do something similar to wrap the headings, but you would need to iterate through all the elements you want to wrap and move each of them into the section. element.index(child) and list slices might help here.
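
One way the heading case could look is sketched below. It walks the children in order instead of using index() and slices, which is a swapped-in but equivalent technique. The function name `wrap_heading_groups`, its parameters, and the enclosing `<div>` root are assumptions for illustration, and it only groups direct children of a single parent.

```python
import lxml.etree as ET

def wrap_heading_groups(parent, css_class="neat", wrapper_tag="section"):
    """Group each matching <h1> child of `parent`, plus the siblings that
    follow it up to the next matching <h1>, inside a new wrapper element.

    Hypothetical helper for this sketch, not part of lxml.
    """
    section = None
    for child in list(parent):       # copy: parent changes while we loop
        if child.tag == "h1" and child.get("class") == css_class:
            # A matching heading starts a new group at its own position
            section = ET.Element(wrapper_tag)
            child.addprevious(section)
        if section is not None:
            section.append(child)    # append moves the child out of parent
    return parent

# The question's second fragment, inside an assumed <div> root
doc = ET.fromstring(
    "<div>"
    "<h1 class='neat'>Subject 1</h1>"
    "<p>Here is a bunch of boring text</p>"
    "<h2>Minor Heading</h2>"
    "<p>Here is some more</p>"
    "<h1 class='neat'>Subject 2</h1>"
    "<p>And Even More</p>"
    "</div>"
)
wrap_heading_groups(doc)

# Tag names of the children of each top-level element after grouping
sections = [[child.tag for child in group] for group in doc]
print(ET.tostring(doc, encoding="unicode"))
```

Elements that come before the first matching h1 are left where they are, since no section has been opened yet at that point.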

与他有关 2024-11-15 02:22:53

If you're parsing certain XML files, you can use BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/

Beautiful Soup is a great way to represent XML as Python objects. You can then write Python functions to analyze the HTML and add or remove tags. For instance, you could have an is_h1 function that finds all the h1 tags in the XML file, and then add a section tag around them using Beautiful Soup.
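
A minimal sketch of the table case with Beautiful Soup follows. It assumes the current `bs4` package (Beautiful Soup 4) and Python's built-in "html.parser" backend, neither of which the answer specifies, and uses the `new_tag` and `wrap` methods from that version.

```python
from bs4 import BeautifulSoup

# The question's fragment; html.parser accepts it without a single root
html = (
    "<h1>The cool title</h1><p>Something Neat</p>"
    "<table><tr><td>aaa</td><td>bbb</td></tr></table>"
    "<p>The end of the snippet</p>"
)
soup = BeautifulSoup(html, "html.parser")

for table in soup.find_all("table"):
    # wrap() puts the table inside the newly created <section> tag
    table.wrap(soup.new_tag("section"))

result = str(soup)
print(result)
```

Because this parser tolerates fragments and malformed markup, no enclosing root element is needed here.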

If you would like to return this to the browser, you can use an HttpResponse whose argument is the string representation of the finished XML.
