Generating a table of contents from HTML with Python
I'm trying to generate a table of contents from a block of HTML (not a complete file, just content) based on its <h2> and <h3> tags.
My plan so far was to:
1. Extract a list of headers using beautifulsoup.
2. Use a regex on the content to place anchor links before/inside the header tags (so the user can click on the table of contents). There might be a method for replacing inside beautifulsoup?
3. Output a nested list of links to the headers in a predefined spot.
It sounds easy when I say it like that, but it's proving to be a bit of a pain in the rear.
Is there something out there that does all this for me in one go so I don't waste the next couple of hours reinventing the wheel?
An example:
<p>This is an introduction</p>
<h2>This is a sub-header</h2>
<p>...</p>
<h3>This is a sub-sub-header</h3>
<p>...</p>
<h2>This is a sub-header</h2>
<p>...</p>
A quickly hacked, ugly piece of code:
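What follows is not the answerer's code but a minimal sketch of the three steps from the question (collect headers, add anchors, emit a nested list of links). It swaps in a plain `re` substitution from the standard library for beautifulsoup, which is only reasonable when the headings are simple `<h2>`/`<h3>` tags without attributes and the first heading is an `<h2>`, as in the example. The function name `build_toc` and the `toc-N` id scheme are assumptions made for illustration.

```python
import re

# Assumption: headings carry no attributes, e.g. <h2>Title</h2>.
HEADING_RE = re.compile(r"<h([23])>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)


def build_toc(content):
    """Return (toc_html, content_with_ids) for a fragment of HTML."""
    entries = []  # (level, anchor id, heading text)

    def add_id(match):
        level = int(match.group(1))
        inner = match.group(2)
        # Drop inline tags so the link label is plain text.
        text = re.sub(r"<[^>]+>", "", inner).strip()
        anchor = "toc-%d" % len(entries)  # counter keeps ids unique
        entries.append((level, anchor, text))
        return '<h%d id="%s">%s</h%d>' % (level, anchor, inner, level)

    content_with_ids = HEADING_RE.sub(add_id, content)

    out = []
    depth = 0  # 0 = no list open, 1 = <h2> list, 2 = <h3> list
    for level, anchor, text in entries:
        target = level - 1
        if target > depth:
            # Going deeper: open a list inside the still-open <li>.
            out.append("<ul>" * (target - depth))
        elif target < depth:
            # Going back up: close the nested item and list, then the parent item.
            out.append("</li></ul>" * (depth - target) + "</li>")
        else:
            out.append("</li>")
        out.append('<li><a href="#%s">%s</a>' % (anchor, text))
        depth = target
    out.append("</li></ul>" * depth)
    return "".join(out), content_with_ids


if __name__ == "__main__":
    html = "<h2>First</h2><p>...</p><h3>Nested</h3><h2>Second</h2>"
    toc, body = build_toc(html)
    print(toc)
    print(body)
```

The returned `toc` string can be placed in the predefined spot, and `content_with_ids` replaces the original fragment so that the links resolve. For markup that is less regular than the example, a real parser is the safer choice.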
Use lxml.html.
I have come up with an extended version of the solution proposed by Łukasz.
How do I generate a table of contents for HTML text in Python?
But I think you are on the right track and reinventing the wheel will be fun.