Convert all HTML list items into clickable hyperlinks
I've been given an HTML list to reformat. 120 questions and answers (240 tag sets total). All I need to do is make the text between the tags a link, like so:
<li>do snails make your feet itch?</li>
has to become
<li><a href="#n">do snails make your feet itch?</a></li>
I am performing this replacement in my IDE.
Eventually I'll likely try to do a batch replace with Perl to insert the 'n' variable so the links point properly.
You might ask: "if you can use Perl for that, why not the whole shebang?" ...that's a valid question, but I want to use regex more for the power it has for big lists like this. Plus my Perl skills are sketchy at best (I'd welcome Perl advice if it's on offer).
Comments (3)
You can do it in two steps.
Replace <li> with <li><a href="#n">
Replace </li> with </a></li>
Or you can try to be clever and do it in one. Here is a substitute command in Perl syntax ($1 references what was matched in the brackets).
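The command itself is missing from this copy of the answer. A minimal sketch of what a one-step substitution of this kind could look like, assuming every list item sits on a single line; the second list item and the variable names are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample input; the real list has 120 questions and answers.
my $html = <<'HTML';
<li>do snails make your feet itch?</li>
<li>can snails climb glass?</li>
HTML

# Capture the text between <li> and </li> non-greedily and wrap it in a link.
# $1 is whatever the parentheses matched; "#n" is still a literal placeholder.
(my $linked = $html) =~ s{<li>(.*?)</li>}{<li><a href="#n">$1</a></li>}g;

print $linked;
```

The non-greedy `.*?` keeps each match inside one item. Without the `/s` modifier the dot does not cross newlines, so items spread over several lines are left untouched by this sketch.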
And while you are there, it's easy to replace the second part of the replacement pattern with an expression that will increment n. See how you can run this from the command line:
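The original command is not preserved here either. One way this might be done is with the `/e` modifier, which evaluates the replacement as Perl code so a counter can be bumped on every match. The counter name `$n`, its starting value, and the file name `list.html` are assumptions:

```perl
use strict;
use warnings;

# Hypothetical sample input, one item per line.
my $html = <<'HTML';
<li>do snails make your feet itch?</li>
<li>can snails climb glass?</li>
HTML

my $n = 0;    # assumed counter; start it wherever your anchors start

# With /e the replacement is an expression, so ++$n runs once per match.
(my $numbered = $html) =~ s{<li>(.*?)</li>}{'<li><a href="#' . ++$n . '">' . $1 . '</a></li>'}ge;

print $numbered;

# Roughly the same thing as a one-liner over a file (list.html is a made-up name):
#   perl -pe 's{<li>(.*?)</li>}{q(<li><a href="#) . ++$n . q(">) . $1 . q(</a></li>)}ge' list.html
```

In the one-liner, `-p` reads the file line by line and prints each line after the substitution, and `$n` keeps its value from one line to the next. Adding `-i.bak` would edit the file in place and keep a backup copy.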
Search
Replace
The trick in these sorts of problems is to change what you want to change and not to change things that you don't. These problems seem simple, but often it seems like it's an unending parade of cases you did not think about, especially when you use regular expressions.
Here's a short program that uses a snippet of HTML that has various weird cases in the list items. One of the items is not terminated with </li>, others have the list item spread out over a couple of lines, and there is a comment that contains list item HTML that would match the regex solutions and shouldn't. Note that matching the list item in the comment would make the HTML look like it skipped a number.
Mojo::DOM is just one of several Perl modules that know how to deal with HTML. I think it's the simplest one to use, but it's not the only solution. Other modules will have a similar structure to this.
After I construct $dom, I use find to locate all of the list item tags. This isn't going to find anything in the comments, and it doesn't care about whitespace or multiple lines.
The find returns a collection of nodes, and the each can run some code on every one of those. The code ref in each can mutate the node. In this case, I get the content for each node, then create a new a tag that wraps that content. Now the list item has an anchor in it.
There are other ways to get to that same result, and you can adjust this based on the rest of the context of your problem. That code ref can be as complicated as you want, but the key point is that you know what you are starting with.
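The program from the answer is not preserved in this copy. A rough sketch of the approach described above, assuming Mojolicious is installed; the sample HTML, the counter `$n`, and the use of the content getter and setter to build the link are illustrative choices, not necessarily what the answer's author wrote:

```perl
use strict;
use warnings;
use Mojo::DOM;

# Made-up input with the awkward cases: an unterminated item,
# an item spread over several lines, and a commented-out item.
my $html = <<'HTML';
<ul>
<li>do snails make your feet itch?
<li>
  can snails
  climb glass?
</li>
<!-- <li>retired question</li> -->
<li>do snails sleep?</li>
</ul>
HTML

my $dom = Mojo::DOM->new($html);
my $n   = 0;    # assumed counter for the link targets

# find returns a collection; each passes every element to the code ref.
$dom->find('li')->each(sub {
    my ($li) = @_;
    $n++;
    my $content = $li->content;                      # current inner HTML
    $li->content(qq{<a href="#$n">$content</a>});    # wrap it in an anchor
});

my $out = $dom->to_string;
print $out;
```

Because the parser only hands back real li elements, the commented-out item is never counted, so the numbering does not skip.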
If you needed to treat each list differently, you could find each ul or ol and handle those separately.
At the end I output the DOM and you see the changes even with the special cases. Notice that the first list item ends up with the closing </li>, the multi-line elements are correctly changed, and the comment is correctly ignored.
This stuff takes a minute to figure out, but once you learn to do it, many of the HTML-mutating tasks become much easier. I have a lot of examples in Mojo Web Clients since playing with the HTML is a big part of programmatically interacting with the web.