Finding spaces in anchor links

Posted 2024-08-01 13:45:27 · 335 words · 5 views · 0 comments

We've got a large amount of static HTML that has links like this, e.g.

<a href="link.html#glossary">Link</a>

However some of them contain spaces in the anchor e.g.

 <a href="link.html#this is the glossary">Link</a>

Any ideas on what kind of regular expression I'd need to find the spaces after the # and replace them with a - or _?

Update: I just need to find them using TextMate, hence no need for an HTML parsing lib.

Comments (3)

画中仙 2024-08-08 13:45:27

This regex should do it:

#[a-zA-Z]+\s+[a-zA-Z\s]+

Three caveats:

First, if you are afraid that the page text itself (and not just the links) might contain information like "#hashtag more words", then you could make the regex more restrictive, like this:

#[a-zA-Z]+\s+[a-zA-Z\s]+\">

Second, if you have hash tags that contain characters beyond A-Z, then just add them in between the second set of brackets. So, if you have '-' as well, you would modify to:

#[a-zA-Z]+\s+[a-zA-Z-\s]+\">

Finally, this assumes that all the links you are trying to match start with a letter/word and are followed by a space, so, in the current form, it would not match "Anchor-tags-galore", but would match "Anchor tags galore."
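To see how the three patterns and their caveats behave, here is a minimal Python sketch. The sample lines are made up for illustration, and Python's `re` module stands in for TextMate's regex engine, so small dialect differences are possible.

```python
import re

# Patterns copied from the answer above.
LOOSE = re.compile(r'#[a-zA-Z]+\s+[a-zA-Z\s]+')
STRICT = re.compile(r'#[a-zA-Z]+\s+[a-zA-Z\s]+\">')
WITH_HYPHEN = re.compile(r'#[a-zA-Z]+\s+[a-zA-Z-\s]+\">')

# Hypothetical sample lines, invented for this sketch.
samples = [
    '<a href="link.html#glossary">Link</a>',
    '<a href="link.html#this is the glossary">Link</a>',
    '<p>Some text with #hashtag more words in it</p>',
    '<a href="link.html#Anchor-tags-galore">Link</a>',
    '<a href="link.html#see the sub-section">Link</a>',
]


def matches(pattern, text):
    """Return the matched text, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for line in samples:
        print(line)
        print("  loose:      ", matches(LOOSE, line))
        print("  strict:     ", matches(STRICT, line))
        print("  with hyphen:", matches(WITH_HYPHEN, line))
```

In this sketch the loose pattern also fires on the plain-text "#hashtag more words" line, the strict pattern does not, and neither strict variant matches "Anchor-tags-galore", which is consistent with the caveats listed above.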

绾颜 2024-08-08 13:45:27

Have you considered using an HTML parsing library like BeautifulSoup? It would make finding all the hrefs much easier!
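As a rough illustration of the parser-based approach, here is a sketch that uses Python's standard-library `html.parser` as a stand-in for BeautifulSoup (which is a third-party package). The class and function names are hypothetical, chosen only for this example.

```python
from html.parser import HTMLParser


class SpacedAnchorFinder(HTMLParser):
    """Collects href values whose fragment (the part after #) contains whitespace."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and "#" in value:
                # Everything after the first # is the fragment.
                fragment = value.split("#", 1)[1]
                if any(ch.isspace() for ch in fragment):
                    self.found.append(value)


def find_spaced_anchors(html):
    """Return all <a href> values in `html` with whitespace after the #."""
    finder = SpacedAnchorFinder()
    finder.feed(html)
    finder.close()
    return finder.found


if __name__ == "__main__":
    page = (
        '<a href="link.html#glossary">Link</a>\n'
        '<a href="link.html#this is the glossary">Link</a>\n'
    )
    print(find_spaced_anchors(page))
```

Because this only inspects `href` attributes of `<a>` tags, text such as "#hashtag more words" in the page body is never reported.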

昨迟人 2024-08-08 13:45:27

Here, this regex matches the hash and all the words and spaces in between:

#(\w+\s)+\w+

http://dl.getdropbox.com/u/5912/Jing/2009-08-12_1651.png

When you have some time, you should download "The Regex Coach", which is an awesome tool to develop your own regexes. You get instant feedback and you learn very fast. Plus it comes at no cost!
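The pattern `#(\w+\s)+\w+` from this answer only finds the anchors. One possible way to also do the replacement the question asks for is sketched below in Python. The function name `hyphenate_anchor` and its `sep` parameter are hypothetical, and Python's `re` is an assumption here since the question targets TextMate.

```python
import re

# Pattern copied from the answer above.
PATTERN = re.compile(r'#(\w+\s)+\w+')


def hyphenate_anchor(text, sep="-"):
    """Replace each whitespace character inside a matched anchor with `sep`."""
    return PATTERN.sub(lambda m: re.sub(r'\s', sep, m.group(0)), text)


if __name__ == "__main__":
    line = '<a href="link.html#this is the glossary">Link</a>'
    print(hyphenate_anchor(line))
    print(hyphenate_anchor(line, sep="_"))
```

Note that, as written, the pattern expects exactly one whitespace character between words, so an anchor with two consecutive spaces is left untouched in this sketch. It also matches "#word word" sequences in ordinary page text, not just inside `href` attributes.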
