I need a Perl regex to parse plain text input and convert all links to valid HTML HREF links. I've tried 10 different versions I found on the web, but none of them seem to work correctly. I also tested other solutions posted on StackOverflow, none of which seem to work either. The correct solution should be able to find any URL in the plain text input and convert it to:
<a href="$1">$1</a>
Cases that the other regular expressions I tried didn't handle correctly include:
- URLs at the end of a line that are followed by returns
- URLs that include question marks
- URLs that start with 'https'
I'm hoping that another Perl guy out there will already have a regular expression they are using for this that they can share. Thanks in advance for your help!
You want URI::Find. Once you extract the links, you should be able to handle the rest of the problem just fine.
By the way, this is covered in perlfaq9 under "How do I extract URLs?". There is a lot of good stuff in the perlfaq documents. :)
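As a rough illustration of the URI::Find suggestion, here is a minimal sketch that wraps every URI it finds in an anchor tag. The helper names `escape_html` and `linkify` and the sample text are made up for this example, and URI::Find has to be installed from CPAN. Treat it as a starting point rather than a drop-in solution.

```perl
use strict;
use warnings;
use URI::Find;   # CPAN module, not part of core Perl

# Hypothetical helper: escape the characters that are special in HTML.
sub escape_html {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical helper: return a copy of $text with each URI turned into a link.
sub linkify {
    my ($text) = @_;
    my $finder = URI::Find->new(sub {
        my ($uri, $orig) = @_;   # URI object and the text as it appeared
        return sprintf '<a href="%s">%s</a>',
            escape_html("$uri"), escape_html($orig);
    });
    # The second argument is applied to the text between the URIs.
    $finder->find(\$text, \&escape_html);
    return $text;
}

print linkify("See https://example.com/search?q=perl\nfor details.\n");
```

Because the module decides where a URI starts and ends, the three problem cases from the question (line endings, question marks, https) do not need a hand-written pattern here.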
Besides URI::Find, also check out the big regular expression database Regexp::Common. It includes a Regexp::Common::URI module that gives you a ready-made URL pattern. If you want the different pieces of that URI (hostname, query parameters, etc.), see the documentation of Regexp::Common::URI::http for what is captured by the $RE{URI} regular expression.
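A minimal sketch of the Regexp::Common approach might look like the following. The `-scheme` option widens the pattern from its default `http` to also accept `https`, and `-keep` turns on the capture groups documented in Regexp::Common::URI::http. The helper name `extract_urls` and the sample text are assumptions made for this example, and the module has to be installed from CPAN.

```perl
use strict;
use warnings;
use Regexp::Common qw(URI);   # CPAN module, not part of core Perl

# Accept both http and https, and keep the captured pieces.
my $url_re = $RE{URI}{HTTP}{-scheme => 'https?'}{-keep};

# Hypothetical helper: collect each URL with its scheme and host.
sub extract_urls {
    my ($text) = @_;
    my @found;
    while ($text =~ /$url_re/g) {
        # With -keep: $1 is the whole URI, $2 the scheme, $3 the host.
        push @found, { url => $1, scheme => $2, host => $3 };
    }
    return @found;
}

my $text = "Docs: https://example.com/a/b?x=1 and http://example.org/\n";
for my $hit (extract_urls($text)) {
    print "$hit->{url} (host: $hit->{host})\n";
}
```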
When I tried URI::Find::Schemeless on text that included the URL http://example.org/(9.3), it messed that URL up. So, with the help of Regexp::Common, I came up with an alternative that worked for the input shown. Of course, life is never that easy, as you can see by trying (http://example.org/(9.3)).
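The parenthesis problem can be illustrated with a small core-Perl sketch. This is not the answerer's Regexp::Common code, which is not preserved here. It uses a different, simpler heuristic: grab a liberal run of non-space characters, then strip trailing punctuation, dropping a closing parenthesis only when it has no matching opening one inside the candidate. The helper names `trim_candidate` and `find_urls` are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: trim trailing punctuation from a URL candidate, and
# drop a trailing ")" only if the candidate has more ")" than "(".
sub trim_candidate {
    my ($cand) = @_;
    while (length $cand) {
        if ($cand =~ /[.,;:!?'"]\z/) {
            chop $cand;
            next;
        }
        if ($cand =~ /\)\z/) {
            my $open  = () = $cand =~ /\(/g;   # count of "("
            my $close = () = $cand =~ /\)/g;   # count of ")"
            if ($close > $open) {
                chop $cand;
                next;
            }
        }
        last;
    }
    return $cand;
}

# Hypothetical helper: find http/https URL candidates and trim each one.
sub find_urls {
    my ($text) = @_;
    my @urls;
    while ($text =~ m{(https?://[^\s<>"]+)}g) {
        push @urls, trim_candidate($1);
    }
    return @urls;
}

print "$_\n" for find_urls("Try (http://example.org/(9.3)) or http://example.org/(9.3).");
```

The heuristic only counts parentheses, so it can still guess wrong on unusual input, such as a URL that legitimately ends in an unmatched parenthesis.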
My sample code for extracting the URLs works like this: it takes lines from stdin, checks whether each input line contains a valid URL format, and gives you the URL when it does.
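The original code for this answer is not preserved, so the following is only a sketch of the steps described, using a deliberately simple pattern of my own choosing rather than the poster's. The helper name `scan_lines` and the sample input are hypothetical. The example reads from an in-memory filehandle so that it runs on its own.

```perl
use strict;
use warnings;

# Assumed, deliberately simple pattern: http or https, "://", then a run of
# characters that are not whitespace, angle brackets, or double quotes.
my $url_re = qr{\bhttps?://[^\s<>"]+}i;

# Hypothetical helper: read lines from a filehandle and report, per line,
# the first URL-like substring or the fact that none was found.
sub scan_lines {
    my ($fh) = @_;
    my @report;
    while (my $line = <$fh>) {
        chomp $line;
        push @report, $line =~ /($url_re)/ ? "URL: $1" : "no URL: $line";
    }
    return @report;
}

# Demonstration on an in-memory handle; pass \*STDIN to read standard input.
my $sample = "visit https://example.com/?q=1\nnothing here\n";
open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";
print "$_\n" for scan_lines($fh);
```

A pattern this loose accepts many strings that are not valid URLs, so for anything beyond a quick check the modules mentioned in the other answers are the safer choice.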