Syntax error in a regular expression that matches link URLs

Posted 2024-07-14 11:40:37

I have the following method in some Nemerle code:

private static getLinks(text : string) : array[string] {
        def linkrx = Regex(@"<a\shref=['|\"](.*?)['|\"].*?>");
        def m = linkrx.Matches(text);
        mutable txmatches : array[string];
        for (mutable i = 0; i < m.Count; ++i) {
            txmatches[i] = m[i].Value;
        }
        txmatches
    }

The problem is that the compiler, for some reason, tries to parse the brackets inside the regex string, so the program does not compile. If I remove the @ (which I was told to put there), I get an invalid escape character error on the "\s".

Here's the compiler output:

NCrawler.n:23:21:23:22: error: when parsing this `(' brace group
NCrawler.n:23:38:23:39: error: unexpected closing bracket `]'
NCrawler.n:22:57:22:58: error: when parsing this `{' brace group
NCrawler.n:23:38:23:39: error: unexpected closing bracket `]'
NCrawler.n:8:1:8:2: error: when parsing this `{' brace group
NCrawler.n:23:38:23:39: error: unexpected closing bracket `]'
NCrawler.n:23:38:23:39: error: unexpected closing bracket `]'

(line 23 is the line with the regex code on it)

What should I do?

Comments (3)

爱本泡沫多脆弱 2024-07-21 11:40:37

I don't know Nemerle, but it seems that using @ disables all backslash escapes, including the escape for the double quote (\").

Try one of these:

def linkrx = Regex("<a\\shref=['\"](.*?)['\"].*?>");

def linkrx = Regex(@"<a\shref=['""](.*?)['""].*?>");

def linkrx = Regex(@"<a\shref=['\x22](.*?)['\x22].*?>");
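
To see why the original literal breaks, here is a minimal sketch in Python of how a C#-style verbatim string is scanned. It is a simplified model, not Nemerle's actual lexer, and it assumes Nemerle follows the same rule as C#: a doubled "" is a literal quote, a lone " ends the string, and a backslash is just an ordinary character. The name scan_verbatim is hypothetical.

```python
def scan_verbatim(src):
    """Split src (the text right after @") into (string content, leftover code).

    Simplified model of a C#-style verbatim literal (assumed to apply to
    Nemerle too): "" is one literal quote, a lone " closes the string,
    and backslashes have no special meaning.
    """
    out = []
    i = 0
    while i < len(src):
        if src[i] == '"':
            if src[i + 1:i + 2] == '"':      # doubled quote: one literal quote
                out.append('"')
                i += 2
                continue
            return "".join(out), src[i + 1:]  # lone quote closes the literal
        out.append(src[i])
        i += 1
    raise ValueError("unterminated verbatim string")


# Text that follows @" in the question's line and in the doubled-quote variant.
broken = r"""<a\shref=['|\"](.*?)['|\"].*?>");"""
fixed = r"""<a\shref=['""](.*?)['""].*?>");"""

for label, src in (("broken", broken), ("fixed", fixed)):
    content, leftover = scan_verbatim(src)
    print(label, "string:", repr(content), "leftover code:", repr(leftover))
```

In this model the broken literal ends right after the backslash, so the leftover text that gets parsed as code starts with `](.*?)`. That would be consistent with the "unexpected closing bracket" errors in the question. With doubled quotes, only `);` is left over.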
情丝乱 2024-07-21 11:40:37

I'm not a Nemerle programmer, but I know that you should ALWAYS use an XML parser for XML-based data, not regexps.

I guess someone has created a DOM or XPath library for Nemerle, so you can access either //a[@href] via XPath or something like a.href.value via DOM.

The current regexp doesn't match, for example:

<a class="foo" href="something">bar</a>

I didn't test this, but it should be closer:

/<a\s[^>]*?href=['"]([^'">]+)['"][^>]*>/i
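
The attribute-order problem is easy to reproduce. The sketch below uses Python only because it is easy to run; the pattern syntax involved should behave the same way in .NET's Regex. ORIGINAL is the pattern from the question. HrefCollector and extract_hrefs are hypothetical names for a parser-based alternative built on Python's standard html.parser module, shown as one possible approach and not as a Nemerle API.

```python
import re
from html.parser import HTMLParser

# Pattern from the question: only matches when href is the first attribute.
ORIGINAL = re.compile(r"""<a\shref=['|"](.*?)['|"].*?>""")


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag, whatever the attribute order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(html):
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


sample = '<a class="foo" href="something">bar</a>'
print(ORIGINAL.findall(sample))  # the question's regex finds nothing here
print(extract_hrefs(sample))     # the parser still finds the link
```

The parser version also copes with uppercase tags and with anchors that have no href at all, which a single regex tends to handle poorly.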
习惯成性 2024-07-21 11:40:37

The problem is with the quotation marks, not the brackets. In Nemerle, as in C#, you escape a quotation mark with another quotation mark, not a backslash.

@"<a\shref=['""](.*?)['""].*?>"

EDIT: Note as well that you don't need the pipe inside the square brackets; the contents are treated as a set of characters (or ranges of characters), with the OR being implied.
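
A quick way to check the point about the pipe, sketched in Python for convenience (character classes work the same way in .NET's Regex on this point). The names with_pipe and without_pipe are just labels for this illustration.

```python
import re

# Inside [...] the pipe is a literal character, not an alternation operator.
with_pipe = re.compile(r"""href=['|"](.*?)['|"]""")
without_pipe = re.compile(r"""href=['"](.*?)['"]""")

odd = "href=|oops|"    # not a valid quoted attribute
normal = 'href="x"'

print(bool(with_pipe.search(odd)))        # True: the class accepts '|' as a delimiter
print(bool(without_pipe.search(odd)))     # False
print(without_pipe.search(normal).group(1))
```

So the pipe does no harm on well-formed input, but it makes the pattern accept `|` as if it were a quote.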
