Need help with regular expressions in Qt (QRegExp) [bad repetition syntax?]
void MainWindow::whatever(){
    QRegExp rx("<span(.*?)>");
    //QString line = ui->txtNet1->toHtml();
    QString line = "<span>Bar</span><span style='baz'>foo</span>";
    while (line.contains(rx)) {
        qDebug() << "Found rx!";
        line.remove(rx);
    }
}
I've tested the regular expression online using this tool. With the given regex string and a sample text of <span style="foo">Bar</span>, the tool says that the regular expression should be found in the string. In my Qt code, however, I never get into my while loop.
I've never really used regular expressions before, in Qt or any other language. Can someone provide some help? Thanks!
[edit]
So I just found that QRegExp has a function errorString() to use if the regex is invalid. I output this and see: "bad repetition syntax". Not really sure what this means. Of course, googling for "bad repetition syntax" brings up... this post. Damn Google, you're fast.
Comments (2)
The problem is that QRegExp only supports greedy quantifiers. More precisely, it supports either greedy or reluctant quantifiers, but not both at once. Thus, <span(.*?)> is invalid, since there is no *? operator. Instead, you can call
rx.setMinimal(true);
This gives every *, +, and ? in the QRegExp the behavior of *?, +?, and ??, respectively, rather than their default behavior. The difference, as you may or may not be aware, is that the minimal versions match as few characters as possible, rather than as many.
In this case, you can also write
QRegExp rx("<span([^>]*)>");
This is probably what I would do, since it has the same effect: match until you see a >. Yours is more general, yes (if you have a multi-character ending token), but I think this is slightly nicer in the simple case. Either will work, of course.
Also, be very, very careful about parsing HTML with regular expressions. You can't actually do it, and recognizing tags, while (I believe) possible, is much harder than just this. (Comments, CDATA blocks, and processing instructions throw a wrench in the works.) If you know the sort of data you're looking at, this can be an acceptable solution; even so, I'd look into an HTML parser instead.
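To see the greedy versus minimal difference without Qt, here is a minimal sketch that uses std::regex from the C++ standard library as a stand-in for QRegExp. That swap is an assumption made so the example compiles on its own; unlike QRegExp, std::regex's default ECMAScript grammar accepts *? directly. The helper name first_match is hypothetical.

```cpp
#include <regex>
#include <string>

// Return the first substring of `text` matched by `pattern`,
// or an empty string when nothing matches.
// Uses std::regex (ECMAScript grammar), not QRegExp.
std::string first_match(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}
```

On the question's sample line, the greedy pattern <span(.*)> runs through to the final >, so it matches the whole line. Both <span(.*?)> and <span([^>]*)> stop at the first > and match just <span>.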
What are you trying to achieve? If you want to remove the opening tag and its attributes, then a pattern such as
<span[^>]*>
is probably the simplest.
The syntax .*? means a non-greedy match, which is widely supported but may be confusing the Qt regex engine.
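As a rough sketch of removing the opening tags and their attributes, the hypothetical helper strip_opening_spans below does it in one pass rather than in a loop. It uses std::regex as a stand-in for QRegExp, an assumption made so that it compiles without Qt. It also shows one limit of the regex approach: it cannot tell when a tag sits inside an HTML comment.

```cpp
#include <regex>
#include <string>

// Remove every opening <span ...> tag, attributes included.
// [^>]* consumes characters up to the first '>', so no minimal
// quantifier is needed.
// Known limitation: a '>' inside a quoted attribute value, or a tag
// inside a comment, is not handled correctly.
std::string strip_opening_spans(const std::string& html) {
    static const std::regex opening_span("<span[^>]*>");
    return std::regex_replace(html, opening_span, "");
}
```

For the question's line this leaves Bar</span>foo</span>. Given <!-- <span> -->x, it also deletes the commented-out tag, which an HTML parser would leave alone.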