PCRECPP (PCRE): problem extracting the hostname from a URL
I have this simple piece of code in C++:
#include <iostream>
#include <string>
#include <pcrecpp.h>

using std::string;

int main(void)
{
    string text = "http://www.amazon.com";
    string a, b, c, d, e, f;
    pcrecpp::RE re("^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?@)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??([^#]+)?#?(\\w*)");
    if (re.PartialMatch(text, &a, &b, &c, &d, &e, &f))
    {
        std::cout << "match: " << f << "\n";
        // should print "www.amazon.com"
    } else {
        std::cout << "no match. \n";
    }
    return 0;
}
When I run this, it doesn't find a match.
I'm pretty sure that the regex pattern is correct and that my code is what's wrong.
If anyone familiar with pcrecpp can take a look at this, I'll be grateful.
EDIT:
Thanks to Dingo, it works great.
Another issue I had was that the result ends up in the sixth capture, "f".
I edited the code above so you can copy/paste it if you wish.
Comments (2)
The problem is that your code contains ??( which is a trigraph in C++ for [. You'll either need to disable trigraphs or do something to break the sequence up, for example by splitting the string literal between the two question marks.
Please do
cout << re.pattern() << endl;
to double-check that all your backslash doubling is done right (and also post the result).
It looks like
^((\w+):///?)?((\w+):?(\w+)?@)?([^/\?:]+):?(\d+)?(/?[^\?#;\|]+)?([;\|])?([^\?#]+)?\??([^#]+)?#?(\w*)
The hostname isn't going to be returned from the first capture group. Why are you using capturing parentheses around parts, such as \w+, that you don't want to capture?