Why does my Boost.Regex search report only one match iteration?
I am trying to find out how many regex matches are in a string. I'm using an iterator to iterate over the matches, and an integer to record how many there were.
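```cpp
#include <boost/regex.hpp>
#include <windows.h>          // GetTickCount
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    long int before = GetTickCount();
    string text;
    boost::regex re("^(\\d{5})\\s(\\d{8})\\s(.*)\\s(.*)\\s(.*)\\s(\\d{8})\\s(.{1})$");
    char * buffer;
    long length;
    long count;
    ifstream f;
    f.open("c:\\temp\\test.txt", ios::in | ios::ate);   // open positioned at the end
    length = f.tellg();                                 // end position, used as the buffer size
    f.seekg(0, ios::beg);
    buffer = new char[length];
    f.read(buffer, length);
    // read() does not null-terminate the buffer, and in text mode CRLF
    // translation can deliver fewer than `length` chars, so copy exactly
    // gcount() characters instead of relying on a terminating '\0'.
    text.assign(buffer, static_cast<size_t>(f.gcount()));
    delete[] buffer;
    f.close();
    boost::sregex_token_iterator itr(text.begin(), text.end(), re, 0);  // 0 = whole match
    boost::sregex_token_iterator end;
    count = 0;
    for(; itr != end; ++itr)
    {
        count++;
    }
    long int after = GetTickCount();
    cout << "Found " << count << " matches in " << (after-before) << " ms." << endl;
}
```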
In my example, `count` is always 1, even if I put code in the for loop to show the matches (and there are plenty). Why is that? What am I doing wrong?
Edit
TEST INPUT:
```
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
```
OUTPUT (without printing the matches):
```
Found 1 matches in 16 ms.
```
If I change the for loop to this:
```cpp
count = 0;
for(; itr != end; ++itr)
{
    string match(itr->first, itr->second);
    cout << match << endl;
    count++;
}
```
I get this as output:
```
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
Found 1 matches in 47 ms.
```
Comments (4)
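Heh. Your problem is your regex. Change your `(.*)`s to `(.*?)`s (lazy quantifiers such as `*?` are supported by Boost.Regex's default Perl syntax). You think you're seeing each line being matched, but in fact you're seeing the entire text being matched, because your pattern is greedy: by default Boost.Regex lets `.` match a newline and lets `^`/`$` match at every line boundary, so the first `(.*)` runs on from the first line to the last one and the whole file becomes a single match. To see the issue illustrated, change the debug output in your loop to: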
Don't know much about boost, but does (end - itr) work?
Since you're saying that even when you output the results, the count is still one, you might look at a couple things to help diagnose it:

- You might be running into some scope shadowing with the `count` variable.

If that loop is executing multiple times, then the problem is not in how you are using boost. No matter what you are doing, boost does not have the ability to modify a variable that you don't pass to it. (Of course, if you are passing `count` in to boost somewhere, then that's another possibility.)

In all likelihood, the first `(.*)` you have is matching everything up until nearly the end of the input (newlines included). Try replacing those with `([^ ]*)` (anything but a space, so the matching stops when it finds a space).
Can you paste the input and also the output?

If `count` returns 1, that means there is only one match in your string `text`.