Setting a boost regex from a foreign source

Posted on 2024-12-16 12:31:25

I need to parse a log, and I have a regex that works well, but now I need to set the regex from a config file, and that is where the problem starts.

int logParser()
{
  std::string bd_regex; // this reads from config in other part of program
  boost::regex parsReg;
  //("(C:.tmp.bd.*?)+(([a-zA-Z0-9_]+\\.)+[a-zA-Z]{2,4})+(.+[a-zA-Z0-9_])");
  try
  {
    parsReg.assign(bd_regex, boost::regex_constants::icase);  
  }
  catch (boost::regex_error& e)
  {
    cout << bd_regex << " is not a valid regular expression: \""
         << e.what() << "\"" << endl;
  }

  cout << parsReg << endl;
  // here it looks exactly like:
  // "("(C:.tmp.bd.*?)+(([a-zA-Z0-9_]+\\.)+[a-zA-Z]{2,4})+(.+[a-zA-Z0-9_])");"

  int count=0;
  ifstream in;

  in.open(bd_log_path.c_str());

  while (!in.eof()) 
  {
    in.getline(buf, BUFSIZE-1);
    std::string s = buf;
    boost::smatch m;

    if (boost::regex_search(s, m, parsReg)) // it doesn't obey this "if"
    {
      std::string name, diagnosis;
      name.assign(m[2]);
      diagnosis.assign(m[4]);

      strcpy(bd_scan_results[count].file_name, name.c_str());
      strcpy(bd_scan_results[count].out,  diagnosis.c_str());
      strcat(bd_scan_results[count].out,  " ");

      count++;
    }
  }
  return count;
}

I really don't know why the same regex doesn't work when I try to set it from a config variable.

Any help will be appreciated (:

Comments (2)

伊面 2024-12-23 12:31:25

On your direct question: try storing the regex in the config file without the C++ escapes:

(C:.tmp.bd.*?)+(([a-zA-Z0-9_]+\.)+[a-zA-Z]{2,4})+(.+[a-zA-Z0-9_])

Besides, it looks like you wanted to match backslashes here:

C:.tmp.bd.

In the config, write:

C:\\tmp\\bd\\

In a C++ string literal, that would be:

"C:\\\\tmp\\\\bd\\\\"
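The difference between the config form and the string-literal form can be checked with a small sketch. It is only an illustration under assumptions: it uses std::regex from the standard library as a stand-in for boost::regex (the C++ literal escaping rules do not depend on the regex library), and the helper names loadPattern and matchesLine are hypothetical, not part of the original program.

```cpp
#include <istream>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: reads the first line of a config stream as a raw pattern.
// No C++ escape processing happens here; the characters are taken exactly as written.
std::string loadPattern(std::istream& config)
{
    std::string pattern;
    std::getline(config, pattern);
    return pattern;
}

// Returns true if the pattern is found anywhere in the line, ignoring case.
// std::regex is used here as a stand-in for boost::regex (assumption).
bool matchesLine(const std::string& pattern, const std::string& line)
{
    std::regex re(pattern, std::regex::icase);
    return std::regex_search(line, re);
}
```

For instance, if a std::istringstream holds the config text C:\\tmp\\bd\\ (two backslashes at each position), loadPattern returns it unchanged, and matchesLine finds it in a log line containing C:\tmp\bd\. Writing four backslashes in the config, as you would in a C++ literal, makes the regex look for two literal backslashes and the match fails.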

青春如此纠结 2024-12-23 12:31:25

@sehe gives the correct answer.

If this line of code were parsed by the C++ compiler,
str = "(C:.tmp.bd.*?)+(([a-zA-Z0-9_]+\\.)+[a-zA-Z]{2,4})+(.+[a-zA-Z0-9_])";

it would turn the escape sequence \\ into a single backslash, \, and then
assign the result to the variable 'str'. Inside the variable 'str', the text now looks like this:
(C:.tmp.bd.*?)+(([a-zA-Z0-9_]+\.)+[a-zA-Z]{2,4})+(.+[a-zA-Z0-9_])

But you are reading this text from a file, so there is no parsing in the language sense.
You are assigning to 'str' a raw line of text, one that has not been pre-processed by the C++ parser.
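If the config file has to keep the C++-literal style of escaping, one possible workaround is to redo that one step of the compiler's work by hand after reading the line. The sketch below is an assumption, not something from the original program: the helper name collapseDoubleBackslashes is hypothetical, and it handles only the \\ pair, leaving every other escape sequence untouched.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: collapses each "\\" pair in raw text into a single "\",
// imitating what the compiler does to that one escape sequence in a string literal.
// Other escapes (\n, \t, \" and so on) are deliberately left as they are.
std::string collapseDoubleBackslashes(const std::string& raw)
{
    std::string out;
    out.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i)
    {
        if (raw[i] == '\\' && i + 1 < raw.size() && raw[i + 1] == '\\')
        {
            out += '\\';
            ++i; // skip the second backslash of the pair
        }
        else
        {
            out += raw[i];
        }
    }
    return out;
}
```

Applied to the raw config text ([a-zA-Z0-9_]+\\.)+ this yields ([a-zA-Z0-9_]+\.)+, which is what the regex engine would have received from the compiled string literal. Storing the pattern in the config without the extra escapes is simpler and avoids the helper entirely.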
