Stop reading when the format does not match

Posted on 2024-10-28 18:40:03

I have a file such as:

 1.0000000e+01   8.0123000e+01   1.0000000e+01   1.0000000e+01   1.0000000e+01
-1.0000000e+01   1.0000000e+01   1.0001110e+01   1.0000000e+01   1.0000000e+01
 1.0000000e+01   1.0000000e+01  -5.0000000e+01   1.0000000e+01   1.0000000e+01
 //... (repeated scientific numbers)
 1 2 3 4
 2 4 5 60
 100 3 5 63
 //... (repeated integer numbers)

I would like to read these numbers from a file in C++, but only the numbers in scientific format, so the code needs to stop when the number format changes. One thing that helps: the floating-point numbers come in 5 columns, whereas the integers come in 4 columns.

So, what's the best way to do that in C++?

Comments (5)

爱,才寂寞 2024-11-04 18:40:03

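Ignoring line ends (this keeps reading into the integer section, because integers also parse as double):

#include <array>
#include <istream>
#include <vector>

// istr is assumed to be an already-open std::istream
typedef std::array<double, 5> Datum;  // a raw double[5] cannot be stored in a vector
std::vector<Datum> data;
while (true) {
  Datum t;
  istr >> t[0] >> t[1] >> t[2] >> t[3] >> t[4];
  if (!istr) break;  // end of file, or a token that is not a number
  data.push_back(t);
}

Using the column count and line ends (this stops at the first line with fewer than five numbers):

// also needs <sstream> and <string>
std::string line;
while (std::getline(istr, line)) {
  Datum t;
  std::istringstream temp(line);
  temp >> t[0] >> t[1] >> t[2] >> t[3] >> t[4];
  if (temp.fail()) break;  // fewer than five numbers on this line
  data.push_back(t);
}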
风蛊 2024-11-04 18:40:03

You could use strstr to search each line for "e+" (or "e-" for negative exponents).

http://www.cplusplus.com/reference/clibrary/cstring/strstr/

If you want something fancier, you could use a regular expression library (such as boost::regex), which would also help you extract the strings from each line.
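
As a rough sketch of the strstr idea, the snippet below checks each line for an exponent marker and stops collecting at the first line that lacks one. The names looks_scientific and read_scientific_lines are hypothetical, and the check is purely textual: it assumes every value in the floating-point section is written with an explicit "e+" or "e-".

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <string>
#include <vector>

// Returns true if the line contains an exponent marker ("e+" or "e-").
// This is a crude textual test, not a validation of the number format.
bool looks_scientific(const char* line) {
    assert(line != nullptr);
    return std::strstr(line, "e+") != nullptr ||
           std::strstr(line, "e-") != nullptr;
}

// Collects lines from `in` until the first line without an exponent marker.
std::vector<std::string> read_scientific_lines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line) && looks_scientific(line.c_str())) {
        lines.push_back(line);
    }
    return lines;
}
```

The collected lines still have to be converted to numbers, for example with a std::istringstream per line.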

寄人书 2024-11-04 18:40:03

I am afraid there is no direct way to do this. That is, you can't stream in ( >> ) a floating-point number and require a specific format. So if you need that functionality, you must read the lines as strings and then parse them manually. Of course, this doesn't mean you have to build each number digit by digit. Once you have established which part of the input file holds the floats, use stringstreams to read them.
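
One way this could look, as a minimal sketch: read each line as a string, check that it holds exactly five tokens that each contain an exponent marker, and let a stringstream do the conversion. Row, parse_scientific_row and read_scientific_block are hypothetical names, and the "must contain e or E" rule is an assumption about what counts as scientific format here.

```cpp
#include <array>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

typedef std::array<double, 5> Row;

// Parses one line into `row`. Succeeds only if the line holds exactly five
// tokens, each containing 'e' or 'E' and converting completely to double.
bool parse_scientific_row(const std::string& line, Row& row) {
    std::istringstream tokens(line);
    std::string token;
    std::size_t count = 0;
    while (tokens >> token) {
        if (count == row.size()) return false;  // more than five columns
        if (token.find_first_of("eE") == std::string::npos) return false;  // no exponent
        std::istringstream number(token);
        if (!(number >> row[count])) return false;  // not a number
        char extra;
        if (number >> extra) return false;  // trailing characters after the number
        ++count;
    }
    return count == row.size();
}

// Reads rows from `in` until the first line that is not a five-column
// scientific-format row (this includes blank lines).
std::vector<Row> read_scientific_block(std::istream& in) {
    std::vector<Row> rows;
    std::string line;
    Row row{};
    while (std::getline(in, line) && parse_scientific_row(line, row)) {
        rows.push_back(row);
    }
    return rows;
}
```

With the sample file from the question, reading would stop at the line " 1 2 3 4", since it has four columns and no exponent markers.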

纵性 2024-11-04 18:40:03

You can use a regex to match only the numbers you care about: -?\d+\.\d+e[+-]\d+

I'm sure this is not the best way, but if performance is not a big issue, it's an easy way out.

Warning: auto-generated code from RegexBuddy

#include <pcre.h>
#include <string.h>

// subject is assumed to be a NUL-terminated const char* holding the text to search
pcre *myregexp;
const char *error;
const char *result;
int erroroffset;
int offsetcount;
int offsets[(0+1)*3]; // (max_capturing_groups+1)*3
myregexp = pcre_compile("-?\\d+\\.\\d+e[+-]\\d+", 0, &error, &erroroffset, NULL);
if (myregexp != NULL) {
    offsetcount = pcre_exec(myregexp, NULL, subject, strlen(subject), 0, 0, offsets, (0+1)*3);
    while (offsetcount > 0) {
        // match offset = offsets[0];
        // match length = offsets[1] - offsets[0];
        if (pcre_get_substring(subject, offsets, offsetcount, 0, &result) >= 0) {
            // Do something with the match we just stored into result
            pcre_free_substring(result);
        }
        // Continue the search from the end of the previous match (start offset is the 5th argument)
        offsetcount = pcre_exec(myregexp, NULL, subject, strlen(subject), offsets[1], 0, offsets, (0+1)*3);
    }
    pcre_free(myregexp);
} else {
    // Syntax error in the regular expression at erroroffset
}
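
If PCRE is not available, the same pattern can also be tried with std::regex from the C++11 standard library. This is only a sketch and not part of the generated code above; find_scientific is a hypothetical helper name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns every substring of `subject` that matches
// -?\d+\.\d+e[+-]\d+ , in order of appearance.
std::vector<std::string> find_scientific(const std::string& subject) {
    static const std::regex pattern(R"(-?\d+\.\d+e[+-]\d+)");
    std::vector<std::string> matches;
    for (std::sregex_iterator it(subject.begin(), subject.end(), pattern), end;
         it != end; ++it) {
        matches.push_back(it->str());
    }
    return matches;
}
```

Each match can then be converted with std::stod. A line from the integer section yields no matches, which can serve as the signal to stop reading.
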
享受孤独 2024-11-04 18:40:03

A regex is the best way to do this; alternatively, you can try fscanf().
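
A caveat with plain fscanf(): the %lf conversion also accepts integers and skips over newlines, so on its own it would not notice the change of format. The sketch below therefore swaps in std::fgets plus std::sscanf and relies on the column count from the question. scan_five and read_rows are hypothetical names, and the 512-byte line buffer is an assumption about the maximum line length.

```cpp
#include <cstdio>

// Tries to read five doubles from one line of text.
// Returns true only if all five conversions succeed.
bool scan_five(const char* line, double out[5]) {
    return std::sscanf(line, "%lf %lf %lf %lf %lf",
                       &out[0], &out[1], &out[2], &out[3], &out[4]) == 5;
}

// Fills `rows` (capacity `max_rows`) from `fp` and stops at the first line
// that does not hold five numbers. Returns the number of rows read.
std::size_t read_rows(std::FILE* fp, double rows[][5], std::size_t max_rows) {
    char line[512];  // assumed to be long enough for one line of the file
    std::size_t n = 0;
    while (n < max_rows && std::fgets(line, sizeof line, fp) &&
           scan_five(line, rows[n])) {
        ++n;
    }
    return n;
}
```

A four-column integer line makes sscanf return 4, so reading ends there.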
