CSV parser works on Windows, not on Linux

Posted on 2024-10-18 13:47:59

I'm parsing a CSV file that looks like this:

E1,E2,E7,E8,,,
E2,E1,E3,,,,
E3,E2,E8,,,
E4,E5,E8,E11,,,

I store the first entry in each line in a string, and the rest go in a vector of strings:

while (getline(file_input, line)) {
    stringstream tokenizer; 
    tokenizer << line;
    getline(tokenizer, roomID, ',');
    vector<string> aVector;
    while (getline(tokenizer, adjRoomID, ',')) {
        if (!adjRoomID.empty()) {
            aVector.push_back(adjRoomID);
        }
    }
    Room aRoom(roomID, aVector);
    rooms.addToTail(aRoom);
}

On Windows this works fine, but on Linux the first entry of each vector mysteriously loses its first character. For example, in the first iteration of the while loop:

roomID would be E1 and aVector would be 2 E7 E8

then the second iteration:
roomID would be E2 and aVector would be 1 E3

Notice the missing E's in the first entry of aVector.

When I put in some debugging code, it appears that the value is initially stored correctly in the vector, but then something overwrites it. Kudos to whoever figures this one out. Seems bizarre to me.

EDIT:
Thank you, Erik. I finally understand. The file has Windows line endings (\r\n). On Windows, the text-mode stream translates \r\n to \n, so each line I read ends cleanly. On Unix/Linux no such translation happens, so when getline reads a line it reads everything up to the \n into the string, including the \r. I was not accounting for this \r and it was tripping me up. The problem wasn't that the E was missing; it was that I had an extra entry in the vector containing a single \r character, and my other classes couldn't handle that entry.
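One way to account for the stray \r is to strip it from the line before tokenizing. The following is only a minimal sketch, assuming C++11; the helper names stripTrailingCR and splitFields are made up for illustration and are not part of the original code.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: drop one trailing '\r' left behind by a CRLF line ending.
inline void stripTrailingCR(std::string& line) {
    if (!line.empty() && line.back() == '\r') {
        line.pop_back();
    }
}

// Hypothetical helper: split one CSV line on commas, keeping non-empty fields only.
inline std::vector<std::string> splitFields(std::string line) {
    stripTrailingCR(line);  // normalize before tokenizing
    std::vector<std::string> fields;
    std::istringstream tokenizer(line);
    std::string field;
    while (std::getline(tokenizer, field, ',')) {
        if (!field.empty()) {
            fields.push_back(field);
        }
    }
    return fields;
}
```

In the loop from the question, the first element returned by splitFields(line) would play the role of roomID and the remaining elements would go into aVector. Stripping at the line level also covers files whose last field is non-empty, where the \r would otherwise be glued to that field (for example "E8\r").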

Comments (3)

寄离 2024-10-25 13:47:59

Oops: misread your question, thought it was talking about not working on Windows. I'm leaving the answer here in case anyone stumbles upon this in need of it, but I don't think it will help you (the asker) in this case.

If you're on MSVC6, you could be encountering this bug with the getline function. There's a fix in the link.

For posterity, here's the info from the link:

SYMPTOM: "The Standard C++ Library template getline function reads an extra character after encountering the delimiter. Please refer to the sample program in the More Information section for details."

Modify the getline member function, which can be found in the following system header file string, as follows:

else if (_Tr::eq((_E)_C, _D))
{
    _Chg = true;
    // _I.rdbuf()->snextc();  /* Remove this line and add the line below. */
    _I.rdbuf()->sbumpc();
    break;
}

Note: Because the resolution involves modifying a system header file, extreme care should be taken to ensure that nothing else is changed in the header file. Microsoft is not responsible for any problems resulting from unwanted changes to the system header file.

一口甜 2024-10-25 13:47:59

I suspect that the \r in the Windows \r\n line ending could mess up the code doing your printing.

If you change to this if statement, does the problem disappear?

if (!adjRoomID.empty() && (adjRoomID[0] != '\r'))

EDIT: Fixed typo

向日葵 2024-10-25 13:47:59

Try some cout debugging. Print out the values as you read them in:

if (!adjRoomID.empty()) {
    cout << '"' << adjRoomID << '"' << endl;
    aVector.push_back(adjRoomID);
}

That will tell you if your strings are being read correctly from the get-go, and will also probably tell you if you're reading in extra weird characters from the file.
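A raw \r printed to a terminal usually just moves the cursor back to the start of the line, so even quoted output can be hard to read. As a rough sketch (the helper name visible is hypothetical, and C++11 is assumed), control characters can be spelled out before printing:

```cpp
#include <string>

// Hypothetical helper: return a copy of s with common control characters
// written as escape sequences so they show up in debug output.
inline std::string visible(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '\r': out += "\\r"; break;
            case '\n': out += "\\n"; break;
            case '\t': out += "\\t"; break;
            default:   out += c;     break;
        }
    }
    return out;
}
```

With this, cout << '"' << visible(adjRoomID) << '"' << endl; would print "\r" for a field that holds only a carriage return, which makes the extra entry easy to spot.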
