CSV parser works on Windows, not on Linux
I'm parsing a CSV file that looks like this:
E1,E2,E7,E8,,,
E2,E1,E3,,,,
E3,E2,E8,,,
E4,E5,E8,E11,,,
I store the first entry in each line in a string, and the rest go in a vector of strings:
while (getline(file_input, line)) {
    stringstream tokenizer;
    tokenizer << line;
    getline(tokenizer, roomID, ',');
    vector<string> aVector;
    while (getline(tokenizer, adjRoomID, ',')) {
        if (!adjRoomID.empty()) {
            aVector.push_back(adjRoomID);
        }
    }
    Room aRoom(roomID, aVector);
    rooms.addToTail(aRoom);
}
On Windows this works fine, but on Linux the first entry of each vector mysteriously loses its first character. For example, in the first iteration through the while loop, roomID would be E1 and aVector would be:
2
E7
E8
Then in the second iteration, roomID would be E2 and aVector would be:
1
E3
Notice the missing E in the first entry of each aVector.
When I put in some debugging code, it appears that the value is initially stored correctly in the vector, but then something overwrites it. Kudos to whoever figures this one out. Seems bizarre to me.
EDIT:
Thank you, Erik. I finally understand. On Windows, all the lines appear to end with just a \n, because the text-mode stream translates \r\n to \n. When I switch to Unix/Linux, however, no such translation happens and the lines still end in \r\n. Thus, when getline reads a line, it reads everything up to the \n into the string, including the \r. I was not accounting for this \r and it was screwing me up. The problem wasn't that the E was missing. It was that I had an extra entry in the vector with a single \r character in it. My other classes couldn't handle this entry with a single \r in it.
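One way to account for the stray \r is to strip it from the line before tokenizing. The following is only a minimal sketch under that assumption: stripTrailingCR, ParsedRow, and parseRow are hypothetical names introduced for illustration and are not part of the original code.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Remove a single trailing '\r', as left behind when a CRLF file is read
// on a platform that does not translate line endings.
inline void stripTrailingCR(std::string& s) {
    if (!s.empty() && s.back() == '\r') {
        s.pop_back();
    }
}

// Hypothetical container for one parsed CSV row.
struct ParsedRow {
    std::string roomID;                 // first field of the line
    std::vector<std::string> adjacent;  // remaining non-empty fields
};

// Split one line into the room ID and its non-empty adjacent room IDs.
ParsedRow parseRow(std::string line) {
    stripTrailingCR(line);  // drop the '\r' before tokenizing
    std::istringstream tokenizer(line);
    ParsedRow row;
    std::getline(tokenizer, row.roomID, ',');
    std::string field;
    while (std::getline(tokenizer, field, ',')) {
        if (!field.empty()) {
            row.adjacent.push_back(field);
        }
    }
    return row;
}
```

Inside the reading loop, the result of parseRow(line) could then be passed on as Room aRoom(row.roomID, row.adjacent). Stripping the \r once per line keeps the empty-field check unchanged, so the same code behaves identically whether or not the platform translates line endings.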
Comments (3)
Oops: misread your question, thought it was talking about not working on Windows. I'm leaving the answer here in case anyone stumbles upon this in need of it, but I don't think it will help you (the asker) in this case.
If you're on MSVC6, you could be encountering this bug with the getline function. There's a fix in the link.
For posterity, here's the info from the link:
I suspect that the \r in the Windows \r\n line ending could mess up the code doing your printing.
If you change to this if statement, does the problem disappear?
EDIT: Fixed typo
Try some cout debugging. Print out the values as you read them in:
That will tell you if your strings are being read correctly from the get-go, and will also probably tell you if you're reading in extra weird characters from the file.
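Printing a string that contains a \r straight to a terminal tends to hide the problem, because the terminal just moves the cursor back to the start of the line. A minimal sketch of one way to make such characters visible while debugging, assuming a hypothetical helper named visible (not from the original answer):

```cpp
#include <string>

// Return a copy of s in which control characters are spelled out
// (e.g. a carriage return becomes the two characters '\' and 'r').
std::string visible(const std::string& s) {
    static const char hex[] = "0123456789abcdef";
    std::string out;
    for (unsigned char c : s) {
        switch (c) {
            case '\r': out += "\\r"; break;
            case '\n': out += "\\n"; break;
            case '\t': out += "\\t"; break;
            default:
                if (c < 0x20 || c == 0x7f) {
                    // Any other control character is shown as \xNN.
                    out += "\\x";
                    out += hex[c >> 4];
                    out += hex[c & 0x0f];
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}
```

In the loop from the question this could be used as std::cout << '[' << visible(adjRoomID) << "]\n"; the brackets make empty and whitespace-only entries easy to spot as well.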