ZWNBSP appears when parsing CSV
I have a CSV file and I want to check that it contains all the data it should. However, a ZWNBSP appears at the beginning of the first column name in the first row.
My simplified code is:
@Test
void parseCsvTest() throws Exception {
    Configuration.holdBrowserOpen = true;
    ClassLoader classLoader = getClass().getClassLoader();
    try (
        InputStream inputStream = classLoader.getResourceAsStream("files/csv_example.csv");
        // No charset is passed here, so the platform default charset is used
        CSVReader reader = new CSVReader(new InputStreamReader(inputStream))
    ) {
        List<String[]> content = reader.readAll();
        var csvStrings0line = content.get(0);    // header row
        var csv1stElement = csvStrings0line[0];  // first column name
        var csv1stElementShouldBe = "Timestamp";
        assertEquals(csv1stElementShouldBe, csv1stElement);
    }
}
My CSV contains
"Timestamp","Source","EventName","CountryId","Platform","AppVersion","DeviceType","OsVersion"
"2022-05-02T14:56:59.536987Z","courierapp","order_delivered_sent","643","ios","3.11.0","iPhone 11","15.4.1"
"2022-05-02T14:57:35.849328Z","courierapp","order_delivered_sent","643","ios","3.11.0","iPhone 8","15.3.1"
My test fails with
expected: <Timestamp> but was: <Timestamp>
Expected :Timestamp
Actual :Timestamp
<Click to see difference>
Clicking "see difference" shows that there is a ZWNBSP at the beginning of the Actual text.
Copy-pasting my text into an online tool for displaying non-printable Unicode characters (https://www.soscisurvey.de/tools/view-chars.php) shows only CR LF at the ends of the lines, no ZWNBSPs.
But where does it come from?
Comments (3)
That is the Unicode zero-width no-break space character (U+FEFF). When it appears at the beginning of a Unicode-encoded text file, it serves as a byte order mark (BOM). You read it to determine the encoding of the text file, and then you can safely discard it if you want. The best thing you can do is spread awareness.
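To illustrate the "read it, then discard it" idea, here is a minimal sketch that assumes the file is UTF-8 encoded. The class name `BomSkipper` and the method `utf8ReaderWithoutBom` are hypothetical names chosen for this example; only standard `java.io` classes are used. Java's UTF-8 decoder passes a leading BOM through as the character U+FEFF, so the sketch peeks at the first character and drops it only if it is that mark.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class BomSkipper {

    /**
     * Wraps the stream in a UTF-8 reader and consumes a leading U+FEFF if present.
     * Assumption: the input really is UTF-8; other encodings are not handled here.
     */
    static Reader utf8ReaderWithoutBom(InputStream in) throws IOException {
        PushbackReader reader =
                new PushbackReader(new InputStreamReader(in, StandardCharsets.UTF_8), 1);
        int first = reader.read();
        // Put the character back unless it is the BOM (or the stream is empty).
        if (first != -1 && first != 0xFEFF) {
            reader.unread(first);
        }
        return reader;
    }

    public static void main(String[] args) throws IOException {
        // Simulated file content: a BOM followed by a CSV header line.
        byte[] bytes = "\uFEFF\"Timestamp\",\"Source\"\r\n".getBytes(StandardCharsets.UTF_8);
        try (BufferedReader r =
                     new BufferedReader(utf8ReaderWithoutBom(new ByteArrayInputStream(bytes)))) {
            String header = r.readLine();
            System.out.println(header.startsWith("\"Timestamp\""));
        }
    }
}
```

In the test from the question, the returned `Reader` could be passed to `new CSVReader(...)` in place of the plain `InputStreamReader`.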
In IntelliJ IDEA you can remove the BOM in the text editor (bottom-right corner).
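To confirm whether the file on disk actually carries a BOM, before or after removing it in the editor, you can look at the raw bytes instead of copy-pasting text. The sketch below assumes a UTF-8 file, where U+FEFF is encoded as the bytes EF BB BF. `BomCheck` and `startsWithUtf8Bom` are hypothetical names for this example, and the file path is expected as a command-line argument.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class BomCheck {

    /** Returns true if the data begins with EF BB BF, the UTF-8 encoding of U+FEFF. */
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the file to inspect (an assumption of this sketch).
        byte[] head;
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            head = in.readNBytes(3); // only the first three bytes are needed
        }
        System.out.println(startsWithUtf8Bom(head) ? "UTF-8 BOM present" : "no UTF-8 BOM");
    }
}
```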
It's a BOM character. You can remove it yourself or use one of several other solutions (see https://stackoverflow.com/a/4897993/1420794 for instance).
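As one way of "removing it yourself", the BOM can be stripped from the first header cell after parsing. This is only a sketch and not necessarily what the linked answer recommends; `HeaderCleaner` and `stripLeadingBom` are hypothetical names. It assumes the file was decoded as UTF-8, so that the BOM arrives as the single character U+FEFF.

```java
class HeaderCleaner {

    /** Removes one leading U+FEFF from the string, if present; otherwise returns it unchanged. */
    static String stripLeadingBom(String s) {
        if (s != null && !s.isEmpty() && s.charAt(0) == '\uFEFF') {
            return s.substring(1);
        }
        return s;
    }

    public static void main(String[] args) {
        // Simulated first row as a CSV parser might return it for a file with a BOM.
        String[] header = {"\uFEFFTimestamp", "Source"};
        header[0] = stripLeadingBom(header[0]);
        System.out.println(header[0].equals("Timestamp"));
    }
}
```

In the question's test this would be applied to `csvStrings0line[0]` before the `assertEquals` call.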