ZWNBSP appears when parsing CSV

Posted on 2025-01-26 02:45:40


I have a CSV and I want to check if it has all the data it should have. But it looks like ZWNBSP appears at the beginning of the 1st column name in the 1st string.

My simplified code is

@Test
void parseCsvTest() throws Exception {
    Configuration.holdBrowserOpen = true;
    ClassLoader classLoader = getClass().getClassLoader();
    try (
            InputStream inputStream = classLoader.getResourceAsStream("files/csv_example.csv");
            CSVReader reader = new CSVReader(new InputStreamReader(inputStream))
    ) {
        List<String[]> content = reader.readAll();
        // First cell of the header row
        var csvStrings0line = content.get(0);
        var csv1stElement = csvStrings0line[0];
        var csv1stElementShouldBe = "Timestamp";
        assertEquals(csv1stElementShouldBe, csv1stElement);
    }
}

My CSV contains

"Timestamp","Source","EventName","CountryId","Platform","AppVersion","DeviceType","OsVersion"
"2022-05-02T14:56:59.536987Z","courierapp","order_delivered_sent","643","ios","3.11.0","iPhone 11","15.4.1"
"2022-05-02T14:57:35.849328Z","courierapp","order_delivered_sent","643","ios","3.11.0","iPhone 8","15.3.1"

My test fails with

expected: <Timestamp> but was: <Timestamp>
Expected :Timestamp
Actual   :Timestamp
<Click to see difference>

Clicking "Click to see difference" shows that there is a ZWNBSP at the beginning of the Actual text.

Copy-pasting my text into the online tool for displaying non-printable Unicode characters, https://www.soscisurvey.de/tools/view-chars.php, shows only CR LF at the ends of the lines and no ZWNBSPs.

But where does it come from?

Comments (3)

没有伤那来痛 2025-02-02 02:45:41

That is the Unicode zero-width no-break space character (U+FEFF). When it appears at the beginning of a Unicode-encoded text file, it serves as a byte order mark (BOM). You read it to determine the encoding of the text file, and then you can safely discard it if you want. The best thing you can do is spread awareness.
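As a minimal sketch of what this means for the failing assertion: the first header cell is not "Timestamp" but U+FEFF followed by "Timestamp", so the two strings look identical when printed and still compare unequal. The class name `BomCheck` and the helper `stripBom` are hypothetical names chosen for illustration; they are not part of the question's code or of any library.

```java
class BomCheck {
    // U+FEFF: ZERO WIDTH NO-BREAK SPACE, which acts as the byte order mark
    // when it is the first character of a text file.
    static final char BOM = '\uFEFF';

    // Hypothetical helper: drops a single leading BOM if one is present.
    static String stripBom(String s) {
        return (!s.isEmpty() && s.charAt(0) == BOM) ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        // Simulates the first header cell as read from a file that starts with a BOM.
        String actual = BOM + "Timestamp";

        System.out.println(actual.equals("Timestamp"));                 // false
        System.out.println(Integer.toHexString(actual.codePointAt(0))); // feff
        System.out.println(stripBom(actual).equals("Timestamp"));       // true
    }
}
```

Printing the first code point in hex, as above, is one way to confirm the character is really in the parsed value even when a copy-paste into another tool loses it.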

樱花坊 2025-02-02 02:45:41

In IntelliJ IDEA you can remove the BOM in the text editor (bottom-right corner).

就像说晚安 2025-02-02 02:45:40

It's a BOM character. You may remove it yourself or use several other solutions (see https://stackoverflow.com/a/4897993/1420794 for instance)
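One way to "remove it yourself", sketched here with the standard library only and assuming the file is UTF-8: wrap the stream in a reader that consumes a leading U+FEFF before the CSV parser sees it. The class `BomSkippingReader` and the methods `openUtf8WithoutBom` and `decodeWithoutBom` are hypothetical names for this illustration, not the approach from the linked answer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class BomSkippingReader {
    // Returns a Reader over UTF-8 input with a leading BOM (U+FEFF) skipped, if present.
    // Java's UTF-8 decoder passes the BOM through as an ordinary character,
    // so it has to be dropped explicitly.
    static Reader openUtf8WithoutBom(InputStream in) throws IOException {
        PushbackReader reader =
                new PushbackReader(new InputStreamReader(in, StandardCharsets.UTF_8), 1);
        int first = reader.read();
        if (first != -1 && first != '\uFEFF') {
            reader.unread(first); // not a BOM: put the character back
        }
        return reader;
    }

    // Convenience helper for the demo: decodes a whole byte array without its BOM.
    static String decodeWithoutBom(byte[] bytes) {
        try (Reader r = openUtf8WithoutBom(new ByteArrayInputStream(bytes))) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // EF BB BF is the UTF-8 encoding of U+FEFF.
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'T', 's'};
        System.out.println(decodeWithoutBom(withBom).equals("Ts")); // true
    }
}
```

In the test from the question, the reader returned by `openUtf8WithoutBom(inputStream)` could be passed to `new CSVReader(...)` in place of `new InputStreamReader(inputStream)`. Naming the charset explicitly also avoids depending on the platform default encoding.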
