Parsing CSV in Java

Posted 2024-09-27 03:47:07

I have an unusual situation where I have to read data horizontally. I receive a CSV file in which the data is laid out across a row, like this:

CompanyName,RunDate,10/27/2010,11/12/2010,11/27/2010,12/13/2010,12/27/2010....

All the dates after RunDate are values for the run date field, and I have to update that field for that company in my system. The number of date values is not fixed: there can be a single value, 10, or any number n. So I need to read all of those values and update the system. I am writing this in Java.

Comments (9)

掌心的温暖 2024-10-04 03:47:07

String.split(",") isn't likely to work.
It will split fields that have embedded commas ("Foo, Inc.") even though they are a single field in the CSV line.

What if the company name is:
        Company, Inc.
or worse:
        Joe's "Good, Fast, and Cheap" Food

According to Wikipedia (http://en.wikipedia.org/wiki/Comma-separated_values):

Fields with embedded commas must be enclosed within double-quote characters.

   1997,Ford,E350,"Super, luxurious truck"

Fields with embedded double-quote characters must be enclosed within double-quote characters, and each of the embedded double-quote characters must be represented by a pair of double-quote characters.

   1997,Ford,E350,"Super ""luxurious"" truck"

Even worse, quoted fields may have embedded line breaks (newlines; "\n"):

Fields with embedded line breaks must be enclosed within double-quote characters.

   1997,Ford,E350,"Go get one now  
   they are going fast"

This demonstrates the problem with using String.split(",") to parse commas:

The CSV line is:

a,b,c,"Company, Inc.", d, e,"Joe's ""Good, Fast, and Cheap"" Food", f, 10/11/2010,1/1/2011, g, h, i

// Test String.split(",") against CSV with
// embedded commas and embedded double-quotes in
// quoted text strings:
//
// Company names are:
//        Company, Inc.
//        Joe's "Good, Fast, and Cheap" Food
//
// Which should be formatted in a CSV file as:
//        "Company, Inc."
//        "Joe's ""Good, Fast, and Cheap"" Food"
//
//
public class TestSplit {
    // Splits s on splitchar (a regex) and prints each segment on its own line.
    public static void testSplit(String s, String splitchar) {
        String[] segments = s.split(splitchar);

        for (String seg : segments) {
            System.out.println(seg);
        }
    }


    public static void main(String[] args) {
        String csvLine = "a,b,c,\"Company, Inc.\", d,"
                            + " e,\"Joe's \"\"Good, Fast,"
                            + " and Cheap\"\" Food\", f,"
                            + " 10/11/2010,1/1/2011, g, h, i";

        System.out.println("CSV line is:\n" + csvLine + "\n\n");
        testSplit(csvLine, ",");
    }
}

Produces the following:


D:\projects\TestSplit>javac TestSplit.java

D:\projects\TestSplit>java  TestSplit
CSV line is:
a,b,c,"Company, Inc.", d, e,"Joe's ""Good, Fast, and Cheap"" Food", f, 10/11/2010,1/1/2011, g, h, i


a
b
c
"Company
 Inc."
 d
 e
"Joe's ""Good
 Fast
 and Cheap"" Food"
 f
 10/11/2010
1/1/2011
 g
 h
 i

D:\projects\TestSplit>

Where that CSV line should be parsed as:


a
b
c
"Company, Inc."
 d
 e
"Joe's ""Good, Fast, and Cheap"" Food"
 f
 10/11/2010
1/1/2011
 g
 h
 i
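
For a line like the one above, a quote-aware splitter could look roughly like the following. This is only a minimal sketch, not a full CSV parser: the class and method names (QuoteAwareSplit, splitCsvLine) are made up for illustration, it handles a single line only (so no embedded line breaks), and it returns the unquoted field values rather than the raw quoted text listed above.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareSplit {
    /** Splits one CSV line on commas that are outside double quotes. */
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"');   // a doubled quote inside quotes is a literal quote
                    i++;                   // skip the second quote of the pair
                } else if (c == '"') {
                    inQuotes = false;      // closing quote
                } else {
                    current.append(c);     // commas in here belong to the field
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);      // start the next field
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());    // the last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String csvLine = "a,b,c,\"Company, Inc.\", d, e,"
                + "\"Joe's \"\"Good, Fast, and Cheap\"\" Food\", f,"
                + " 10/11/2010,1/1/2011, g, h, i";
        for (String field : splitCsvLine(csvLine)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

Leading spaces such as the one in " d" are kept as part of the field, as in the listing above. For real files an established CSV library is still the safer choice.
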
原谅我要高飞 2024-10-04 03:47:07

As others have suggested, you can use opencsv for splitting and parsing.

For simple data, split each line on ",", parse the pieces, and use a List to collect all the values.

染火枫林 2024-10-04 03:47:07

A CSV file is a \n-terminated text file in which the columns can be separated by either:

  • a comma, or
  • a tab (\t)

I suggest that you use a BufferedReader to read the CSV file, calling its readLine() method to read each row.

For each row, use String.split(arg), where arg is your comma or tab \t, to get an array of columns. From there, you know what to do.
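
The approach described above could be sketched as follows. This assumes simple data with no quoted fields, and the class name HorizontalCsvReader, the method readRunDates, and the sample company names are hypothetical. The sample reads from a StringReader so that it is self-contained; for a real file a FileReader would take its place.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HorizontalCsvReader {
    /** Maps each company name (column 0) to the values that follow the field-name column. */
    static Map<String, List<String>> readRunDates(Reader source) {
        Map<String, List<String>> runDatesByCompany = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] columns = line.split(",");
                if (columns.length < 2) {
                    continue; // blank or malformed row: nothing to collect
                }
                // Columns 2..n are the values, however many there are.
                List<String> values =
                        new ArrayList<>(Arrays.asList(columns).subList(2, columns.length));
                runDatesByCompany.put(columns[0], values);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return runDatesByCompany;
    }

    public static void main(String[] args) {
        String sample = "CompanyA,RunDate,10/27/2010,11/12/2010\n"
                + "CompanyB,RunDate,12/13/2010\n";
        System.out.println(readRunDates(new StringReader(sample)));
        // prints {CompanyA=[10/27/2010, 11/12/2010], CompanyB=[12/13/2010]}
    }
}
```
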

岁月静好 2024-10-04 03:47:07

By far the most useful page on the subject of CSV parsing I've ever found is the following:

http://secretgeek.net/csv_trouble.asp

Basically, get an established library to do it for you, because csv parsing is deceptively tricky.

ぇ气 2024-10-04 03:47:07

Use java.util.Scanner: you can call useDelimiter() to make the comma your delimiter, and read each new token with next(). The Scanner can be created directly from your file or from a string read from the file.
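
A minimal sketch of the Scanner approach, assuming one line has already been read into a string and contains no quoted fields. The class and method names (ScannerTokens, tokens) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerTokens {
    /** Returns the comma-separated tokens of a single line. */
    static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(line)) {
            scanner.useDelimiter(",");      // the delimiter is a regex; a plain comma needs no escaping
            while (scanner.hasNext()) {
                result.add(scanner.next()); // next() returns the text up to the next comma
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> values = tokens("CompanyName,RunDate,10/27/2010,11/12/2010,11/27/2010");
        // Everything from index 2 onward is a run date.
        System.out.println(values.subList(2, values.size()));
        // prints [10/27/2010, 11/12/2010, 11/27/2010]
    }
}
```

Like String.split, this treats every comma as a separator, so it has the same problem with quoted fields that contain commas.
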

忆梦 2024-10-04 03:47:07

You should really try univocity-parsers: its CSV parser comes with many features to handle all sorts of corner cases (unescaped quotes, mixed line delimiters, BOM-encoded files, etc.), and it is also one of the fastest CSV libraries around.

Simple example to parse a file:

import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;
import java.io.File;
import java.util.List;

CsvParserSettings settings = new CsvParserSettings(); //heaps of options here, check the docs
CsvParser parser = new CsvParser(settings);

//loads everything into memory, simple but can be slow.
List<String[]> allRows = parser.parseAll(new File("/path/to/your.csv"));

//parse iterating over each row
for(String[] row : parser.iterate(new File("/path/to/your.csv"))){
    //process row here
}

//and many other possibilities: Java bean processing, column selection, format detection, etc.

Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).

蝶…霜飞 2024-10-04 03:47:07

You start by reading the entire line into a String. Then you use the String.split(...) function to get all the tokens on the line where the delimiter you use is ",". (or is it "\," when you use a regex?)
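
On the regex question: the comma is not a regex metacharacter, so "," works as written, and the escaped form "\\," in Java source behaves the same way. The sketch below illustrates this along with two split() details that matter for CSV-like data. The class name SplitRegexDemo and the helper splitTrimmed are made up for illustration.

```java
import java.util.Arrays;

class SplitRegexDemo {
    /** Splits on commas, dropping whitespace around them and keeping trailing empty fields. */
    static String[] splitTrimmed(String line) {
        return line.trim().split("\\s*,\\s*", -1);
    }

    public static void main(String[] args) {
        // A plain comma and an escaped comma split identically.
        System.out.println(Arrays.toString("a,b,c".split(",")));    // [a, b, c]
        System.out.println(Arrays.toString("a,b,c".split("\\,")));  // [a, b, c]

        // Without a limit, split() drops trailing empty strings.
        System.out.println("a,b,,".split(",").length);      // 2
        System.out.println("a,b,,".split(",", -1).length);  // 4

        System.out.println(Arrays.toString(splitTrimmed("a, b ,c"))); // [a, b, c]
    }
}
```
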

要走干脆点 2024-10-04 03:47:07

In order to get each value one at a time, use a StringTokenizer. Construct it with StringTokenizer(str, ","). (Not recommended)

Use the split() method of the string class, which loads all of the tokens into an array.

Use the DateFormat class to parse each date -- specifically DateFormat.parse(String).
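
A rough sketch of these suggestions, assuming simple unquoted data and month/day/year dates. The class name LegacyDateParsing, the method parseRunDates, and the pattern "M/d/yyyy" are assumptions made for illustration. The main method also shows one possible reason to prefer split() over StringTokenizer: the tokenizer silently skips empty fields.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.StringTokenizer;

class LegacyDateParsing {
    /** Parses every column after the first two as a month/day/year date. */
    static List<Date> parseRunDates(String line) {
        DateFormat format = new SimpleDateFormat("M/d/yyyy"); // assumed pattern
        format.setLenient(false); // reject impossible dates instead of rolling them over
        String[] columns = line.split(",");
        List<Date> dates = new ArrayList<>();
        for (int i = 2; i < columns.length; i++) {
            try {
                dates.add(format.parse(columns[i].trim()));
            } catch (ParseException e) {
                throw new IllegalArgumentException("Not a valid date: " + columns[i], e);
            }
        }
        return dates;
    }

    public static void main(String[] args) {
        // StringTokenizer skips the empty field between the two commas; split() keeps it.
        System.out.println(new StringTokenizer("a,,b", ",").countTokens()); // 2
        System.out.println("a,,b".split(",").length);                        // 3

        List<Date> dates = parseRunDates("CompanyName,RunDate,10/27/2010,11/12/2010");
        System.out.println(dates.size()); // 2
    }
}
```
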

眼眸里的快感 2024-10-04 03:47:07

java.time

Assuming you are using a CSV library for reading the file and supposing that you get the individual values as strings from that library:

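    // Assumed definition, not shown in the snippet: matches dates such as 10/27/2010.
    DateTimeFormatter dateFormatter = DateTimeFormatter.ofPattern("M/d/uuuu");

    String valueFromCsvLibrary = "10/27/2010";
    try {
        LocalDate date = LocalDate.parse(valueFromCsvLibrary, dateFormatter);
        System.out.println("Parsed date: " + date);
    } catch (DateTimeParseException dtpe) {
        System.err.println("Not a valid date: " + dtpe);
    }

Parsed date: 2010-10-27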

You should prefer to process the dates as LocalDate in your code (neither as strings nor as instances of the long outdated and poorly designed Date class).

Even though I don’t have the experience, I am quite convinced that I would go with some open source CSV library.

Only if you are sure that the CSV file doesn’t contain quotes, broken lines, commas in the values, or other complications, and for some reason you choose to parse it by hand:

    String lineFromCsvFile = "CompanyName,RunDate,10/27/2010,11/12/2010,11/27/2010,12/13/2010,12/27/2010";
    String[] values = lineFromCsvFile.split(",");
    if (values[1].equals("RunDate")) {
        for (int i = 2; i < values.length; i++) {
            LocalDate date = LocalDate.parse(values[i], dateFormatter);
            System.out.println("Parsed date: " + date);
        }
    }
Parsed date: 2010-10-27
Parsed date: 2010-11-12
Parsed date: 2010-11-27
Parsed date: 2010-12-13
Parsed date: 2010-12-27

Exception handling happens as before, no need to repeat that.
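
Put together as a compilable class, the hand-parsing variant might look like the sketch below. The names RunDateLine, parseRunDates, and DATE_FORMATTER, as well as the pattern "M/d/uuuu", are assumptions made for illustration, and skipping invalid values (rather than aborting) is just one possible policy.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

class RunDateLine {
    // Assumed pattern: month/day/year, leading zeros optional.
    private static final DateTimeFormatter DATE_FORMATTER =
            DateTimeFormatter.ofPattern("M/d/uuuu");

    /** Returns the run dates on one line, skipping values that are not valid dates. */
    static List<LocalDate> parseRunDates(String line) {
        List<LocalDate> dates = new ArrayList<>();
        String[] values = line.split(",");
        if (values.length < 2 || !values[1].trim().equals("RunDate")) {
            return dates; // not a RunDate row
        }
        for (int i = 2; i < values.length; i++) {
            try {
                dates.add(LocalDate.parse(values[i].trim(), DATE_FORMATTER));
            } catch (DateTimeParseException dtpe) {
                System.err.println("Not a valid date: " + values[i]);
            }
        }
        return dates;
    }

    public static void main(String[] args) {
        String line = "CompanyName,RunDate,10/27/2010,11/12/2010,11/27/2010,12/13/2010,12/27/2010";
        System.out.println(parseRunDates(line));
        // prints [2010-10-27, 2010-11-12, 2010-11-27, 2010-12-13, 2010-12-27]
    }
}
```
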
