Regex to split a line (CSV file)

Posted on 2024-09-10 09:34:12

I'm not good with regular expressions. Can someone help me write one?

When reading a CSV file, I may get lines like these:

"Artist,Name",Album,12-SCS
"val""u,e1",value2,value3

Expected output:

Artist,Name
Album
12-SCS
val"u,e1
value2
value3

Update:
I like the idea of using the OleDb provider. However, our web page has a file upload control, and I read the file's content with a stream reader without actually saving the file to the file system. Is there any way I can still use the OleDb provider? The connection string needs a file name, and in my case the file is never saved to disk.

Comments (7)

み零 2024-09-17 09:34:12

Just adding the solution I worked out this morning.

var regex = new Regex("(?<=^|,)(\"(?:[^\"]|\"\")*\"|[^,]*)");

foreach (Match m in regex.Matches("<-- input line -->"))
{
    var s = m.Value;
}

As you can see, you need to call regex.Matches() once per line. It returns a MatchCollection with one item per column. The Value property of each match is the parsed value; for a quoted field it still includes the surrounding quotes and any doubled quotes.

This is still a work in progress, but it happily parses CSV strings like:

2,3.03,"Hello, my name is ""Joshua""",A,B,C,,,D
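
To get the unquoted values the question asks for, each match still needs its outer quotes stripped and its doubled quotes collapsed. Here is a minimal sketch of that post-processing, written as a C# 9+ top-level program. The helper name SplitWithRegex is hypothetical and not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: split one CSV line with the regex from the answer,
// then unquote each field.
static List<string> SplitWithRegex(string csvLine)
{
    var regex = new Regex("(?<=^|,)(\"(?:[^\"]|\"\")*\"|[^,]*)");
    var fields = new List<string>();
    foreach (Match m in regex.Matches(csvLine))
    {
        var s = m.Value;
        // A quoted field still carries its outer quotes and doubled inner quotes.
        if (s.Length >= 2 && s[0] == '"' && s[s.Length - 1] == '"')
        {
            s = s.Substring(1, s.Length - 2).Replace("\"\"", "\"");
        }
        fields.Add(s);
    }
    return fields;
}

// The two sample lines from the question.
var samples = new[]
{
    "\"Artist,Name\",Album,12-SCS",
    "\"val\"\"u,e1\",value2,value3"
};
foreach (var sample in samples)
{
    Console.WriteLine(string.Join(" | ", SplitWithRegex(sample)));
}
// Artist,Name | Album | 12-SCS
// val"u,e1 | value2 | value3
```

Like the original pattern, this handles one line at a time and does not cover quoted fields that span several lines.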

地狱即天堂 2024-09-17 09:34:12

Actually, it's pretty easy to match CSV lines with a regex. Try this one out:

using System.Collections.Specialized;
using System.Text.RegularExpressions;

StringCollection resultList = new StringCollection();
try {
    Regex pattern = new Regex(@"
        # Parse CSV line. Capture next value in named group: 'val'
        \s*                      # Ignore leading whitespace.
        (?:                      # Group of value alternatives.
          ""                     # Either a double quoted string,
          (?<val>                # Capture contents between quotes.
            [^""]*(""""[^""]*)*  # Zero or more non-quotes, allowing 
          )                      # doubled "" quotes within string.
          ""\s*                  # Ignore whitespace following quote.
        |  (?<val>[^,]*)         # Or... zero or more non-commas.
        )                        # End value alternatives group.
        (?:,|$)                  # Match end is comma or EOS", 
        RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
    Match matchResult = pattern.Match(subjectString);
    while (matchResult.Success) {
        // Note: at the very end of the input the pattern also matches an
        // empty string, so an extra empty value may be appended.
        resultList.Add(matchResult.Groups["val"].Value);
        matchResult = matchResult.NextMatch();
    } 
} catch (ArgumentException) {
    // Syntax error in the regular expression
}

Disclaimer: The regex has been tested in RegexBuddy (which generated this snippet), and it correctly matches the OP's test data, but the C# code logic is untested. (I don't have access to C# tools.)

烛影斜 2024-09-17 09:34:12

Regex is not a suitable tool for this. Use a CSV parser, either the built-in one or a third-party one.

半暖夏伤 2024-09-17 09:34:12

Give the TextFieldParser class a look. It's in the Microsoft.VisualBasic assembly and does delimited and fixed width parsing.
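
For the case in the question's update, where the upload is only available as a stream, TextFieldParser also has constructors that take a Stream or a TextReader, so nothing has to be saved to disk. Here is a rough sketch, assuming a comma-delimited file and a C# 9+ top-level program. ReadCsv is a hypothetical helper, and the MemoryStream merely stands in for whatever stream the upload control exposes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper: parse CSV rows straight from a Stream, no file on disk.
static List<string[]> ReadCsv(Stream input)
{
    var rows = new List<string[]>();
    using (var parser = new TextFieldParser(input))
    {
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(",");
        parser.HasFieldsEnclosedInQuotes = true;
        while (!parser.EndOfData)
        {
            // ReadFields returns the fields of the next line.
            rows.Add(parser.ReadFields());
        }
    }
    return rows;
}

// Stand-in for the uploaded file's stream (assumption: UTF-8 text).
var csv = "\"Artist,Name\",Album,12-SCS\n\"val\"\"u,e1\",value2,value3";
using var upload = new MemoryStream(Encoding.UTF8.GetBytes(csv));
foreach (var row in ReadCsv(upload))
{
    Console.WriteLine(string.Join(" | ", row));
}
```

In a web application the MemoryStream would be replaced by the posted file's input stream. The parsing code stays the same.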

旧伤还要旧人安 2024-09-17 09:34:12

Give CsvHelper a try (a library I maintain). It's available via NuGet.

You can easily read a CSV file into a custom class collection. It's also very fast.

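var streamReader = // Create a StreamReader to your CSV file
// Note: recent CsvHelper versions also require a culture argument,
// e.g. new CsvReader( streamReader, CultureInfo.InvariantCulture ).
var csvReader = new CsvReader( streamReader );
var myObjects = csvReader.GetRecords<MyObject>();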

久随 2024-09-17 09:34:12

Regex might get overly complex here. Split the line on commas, then iterate over the resulting pieces, merging each piece with the next for as long as the number of double quotes in the merged string is odd.

"hello,this",is,"a ""test"""

...split...

"hello | this" | is | "a ""test"""

...iterate and merge until you have an even number of double quotes...

"hello,this" - even number of quotes (note that the comma removed by the split is re-inserted between the pieces)

is - even number of quotes

"a ""test""" - even number of quotes

...then strip off the leading and trailing quotes, if present, and replace "" with ".
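
The split-and-merge steps above could be sketched roughly like this, again as a C# 9+ top-level program. SplitAndMerge is a hypothetical name, and the sketch assumes a field never contains a line break.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper implementing the split-and-merge idea described above.
static List<string> SplitAndMerge(string csvLine)
{
    var fields = new List<string>();
    var current = "";     // pieces merged so far for the current field
    var merging = false;  // true while the quote count of 'current' is odd

    foreach (var piece in csvLine.Split(','))
    {
        // Re-insert the comma that Split removed when continuing a field.
        current = merging ? current + "," + piece : piece;

        // An odd number of quotes means we are still inside a quoted field.
        merging = current.Count(c => c == '"') % 2 != 0;
        if (merging) continue;

        var field = current;
        // Strip the outer quotes, if present, then collapse doubled quotes.
        if (field.Length >= 2 && field[0] == '"' && field[field.Length - 1] == '"')
        {
            field = field.Substring(1, field.Length - 2);
        }
        fields.Add(field.Replace("\"\"", "\""));
    }

    // Unbalanced quotes: keep the leftover text rather than dropping it.
    if (merging) fields.Add(current);
    return fields;
}

var parts = SplitAndMerge("\"hello,this\",is,\"a \"\"test\"\"\"");
Console.WriteLine(string.Join(" | ", parts));
// hello,this | is | a "test"
```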

旧时浪漫 2024-09-17 09:34:12

It can be done with the code below:

using System.IO;
using Microsoft.VisualBasic.FileIO;

// Quotes inside the C# literal are escaped; the CSV text is: 1,2,3,"4,3","a,""b"",c",end
string csv = "1,2,3,\"4,3\",\"a,\"\"b\"\",c\",end";
TextFieldParser parser = new TextFieldParser(new StringReader(csv));
// To read from a file instead:
// TextFieldParser parser = new TextFieldParser("csvfile.csv");
parser.HasFieldsEnclosedInQuotes = true;
parser.SetDelimiters(",");
string[] fields = null;
while (!parser.EndOfData)
{
    // Each call returns the fields of the next line.
    fields = parser.ReadFields();
}
parser.Close();