Regex to split a line (CSV file)
I am not good at regex. Could someone help me write one?
I may get values like this when reading a CSV file:
"Artist,Name",Album,12-SCS "val""u,e1",value2,value3
Output:
Artist,Name Album 12-SCS Val"u,e1 Value2 Value3
Update:
I like the idea of using the OleDb provider. We have a file upload control on the web page, and I read the file's content with a StreamReader without actually saving the file to the file system. Is there any way I can use the OleDb provider? It needs the file name in the connection string, and in my case the file is not saved on the file system.
Comments (7)
Just adding the solution I worked on this morning.
As you can see, you need to call regex.Matches() per line. It will then return a MatchCollection with the same number of items you have as columns. The Value property of each match is, obviously, the parsed value.
This is still a work in progress, but it happily parses CSV strings like:
Actually, it's pretty easy to match CSV lines with a regex. Try this one out:
Disclaimer: The regex has been tested in RegexBuddy (which generated this snippet), and it correctly matches the OP's test data, but the C# code logic is untested. (I don't have access to C# tools.)
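The answerer's snippet does not appear on this page. Purely as an illustration of the regex approach, and not the answerer's pattern, here is a minimal sketch. The class name `RegexCsv`, the method `Parse`, and the pattern itself are assumptions. It treats a field as something that starts at the beginning of the line or right after a comma, and is either a quoted run (with `""` as an escaped quote) or a run without quotes or commas. It assumes one well-formed record per line and does not handle line breaks inside quoted fields.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexCsv
{
    // Illustrative pattern (an assumption, not the pattern from the answer):
    //   (?<=^|,)            a field starts at line start or just after a comma
    //   "((?:[^"]|"")*)"    group 1: quoted field, "" stands for a literal quote
    //   ([^",]*)            group 2: unquoted field
    private static readonly Regex Field = new Regex(
        "(?<=^|,)(?:\"((?:[^\"]|\"\")*)\"|([^\",]*))");

    public static List<string> Parse(string line)
    {
        var values = new List<string>();
        foreach (Match m in Field.Matches(line))
        {
            // Quoted fields need their doubled quotes collapsed.
            values.Add(m.Groups[1].Success
                ? m.Groups[1].Value.Replace("\"\"", "\"")
                : m.Groups[2].Value);
        }
        return values;
    }
}
```

Called once per line, `Parse` yields one entry per column; malformed input (for example, an unclosed quote) is not detected by this sketch.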
Regex is not a suitable tool for this. Use a CSV parser, either the built-in one or a third-party one.
Give the TextFieldParser class a look. It's in the Microsoft.VisualBasic assembly and does delimited and fixed width parsing.
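`TextFieldParser` can be constructed over a `Stream` or a `TextReader` as well as a file path, so it may also fit the case in the question's update, where the upload is never saved to disk. The following is a minimal sketch under that assumption; the class name `UploadedCsvReader` and the method `ReadRows` are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class UploadedCsvReader
{
    // Hypothetical helper: parses comma-delimited rows from any readable stream,
    // for example the input stream of an uploaded file.
    public static List<string[]> ReadRows(Stream csvStream)
    {
        var rows = new List<string[]>();
        // Disposing the parser also closes the underlying stream.
        using (var parser = new TextFieldParser(csvStream))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                // One string per column, with enclosing quotes removed.
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

In a Web Forms page this might be called as `UploadedCsvReader.ReadRows(FileUpload1.PostedFile.InputStream)`, where `FileUpload1` is a placeholder control name. The project needs a reference to the Microsoft.VisualBasic assembly.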
Give CsvHelper a try (a library I maintain). It's available via NuGet.
You can easily read a CSV file into a custom class collection. It's also very fast.
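As a rough sketch of what reading into a custom class can look like, assuming a recent CsvHelper version where `CsvReader` takes a `TextReader` and a `CultureInfo` (older releases differ): the `Track` class, its properties, and `TrackLoader` are hypothetical, and the CSV is assumed to have a header row whose names match the property names.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using CsvHelper;

// Hypothetical record type; property names are assumed to match the CSV header row.
public class Track
{
    public string Artist { get; set; }
    public string Album { get; set; }
}

public static class TrackLoader
{
    // Accepts any TextReader, so a StreamReader over an uploaded stream works
    // without the file being saved to disk.
    public static List<Track> Load(TextReader reader)
    {
        using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
        {
            // GetRecords is lazy; materialize before the reader is disposed.
            return csv.GetRecords<Track>().ToList();
        }
    }
}
```

Because `Load` takes a `TextReader`, it could be fed `new StreamReader(someStream)` just as easily as a file.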
Regex might get overly complex here. Split the line on commas, then iterate over the resulting pieces and concatenate them for as long as the number of double quotes in the concatenated string is odd.
"hello,this",is,"a ""test"""
...split...
"hello | this" | is | "a ""test"""
...iterate and merge until you have an even number of double quotes...
"hello,this" - even number of quotes (note that the comma removed by the split is re-inserted between the merged pieces)
is - even number of quotes
"a ""test""" - even number of quotes
...then strip off the leading and trailing quote, if present, and replace "" with ".
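The steps above can be sketched as follows. This is a minimal illustration, not the answerer's code; the names `CsvLineSplitter`, `SplitLine`, and `Unquote` are hypothetical, and it assumes one record per line (no line breaks inside quoted fields).

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CsvLineSplitter
{
    // Hypothetical helper implementing the split-and-merge idea for a single line.
    public static List<string> SplitLine(string line)
    {
        var fields = new List<string>();
        string pending = "";   // pieces merged so far
        bool merging = false;  // true while the quote count is still odd

        foreach (string piece in line.Split(','))
        {
            // Re-insert the comma that Split removed when continuing a quoted field.
            pending = merging ? pending + "," + piece : piece;

            // An even number of double quotes means the field is complete.
            if (pending.Count(c => c == '"') % 2 == 0)
            {
                fields.Add(Unquote(pending));
                merging = false;
            }
            else
            {
                merging = true;
            }
        }

        // Unbalanced quotes at end of line: keep the leftover text as-is.
        if (merging) fields.Add(pending);
        return fields;
    }

    // Strips one leading and one trailing quote if both are present, then collapses "" to ".
    private static string Unquote(string field)
    {
        if (field.Length >= 2 && field[0] == '"' && field[field.Length - 1] == '"')
            field = field.Substring(1, field.Length - 2);
        return field.Replace("\"\"", "\"");
    }
}
```

For the example line `"hello,this",is,"a ""test"""`, `SplitLine` returns the three values `hello,this`, `is`, and `a "test"`.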
It can be done using the code below: