Split using a delimiter, except when the delimiter is escaped
I'm reading clipboard data coming from Excel using
var stream = (System.IO.Stream) ( Forms.Clipboard.GetDataObject() ).GetData( Forms.DataFormats.CommaSeparatedValue );
but unfortunately, Excel is passing cell text instead of cell values. When the cells use special formatting (such as the thousands separator), the clipboard data for a series of cells in columns that look like this:
1,234,123.00 2,345.00 342.00 12,345.00
is stored as this:
\" 1,234,123.00 \",\" 2,345.00 \", 342.00 ,\" 12,345.00 \"
when what I really want is this:
1234123.00, 2345.00, 342.00, 12345.00
I had previously been using
clipData.Split(new string[] { "," }, StringSplitOptions.None)
to turn my CSV clipboard data into a series of cells, but this fails when there is escaped formatted text containing commas.
I'm asking if anyone can think of a way to split this string into a set of cells, ignoring the commas escaped within the \" bits, since this is how Excel is choosing to escape cells containing commas.
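To make the failure concrete, here is a minimal sketch of the naive split applied to the sample string. The class and method names (NaiveSplitDemo, NaiveSplit) are made up for illustration and are not from the original code.

```csharp
using System;

public static class NaiveSplitDemo
{
    // Hypothetical helper: splits on every comma, with no awareness of quoting.
    public static string[] NaiveSplit(string clipData)
    {
        return clipData.Split(new string[] { "," }, StringSplitOptions.None);
    }
}
```

For the four-cell sample above, the string contains seven commas, so this returns eight fragments (for example `" 1`, `234`, `123.00 "`) instead of four cells. A plain string such as 1,2,3 still splits into three cells as expected.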
In short, how can I turn a single string containing this:
\" 1,234,123.00 \",\" 2,345.00 \", 342.00 ,\" 12,345.00 \"
into an array of strings containing this:
{ "1,234,123.00", "2,345.00", "342.00", "12,345.00" }
without ruining my ability to parse a simple comma-delimited string?
*****edit***
Follow up question (formulated as a DFA) here: Split a string based on each time a Deterministic Finite Automata reaches a final state?
Comments (5)
First off, I've dealt with data from Excel before, and what you typically see is comma-separated values. If a value is considered to be a string, it will have double quotes around it (and can contain commas and double quotes). If it is considered to be numeric, there are no double quotes. Additionally, if the data contains a double quote, it will be escaped by another double quote, like
""
So, assuming all of that, here's how I've dealt with this in the past. Of course, this assumes the data coming in is valid, so if you have something like
"fo,o"b,ar","bar""foo"
this will not work. Additionally, if your data contains &quot; then it will be turned into a " which may or may not be desirable.
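The quoting rules described in the first answer (quoted fields may contain commas, and a doubled quote stands for a literal quote) can be sketched as a small character-by-character state machine. This is not necessarily the approach that answer used. The names QuoteAwareCsv and SplitLine are hypothetical, and trimming whitespace from every cell is an assumption based on the padded sample in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuoteAwareCsv
{
    // Splits one CSV line on commas that are outside double quotes.
    // Inside quotes, "" is read as a literal quote character.
    // Assumption: surrounding whitespace in each cell is not significant.
    public static List<string> SplitLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"'); // doubled quote: literal quote
                        i++;                 // skip the second quote
                    }
                    else
                    {
                        inQuotes = false;    // closing quote
                    }
                }
                else
                {
                    current.Append(c);       // commas are kept while quoted
                }
            }
            else if (c == '"')
            {
                inQuotes = true;             // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString().Trim()); // end of cell
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString().Trim()); // last cell
        return fields;
    }
}
```

Because unquoted commas still act as separators, a simple comma-delimited string such as 1,2,3 parses the same way as before. Like the answer's approach, this sketch assumes well-formed input and does not report unbalanced quotes.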
There are a lot of ways to do this. One inelegant way that would work is:
Now you have what you wanted in the first place.
I agree with Kyle regarding your string probably not being consistent.
Instead of Kyle's first step you could use
From your input example, we can see that there are three "unwanted" sequences of characters:
So, add all these sequences to the input array for the Split method. This will give you an array containing a few empty elements. If that is a problem, use StringSplitOptions.RemoveEmptyEntries instead of StringSplitOptions.None.
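A rough sketch of the multi-separator Split idea follows. The separator strings are guesses derived from the question's sample, not the sequences listed in the answer, and the names MultiSeparatorSplit and SplitCells are hypothetical. A bare quote is added as a fourth separator to absorb the leading and trailing quote, and each cell is trimmed.

```csharp
using System;
using System.Linq;

public static class MultiSeparatorSplit
{
    // Assumed separator sequences (not taken from the answer).
    // Longer sequences come first so that "\",\"" is tried before "\"".
    private static readonly string[] Separators = { "\",\"", "\",", ",\"", "\"" };

    public static string[] SplitCells(string clipData)
    {
        return clipData
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Select(cell => cell.Trim())   // drop the padding spaces
            .ToArray();
    }
}
```

Note the limitation: a bare comma is not a separator here, so a simple string such as 1,2,3 comes back as one element, and RemoveEmptyEntries silently drops genuinely empty cells. This only fits input where every unquoted cell sits next to a quoted one, as in the sample.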
You could try to use a bit of LINQ:
Alternatively, if you don't like this suggestion, try to implement a similar pattern with RegEx.
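One possible reading of the RegEx suggestion is to split only on commas that are followed by an even number of double quotes, which means the comma lies outside any quoted cell. The sketch below is an illustration under that assumption, not the answer's own code. RegexCsvSplit, SplitCells and ToDecimals are hypothetical names, and ToDecimals assumes the cell text uses invariant-culture (US-style) thousands and decimal separators.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexCsvSplit
{
    // Matches a comma only when the remainder of the string contains an
    // even number of double quotes, i.e. the comma is outside quotes.
    private static readonly Regex CommaOutsideQuotes =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    // Returns the cell texts with padding and surrounding quotes removed.
    public static string[] SplitCells(string clipData)
    {
        return CommaOutsideQuotes
            .Split(clipData)
            .Select(cell => cell.Trim().Trim('"').Trim())
            .ToArray();
    }

    // Converts the cell texts to numbers, accepting thousands separators.
    public static decimal[] ToDecimals(string clipData)
    {
        return SplitCells(clipData)
            .Select(cell => decimal.Parse(cell, NumberStyles.Number, CultureInfo.InvariantCulture))
            .ToArray();
    }
}
```

SplitCells yields the cell texts asked for at the end of the question, and ToDecimals yields the plain values the question says it really wants. Plain input such as 1,2,3 still splits on every comma. This sketch does not unescape doubled quotes inside a cell, and the lookahead rescans the rest of the string for each comma, so it is best kept to short clipboard rows.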