Split using a delimiter, except when the delimiter is escaped

Posted 2024-10-07 01:46:43 · 1190 characters · 4 views · 0 comments

I'm reading clipboard data coming from Excel using

var stream = (System.IO.Stream) ( Forms.Clipboard.GetDataObject() ).GetData( Forms.DataFormats.CommaSeparatedValue );

but unfortunately, Excel is passing cell text instead of cell values. When the cells use special formatting (such as the thousands separator), the clipboard data for a series of cells in columns that looks like this:

 1,234,123.00    2,345.00    342.00      12,345.00

is stored as this:

\" 1,234,123.00 \",\" 2,345.00 \", 342.00 ,\" 12,345.00 \"

when what I really want is this:

 1234123.00, 2345.00, 342.00, 12345.00

I had previously been using clipData.Split(new string[] { "," }, StringSplitOptions.None) to turn my CSV clipboard data into a series of cells, but this fails when there is escaped formatted text containing commas.


I'm asking if anyone can think of a way to split this string into a set of cells, ignoring the commas escaped within the \" bits, since this is how Excel is choosing to escape cells containing commas.

In short, how can I turn a single string containing this:

\" 1,234,123.00 \",\" 2,345.00 \", 342.00 ,\" 12,345.00 \"

into an array of strings containing this:

{ "1,234,123.00", "2,345.00", "342.00", "12,345.00" }

without ruining my ability to parse a simple comma-delimited string?

*****edit***

Follow up question (formulated as a DFA) here: Split a string based on each time a Deterministic Finite Automata reaches a final state?

Comments (5)

独守阴晴ぅ圆缺 2024-10-14 01:46:43

First off, I've dealt with data from Excel before. What you typically see is comma-separated values: if a value is considered to be a string, it has double quotes around it (and can contain commas and double quotes); if it is considered to be numeric, there are no double quotes. Additionally, a double quote inside the data is escaped by doubling it, like "". Assuming all of that, here's how I've dealt with this in the past:

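using System.Collections.Generic;

// Extension methods must live in a static class; the class name here is arbitrary.
public static class ExcelRowExtensions
{
    public static IEnumerable<string> SplitExcelRow(this string value)
    {
        // Hide escaped quotes ("") behind a placeholder so they don't toggle the quoted state.
        value = value.Replace("\"\"", "&quot;");
        bool quoted = false;
        int currStartIndex = 0;
        for (int i = 0; i < value.Length; i++)
        {
            char currChar = value[i];
            if (currChar == '"')
            {
                quoted = !quoted;
            }
            else if (currChar == ',')
            {
                // Only a comma outside quotes ends a cell.
                if (!quoted)
                {
                    yield return value.Substring(currStartIndex, i - currStartIndex)
                        .Trim()
                        .Replace("\"", "")         // drop the surrounding quotes
                        .Replace("&quot;", "\""); // restore the escaped quotes
                    currStartIndex = i + 1;
                }
            }
        }
        // The last cell has no trailing comma.
        yield return value.Substring(currStartIndex, value.Length - currStartIndex)
            .Trim()
            .Replace("\"", "")
            .Replace("&quot;", "\"");
    }
}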

Of course, this assumes the incoming data is valid, so something like "fo,o"b,ar","bar""foo" will not work. Additionally, if your data contains the literal text &quot;, it will be turned into a ", which may or may not be desirable.
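
One way to avoid a placeholder altogether is to handle doubled quotes inside the loop itself. The following is only a minimal sketch of that idea, not the code from the answer: the names CsvRowSplitter and SplitRow are made up for illustration, and it assumes the row uses plain " characters as the quote (i.e. the \" in the question is just how the string was displayed) and that the input is well-formed.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, named for illustration only.
public static class CsvRowSplitter
{
    // Splits one CSV row into trimmed fields. Commas inside double quotes are
    // kept, and a doubled quote ("") inside a quoted field becomes one quote.
    public static List<string> SplitRow(string row)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < row.Length; i++)
        {
            char c = row[i];
            if (c == '"')
            {
                if (inQuotes && i + 1 < row.Length && row[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote inside a quoted field
                    i++;                 // skip the second quote of the pair
                }
                else
                {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            }
            else if (c == ',' && !inQuotes)
            {
                // A comma outside quotes ends the current field.
                fields.Add(current.ToString().Trim());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        // The last field has no trailing comma.
        fields.Add(current.ToString().Trim());
        return fields;
    }
}
```

Calling CsvRowSplitter.SplitRow on the row from the question should give the four cells with their thousands separators intact, so plain comma-delimited rows and quoted rows go through the same code path.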

红颜悴 2024-10-14 01:46:43

There are a lot of ways to do this. One inelegant way that would work is:

  1. Convert \",\" to a tab or some other delimiter (I assume you left out a few \" in your example, because otherwise the string is not consistent).
  2. Strip all remaining commas.
  3. Strip all remaining \".
  4. Convert your delimiter (e.g. tab) back into a comma.

Now you have what you wanted in the first place.
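
The four steps above could be sketched roughly as follows. This is an illustration under assumptions rather than tested production code: the names QuotedCellNormalizer and StripThousands are invented, the quote is assumed to be a plain " character, every cell is assumed to be quoted (the consistency this answer takes for granted), and the data is assumed to contain no tab characters.

```csharp
using System;

// Hypothetical helper, named for illustration only.
public static class QuotedCellNormalizer
{
    public static string StripThousands(string row)
    {
        const string placeholder = "\t"; // assumed never to occur in the data

        return row
            .Replace("\",\"", placeholder) // 1. cell boundaries become the placeholder
            .Replace(",", "")              // 2. remaining commas are thousands separators
            .Replace("\"", "")             // 3. drop the remaining quotes
            .Replace(placeholder, ",");    // 4. restore the real delimiter
    }
}
```

Note that a row mixing quoted and unquoted cells, like the sample in the question, breaks step 2, because the commas next to an unquoted cell are real delimiters and would be stripped as well.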

智商已欠费 2024-10-14 01:46:43

I agree with Kyle regarding your string probably not being consistent.

Instead of Kyle's first step you could use

string[] vals = Regex.Split(value, @"\s*\"",\s*");
江城子 2024-10-14 01:46:43

From your input example, we can see that there are three "unwanted" sequences of characters:

\"
\",
,\"

So, add all these sequences to the input array for the Split method:

string[] result = clipData.Split(new[] { @",\""", @"\"",", @"\""" }, 
    StringSplitOptions.None);

This will give you an array containing a few empty elements. If that is a problem, use StringSplitOptions.RemoveEmptyEntries instead of StringSplitOptions.None:

string[] result = clipData.Split(new[] { @",\""", @"\"",", @"\""" }, 
    StringSplitOptions.RemoveEmptyEntries);
天煞孤星 2024-10-14 01:46:43

You could try to use a bit of LINQ:

string excelData = "\\\" 1,234,123.00 \\\",\\\" 2,345.00 \\\", 342.00 ,\\\" 12,345.00 \\\"";

IEnumerable<string> cells = from x in excelData.Split(new string[] { "\\\"" }, StringSplitOptions.RemoveEmptyEntries)
                            let y = x.Trim(',').Trim()
                            where !string.IsNullOrWhiteSpace(y)
                            select y;

Alternatively, if you don't like this suggestion, try to implement a similar pattern with Regex.
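
For what it's worth, a Regex version might look something like the sketch below, which also goes one step further and parses the cells into numbers, since that is what the question ultimately wants. The names RegexCellParser and ParseNumbers are hypothetical, the quote is assumed to be a plain " character rather than a literal backslash followed by a quote, and every cell is assumed to be a number formatted with the invariant culture (comma for thousands, dot for decimals).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class RegexCellParser
{
    // Either a quoted run (group 1) or a run free of commas and quotes (group 2).
    private static readonly Regex Cell = new Regex("\"([^\"]*)\"|([^,\"]+)");

    public static List<decimal> ParseNumbers(string row)
    {
        var values = new List<decimal>();
        foreach (Match m in Cell.Matches(row))
        {
            string text = (m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value).Trim();
            if (text.Length == 0)
            {
                continue; // skip whitespace-only cells
            }

            // NumberStyles.Number accepts thousands separators and a decimal point.
            values.Add(decimal.Parse(text, NumberStyles.Number, CultureInfo.InvariantCulture));
        }
        return values;
    }
}
```

This pattern silently skips empty cells and does not understand doubled quotes, so it only suits rows like the sample in the question.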
