CSV validation in C# - ensuring every row has the same number of commas

Posted on 2024-10-20 23:15:36

I want to implement a fairly simple CSV checker in my C#/ASP.NET application. My project automatically generates CSVs from GridViews for users, and I want to run through each line quickly, check that every line has the same number of commas, and throw an exception if any line differs. So far I have this, which works, but it has an issue that I describe below:

            int? CommaCount = null;

            StringBuilder sb = new StringBuilder();
            StringWriter sw = new StringWriter(sb);
            String Str = null;

            //This loops through all the headerrow cells and writes them to the stringbuilder
            for (int k = 0; k <= (grd.Columns.Count - 1); k++)
            {
                sw.Write(grd.HeaderRow.Cells[k].Text + ",");    
            }

            sw.WriteLine(",");


            //This loops through all the main rows and writes them to the stringbuilder
            for (int i = 0; i <= grd.Rows.Count - 1; i++)
            {
                StringBuilder RowString = new StringBuilder();
                for (int j = 0; j <= grd.Columns.Count - 1; j++)
                {
                    //We'll need to strip meaningless junk such as <br /> and &nbsp;
                    Str = grd.Rows[i].Cells[j].Text.ToString().Replace("<br />", "");
                    if (Str == "&nbsp;")
                    {
                        Str = "";
                    }

                    Str = "\"" + Str + "\"" + ",";

                    RowString.Append(Str);
                    sw.Write(Str);
                }
                sw.WriteLine();

                //The below code block ensures that each row contains the same number of commas, which is crucial
                int RowCommaCount = CheckChar(RowString.ToString(), ',');
                if (CommaCount == null)
                {
                    CommaCount = RowCommaCount;
                }
                else
                {
                    if (CommaCount!= RowCommaCount)
                    {
                        throw new Exception("CSV generated is corrupt - line " + i + " has " + RowCommaCount + " commas when it should have " + CommaCount);
                    }
                }
            }

            sw.Close();

And my CheckChar method:

protected static int CheckChar(string Input, char CharToCheck)
    {
        int Counter = 0;
        foreach (char StringChar in Input)
        {
            if (StringChar == CharToCheck)
            {
                Counter++;
            }
        }
        return Counter;
    }

Now my problem is that if a cell in the grid contains a comma, my CheckChar method still counts it as a delimiter, so the check throws even though the row is fine. As you can see in the code, I wrap all the values in " characters to 'escape' them. How simple would it be to ignore commas inside values in my method? I assume I'll need to rewrite the method quite a lot.

Comments (4)

撩动你心 2024-10-27 23:15:36
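// With Multiline, each match is one line of the input. With ExplicitCapture, only the
// group named "1" captures, and it captures every comma that sits outside a quoted field.
var rx = new Regex("^  (  ( \"[^\"]*\" )  |  (  (?!$)[^\",]  )+  |  (?<1>,)  )*  $", RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline);
var matches = rx.Matches("Hello,World,How,Are\nYou,Today,This,Is,\"A beautiful, world\",Hi!");

// Compare the delimiter count of each line with that of the previous line.
// The sample input has 3 and 5 unquoted commas, so this throws.
for (int i = 1; i < matches.Count; i++) {
    if (matches[i].Groups[1].Captures.Count != matches[i - 1].Groups[1].Captures.Count) {
        throw new Exception("CSV lines have differing numbers of delimiters");
    }
}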
风吹短裙飘 2024-10-27 23:15:36

You could just use a regular expression that matches one item and count the number of matches in your line. An example of such a regex is the following:

var itemsRegex =
    new Regex(@"(?<=(^|[\" + separator + @"]))((?<item>[^""\" + separator +
        @"\n]*)|(?<item>""([^""]|"""")*""))(?=($|[\" + separator + @"]))");
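
The regex above refers to a `separator` variable that the answer does not define. As a rough sketch of how it might be used, the snippet below wraps it in a hypothetical `CountItems` helper (the name and the default `","` separator are assumptions, not part of the answer) and counts the matches in one line. It assumes the separator is a single punctuation character, since the pattern prefixes it with a backslash.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: counts the items in one CSV line with the regex from this answer.
// Assumes `separator` is a single punctuation character such as "," or ";".
static int CountItems(string line, string separator = ",")
{
    var itemsRegex =
        new Regex(@"(?<=(^|[\" + separator + @"]))((?<item>[^""\" + separator +
            @"\n]*)|(?<item>""([^""]|"""")*""))(?=($|[\" + separator + @"]))");

    // Every match is one item, quoted or unquoted.
    return itemsRegex.Matches(line).Count;
}

// The comma inside the quoted item is not treated as a separator.
Console.WriteLine(CountItems("a,\"b,c\",d")); // prints 3
```

Comparing `CountItems` across lines then plays the same role as comparing comma counts in the question.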
雪花飘飘的天空 2024-10-27 23:15:36

Just do something like the following (assuming you don't want to have " inside your fields (otherwise these need some extra handling)):

protected static int CheckChar(string Input, char CharToCheck, char fieldDelimiter)
{
    int Counter = 0;
    bool inValue = false;
    foreach (char StringChar in Input)
    {
        if (StringChar == fieldDelimiter)
            inValue = !inValue;
        else if (!inValue && StringChar == CharToCheck)
            Counter++;
    }
    return Counter;
}

This will cause inValue to be true while inside fields. E.g. pass '"' as fieldDelimiter to ignore everything between "...". Just note that this won't handle backslash-escaped quotes (\"); you'd have to add such handling yourself. A doubled quote ("") simply toggles inValue twice in a row, so it does not change the comma count.
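
If backslash-escaped quotes do need handling, one possible extension is sketched below. `CountUnquoted` and its `escapeChar` parameter are hypothetical names, not part of the answer above. Note that standard CSV (RFC 4180) escapes a quote by doubling it, so the backslash form only matters if your data uses that dialect.

```csharp
using System;

// Sketch: same idea as CheckChar above, but while inside a field it also skips
// the character that follows an escape character (a backslash by default).
static int CountUnquoted(string input, char charToCheck, char fieldDelimiter, char escapeChar = '\\')
{
    int counter = 0;
    bool inValue = false;
    for (int i = 0; i < input.Length; i++)
    {
        char c = input[i];
        if (inValue && c == escapeChar && i + 1 < input.Length)
        {
            i++; // skip the escaped character, e.g. the quote in \"
        }
        else if (c == fieldDelimiter)
        {
            inValue = !inValue; // entering or leaving a quoted field
        }
        else if (!inValue && c == charToCheck)
        {
            counter++; // a real delimiter, outside any quoted field
        }
    }
    return counter;
}

// Row with a doubled quote:            "a""b,c","d"
Console.WriteLine(CountUnquoted("\"a\"\"b,c\",\"d\"", ',', '"')); // prints 1
// Row with a backslash-escaped quote:  "a\",b","c"
Console.WriteLine(CountUnquoted("\"a\\\",b\",\"c\"", ',', '"'));  // prints 1
```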

孤独患者 2024-10-27 23:15:36

Instead of checking the resulting string (the cake) you should check the fields (ingredients) before you concatenate (mix) them. That would give you the chance to do something constructive (escaping/replacing) and to throw an exception only as a last resort.

In general, "," are legal in .csv fields, as long as the string fields are quoted. So internal "," should not be a problem, but the quotes may well be.
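
A minimal sketch of that idea follows. `EscapeField` and `BuildRow` are hypothetical helpers, not code from the question; the sketch assumes quotes are escaped by doubling them (as in RFC 4180) and validates the number of fields instead of counting commas afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: quote one field and double any embedded quotes.
static string EscapeField(string field)
{
    return "\"" + (field ?? "").Replace("\"", "\"\"") + "\"";
}

// Hypothetical helper: check the ingredients first, then mix them into one CSV line.
static string BuildRow(IReadOnlyList<string> fields, int expectedCount)
{
    if (fields.Count != expectedCount)
    {
        throw new InvalidOperationException(
            "Row has " + fields.Count + " fields when it should have " + expectedCount);
    }
    return string.Join(",", fields.Select(f => EscapeField(f)));
}

// Commas and quotes inside the values no longer affect the structure of the line.
Console.WriteLine(BuildRow(new[] { "A beautiful, world", "say \"hi\"" }, 2));
// prints "A beautiful, world","say ""hi"""
```

With this approach the exception only fires when a row really has the wrong number of cells, which is the last-resort case described above.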
