Code golf: convert CSV to an HTML table with the least C# code

Posted on 2024-07-23 08:48:31

I'm adding a function to my own personal toolkit lib to do simple CSV to HTML table conversion.

I would like the smallest possible piece of code to do this in C#, and it needs to be able to handle CSV files in excess of ~500 MB.

So far my two contenders are:

  • splitting csv into arrays by
    delimiters and building HTML output

  • search-replace delimiters with table
    th tr td tags

Assume that the file/read/disk operations are already handled, i.e., I'm passing a string containing the contents of said CSV into this function. The output will consist of simple, style-free HTML markup, and yes, the data may contain stray commas and line breaks.

Update: some folks asked. 100% of the CSV I deal with comes straight out of Excel, if that helps.

Example string:

a1,b1,c1\r\n
a2,b2,c2\r\n

Comments (3)

温折酒 2024-07-30 08:48:31

Read all lines into memory:

    // Loads the whole file at once, so memory use grows with file size.
    var lines = File.ReadAllLines(args[0]);
    using (var outfs = File.AppendText(args[1]))
    {
        outfs.Write("<html><body><table>");
        foreach (var line in lines)
            outfs.Write("<tr><td>" + string.Join("</td><td>", line.Split(',')) + "</td></tr>");
        outfs.Write("</table></body></html>");
    }

Or read one line at a time:

    // Holds only the current line in memory, which suits very large files.
    using (var inFs = File.OpenText(args[0]))
    using (var outfs = File.AppendText(args[1]))
    {
        outfs.Write("<html><body><table>");
        while (!inFs.EndOfStream)
            outfs.Write("<tr><td>" + string.Join("</td><td>", inFs.ReadLine().Split(',')) + "</td></tr>");
        outfs.Write("</table></body></html>");
    }

@Jimmy: I created an extended version using LINQ. Here is the highlight (lines are read lazily):

    // Load() and Write() are extension methods from the extended version (not shown here).
    using (var lp = args[0].Load())
        lp.Select(l => "<tr><td>" + string.Join("</td><td>", l.Split(',')) + "</td></tr>")
        .Write("<html><body><table>", "</table></body></html>", args[1]);
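
Since `Load` and `Write` are not built into .NET, here is a minimal sketch of the same lazy pipeline using only base-library types. The class `CsvStream`, its `Lines` iterator, and the `ToHtml` signature are hypothetical names chosen for illustration, not the extended version mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvStream
{
    // Hypothetical helper: yields one line at a time, so only the current line is in memory.
    static IEnumerable<string> Lines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            yield return line;
    }

    // Naive conversion: splits on every comma, with no quote handling or HTML encoding.
    public static void ToHtml(TextReader input, TextWriter output)
    {
        output.Write("<html><body><table>");
        var rows = Lines(input)
            .Select(l => "<tr><td>" + string.Join("</td><td>", l.Split(',')) + "</td></tr>");
        foreach (var row in rows)
            output.Write(row);
        output.Write("</table></body></html>");
    }
}
```

With files it could be called as `using (var r = File.OpenText(args[0])) using (var w = File.CreateText(args[1])) CsvStream.ToHtml(r, w);`. Note that `File.CreateText` overwrites an existing output file, whereas `File.AppendText` adds to it.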
暮年慕年 2024-07-30 08:48:31

You probably can't get much shorter than this, but remember that any real solution would handle quotes, commas inside quotes, and conversion to HTML entities.

    return "<table><tr><td>"+s
       .Replace("\n","</td></tr><tr><td>")
       .Replace(",","</td><td>")+"</td></tr></table>";

EDIT: here's a (largely untested) addition of HtmlEncode and quote matching. I HTML-encode first, then every comma outside quotes becomes '<' (which cannot collide, because any existing '<' has already been encoded).

    // Needs System.Linq and System.Web (for HttpUtility).
    bool q=false;
    return "<table><tr><td>"
      + new string(HttpUtility.HtmlEncode(s)
           .Select(c=>c=='"'?(q=!q)?c:c:(c==','&&!q)?'<':c).ToArray())
        .Replace("<", "</td><td>")
        .Replace("\n", "</td></tr><tr><td>")
      + "</td></tr></table>";
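
One thing to watch in the encode-first version: `HtmlEncode` rewrites `"` as `&quot;`, so the quote toggle never sees a literal quote character. Below is a minimal sketch that marks the separators on the raw text before encoding. `CsvQuick.ToTable` is a hypothetical name, `WebUtility.HtmlEncode` stands in for `HttpUtility.HtmlEncode`, and the sketch assumes the data never contains the control character U+0001.

```csharp
using System;
using System.Linq;
using System.Net;

public static class CsvQuick
{
    public static string ToTable(string s)
    {
        bool q = false; // true while inside a quoted field
        // Step 1: on the raw text, toggle q at each quote and turn unquoted commas into U+0001.
        var marked = new string(s.Select(c =>
        {
            if (c == '"') q = !q;
            return c == ',' && !q ? '\u0001' : c;
        }).ToArray());
        // Step 2: encode, then expand the markers and line breaks into tags.
        return "<table><tr><td>"
            + WebUtility.HtmlEncode(marked)
                .Replace("\u0001", "</td><td>")
                .Replace("\r\n", "\n")
                .Replace("\n", "</td></tr><tr><td>")
            + "</td></tr></table>";
    }
}
```

This keeps the one-expression spirit but still has gaps: the quotes themselves stay in the output (as `&quot;`), a line break inside a quoted field still starts a new row, and a trailing newline produces an empty final row.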
时光倒影 2024-07-30 08:48:31

Here's a fun version using lambda expressions. It's not as short as replacing commas with "</td><td>", but it has its own special charm:

    var r = new StringBuilder("<table>");
    // string.Concat joins the cells; Append(IEnumerable<string>) would only append the sequence's type name.
    s.Split('\n').ToList().ForEach(t => r.Append("<tr>").Append(string.Concat(t.Split(',').Select(u => "<td>" + u + "</td>"))).Append("</tr>"));
    return r.Append("</table>").ToString();

If I were to write this for production, I'd use a state machine to track nested quotes, newlines, and commas, because Excel can put line breaks in the middle of a column. IIRC you can also specify a different delimiter entirely.
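
A minimal sketch of such a state machine follows. It is an illustration under stated assumptions, not a full CSV parser: the class and method names (`CsvTable.ToHtml`) are hypothetical, and it assumes Excel-style quoting, where a field may be wrapped in double quotes and a literal quote inside a quoted field is written as two quotes.

```csharp
using System;
using System.Net;
using System.Text;

public static class CsvTable
{
    // Hypothetical helper; the delimiter parameter covers non-comma exports.
    public static string ToHtml(string csv, char delimiter = ',')
    {
        var html = new StringBuilder("<table>");
        var field = new StringBuilder();
        bool inQuotes = false, rowOpen = false;

        void EndField()
        {
            if (!rowOpen) { html.Append("<tr>"); rowOpen = true; }
            html.Append("<td>").Append(WebUtility.HtmlEncode(field.ToString())).Append("</td>");
            field.Clear();
        }
        void EndRow()
        {
            EndField();
            html.Append("</tr>");
            rowOpen = false;
        }

        for (int i = 0; i < csv.Length; i++)
        {
            char c = csv[i];
            if (inQuotes)
            {
                // Inside quotes, delimiters and line breaks are plain data.
                if (c != '"') field.Append(c);
                else if (i + 1 < csv.Length && csv[i + 1] == '"') { field.Append('"'); i++; } // doubled quote
                else inQuotes = false; // closing quote
            }
            else if (c == '"') inQuotes = true;
            else if (c == delimiter) EndField();
            else if (c == '\n') EndRow();
            else if (c != '\r') field.Append(c); // drop the CR of a CRLF pair
        }
        if (rowOpen || field.Length > 0) EndRow(); // last row without a trailing newline
        return html.Append("</table>").ToString();
    }
}
```

It walks the string once and encodes each field as it is emitted. The result is still built in memory, so for very large inputs the same loop could write to a `TextWriter` instead of a `StringBuilder`.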
