Code golf: convert CSV to an HTML table with the smallest amount of C# code
I'm adding a function to my own personal toolkit lib to do simple CSV to HTML table conversion.
I would like the smallest possible piece of code to do this in C#, and it needs to be able to handle CSV files in excess of ~500 MB.
So far my two contenders are:
- splitting the CSV into arrays by delimiters and building the HTML output
- search-replacing delimiters with table th/tr/td tags
Assume that the file/read/disk operations are already handled, i.e., I'm passing a string containing the contents of said CSV into this function. The output will consist of plain, style-free HTML markup, and yes, the data may contain stray commas and line breaks.
Update: some folks asked. 100% of the CSV I deal with comes straight out of Excel, if that helps.
Example string:
a1,b1,c1\r\n a2,b2,c2\r\n
Comments (3)
Read all lines into memory
or read one line at a time
... @Jimmy ... I created an extended version using LINQ. Here is the highlight (lazy evaluation for line reading).
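The code from this answer did not survive on this page, so the following is only a minimal sketch of the lazy, line-at-a-time idea, not the answerer's version. The names `CsvHtmlLazy`, `ReadLines`, and `WriteTable` are hypothetical, and the sketch assumes that no field contains a quoted comma or line break.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;

public static class CsvHtmlLazy
{
    // Yields one line at a time, so the input is never split into one huge array.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            yield return line;
    }

    // Assumption: naive split on every comma; quoted fields are NOT handled here.
    public static void WriteTable(TextReader input, TextWriter output)
    {
        output.Write("<table>");
        // The LINQ query is lazy: each row is built only when the foreach asks for it.
        var rows = ReadLines(input)
            .Where(line => line.Length > 0)
            .Select(line => "<tr>" + string.Concat(
                line.Split(',').Select(cell => "<td>" + WebUtility.HtmlEncode(cell) + "</td>")) + "</tr>");
        foreach (var row in rows)
            output.Write(row);
        output.Write("</table>");
    }
}
```

For a CSV that is already in a string, wrap it in a `StringReader`. For a file, a `StreamReader` for input and a `StreamWriter` for output keep memory use roughly proportional to the longest line rather than to the whole file.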
You probably can't get much shorter than this, but remember that any real solution would handle quotes, commas inside quotes, and conversion to HTML entities.
EDIT: here's a (largely untested) addition of HtmlEncode and quote matching. I HTML-encode first, then all commas become '<' (which don't collide, because the existing ones have already been encoded).
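The original snippet is missing here, so this is a hedged sketch of the "encode first, then replace" ordering only. The names `CsvHtmlReplace` and `ToHtmlTable` are hypothetical, and quote matching is deliberately left out, so a comma inside a quoted field will still split the cell.

```csharp
using System;
using System.Net;

public static class CsvHtmlReplace
{
    public static string ToHtmlTable(string csv)
    {
        // Normalize line endings and drop the trailing break so no empty row is emitted.
        string normalized = csv.Replace("\r\n", "\n").TrimEnd('\n');

        // Encode first: any '<', '>', '&' or '"' in the data becomes an entity,
        // so the tags inserted below cannot collide with characters from the data.
        string encoded = WebUtility.HtmlEncode(normalized);

        // Assumption: every remaining comma is a field separator (no quoted fields).
        return "<table><tr><td>"
            + encoded.Replace(",", "</td><td>").Replace("\n", "</td></tr><tr><td>")
            + "</td></tr></table>";
    }
}
```

Each `Replace` allocates a new string, so on a ~500 MB input this approach holds several full copies in memory at once. That is the main trade-off against a streaming version.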
Here's a fun version using lambda expressions. It's not as short as replacing commas with "</td><td>", but it has its own special charm.
If I were to write this for production, I'd use a state machine to track nested quotes, newlines, and commas, because Excel can put newlines in the middle of a column. IIRC you can also specify a different delimiter entirely.
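The state machine mentioned above could look roughly like the sketch below. It is an illustration under stated assumptions, not the answerer's code: the names `CsvHtmlStateMachine` and `WriteTable` are hypothetical, fields are assumed to follow the usual Excel convention of wrapping in double quotes and doubling embedded quotes, and local functions require C# 7 or later.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class CsvHtmlStateMachine
{
    public static void WriteTable(TextReader input, TextWriter output, char delimiter = ',')
    {
        var cell = new StringBuilder();
        bool inQuotes = false; // currently inside a quoted field
        bool rowOpen = false;  // a <tr> has been written but not closed
        bool pending = false;  // the current row has unwritten content

        void EndCell()
        {
            if (!rowOpen) { output.Write("<tr>"); rowOpen = true; }
            output.Write("<td>" + WebUtility.HtmlEncode(cell.ToString()) + "</td>");
            cell.Clear();
        }
        void EndRow()
        {
            EndCell();
            output.Write("</tr>");
            rowOpen = false;
            pending = false;
        }

        output.Write("<table>");
        int c;
        while ((c = input.Read()) != -1)
        {
            char ch = (char)c;
            if (inQuotes)
            {
                if (ch != '"') cell.Append(ch); // delimiters and line breaks are data here
                else if (input.Peek() == '"') { input.Read(); cell.Append('"'); } // "" is an escaped quote
                else inQuotes = false;
            }
            else if (ch == '"') { inQuotes = true; pending = true; }
            else if (ch == delimiter) { EndCell(); pending = true; }
            else if (ch == '\n') { if (pending) EndRow(); } // blank lines are skipped
            else if (ch != '\r') { cell.Append(ch); pending = true; }
        }
        if (pending) EndRow(); // last row without a trailing line break
        output.Write("</table>");
    }
}
```

Because it reads one character at a time and writes each cell as soon as it is complete, this version holds only the current cell in memory. Passing `';'` or `'\t'` as `delimiter` covers exports that use a different separator.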