Dynamic CSV load in SQL Server
OK, so I have been searching the internet for a solution but have not come up with anything yet.
What I have is a CSV file that could have any number of unknown columns,
e.g.
Col 1, Col 2, Col 3
I have used BULK INSERT #temp FROM ... to insert from a CSV, but this relies on me having a table beforehand to load into. This is where the problem arises: I don't know my table structure before loading the CSV.
Is there a way to create the table dynamically, based on the CSV, and then load the data into it?
Thanks
Rob
Comments (2)
CSV parsing is non-trivial (taking into account text qualifiers, values that contain line breaks, qualifier escape mechanisms, etc.). There are several .NET libraries out there that do all this stuff for you (e.g. http://www.codeproject.com/KB/database/CsvReader.aspx), so I would think it would be easier to use a different technology, e.g. PowerShell or SQL CLR, to make use of an existing library, rather than trying to roll your own CSV parser in T-SQL...
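To illustrate why a hand-rolled splitter tends to go wrong, here is a minimal sketch in Python (not one of the .NET libraries mentioned above, just the standard `csv` module standing in for "an existing library"). The sample data and the function names `naive_parse` and `library_parse` are made up for illustration.

```python
import csv
import io

# Made-up sample: a quoted field containing a comma, an escaped qualifier ("")
# and an embedded line break inside a quoted value.
SAMPLE = 'Col 1,Col 2,Col 3\r\n1,"Smith, John","said ""hi""\nand left"\r\n'


def naive_parse(text):
    """Split on line breaks and commas, ignoring text qualifiers entirely."""
    return [line.split(",") for line in text.splitlines()]


def library_parse(text):
    """Let the csv module handle qualifiers, escapes and embedded line breaks."""
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    print(len(naive_parse(SAMPLE)), "rows from the naive split")
    print(len(library_parse(SAMPLE)), "rows from csv.reader")
```

The naive version breaks the quoted value into an extra row and extra fields, while the library returns the header plus one data row with three fields. The same reasoning applies to whichever library you pick in PowerShell or SQL CLR.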
Huh, just found this nice and simple solution on an old forum post (http://forums.databasejournal.com/showthread.php?t=47966):
Unfortunately, it doesn't work on recent Windows versions where the text driver isn't installed by default...
I have faced the same task many, many times. What I ended up doing is writing a simple C# script for the load. I admit that each time I had to change the script a little, because each time the requirements were different, the CSV file had its own peculiarities, etc. This means that my code most likely won't work for you straight away, but I hope it can help you a lot.
The main C# file is program.cs. Here is its source:
This file uses a library that I found on the internet for parsing CSV files. Note that I have seen valid CSV files that this library failed to parse. The text of the CsvReader.cs file follows:
I also have a config file, CsvToSql.exe.config:
And a script that compiles the whole lot, build.cmd:
This is how I use it:
*.tbl files are table definitions, *.inp files are input files for the bcp command line utility, and *.cmd files run the table creation scripts and the bcp command line utility. _all.cmd runs the *.cmd files for all tables, and _cleanup.cmd deletes all the files that CsvToSql.exe generates.
This script makes a lot of assumptions, and a lot of stuff is hardcoded. This is what I usually change quickly each time I need to load a set of CSV files into SQL.
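For readers who only want the general shape of that workflow, here is a rough sketch in Python. It is not the CsvToSql source and does not reproduce its file formats; it only shows the idea of deriving a table definition, a bcp-friendly input file and a bcp command from one CSV. The function names, the database `MyDb`, the server `localhost`, the `dbo` schema and the all-`NVARCHAR(MAX)` column typing are assumptions chosen for illustration.

```python
import csv
import io


def quote_ident(name):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.strip().replace("]", "]]") + "]"


def plan_load(csv_text, table, database="MyDb", server="localhost"):
    """Return (create_table_sql, bcp_input_text, bcp_command) for one CSV.

    Assumptions: the first row is the header, every column is loaded as
    NVARCHAR(MAX), and `table` is a plain name that needs no quoting on the
    command line.
    """
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    header, data = rows[0], rows[1:]

    columns = ",\n".join(f"    {quote_ident(c)} NVARCHAR(MAX) NULL" for c in header)
    create_sql = f"CREATE TABLE {quote_ident(table)} (\n{columns}\n);"

    # Re-emit the parsed data tab-delimited with CRLF row ends, which are the
    # bcp defaults in character mode (-c), so bcp never has to understand quotes.
    for row in data:
        for value in row:
            if "\t" in value or "\n" in value or "\r" in value:
                raise ValueError("value contains a tab or line break: %r" % value)
    bcp_input = "".join("\t".join(row) + "\r\n" for row in data)

    # -c: character mode, -S: server, -T: trusted (Windows) authentication.
    bcp_cmd = f"bcp {database}.dbo.{table} in {table}.inp -c -S {server} -T"
    return create_sql, bcp_input, bcp_cmd


if __name__ == "__main__":
    sql, inp, cmd = plan_load('Col 1, Col 2\r\n1,"a,b"\r\n', "people")
    print(sql)
    print(cmd)
```

Loading everything as text and converting types afterwards inside SQL Server keeps the generator simple; inferring column types from the data is one of the parts that tends to need per-file tweaking.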
Good luck and if you have any questions please don't hesitate to ask.
The script requires .NET 3.5
If there is nothing extra special about the data I'm loading, I'm usually up and running with this script in 15 minutes. If there are troubles, tweaking might take longer.