Dynamic CSV Loading in SQL Server

Posted 2024-11-04 22:21:39


OK, so I have been searching the internet for a solution but have not come up with anything yet.

What I have is a CSV - this CSV could have any number of unknown columns

e.g.

Col 1, Col 2, Col 3

I have used BULK INSERT #temp FROM ... to insert from a CSV, but this relies on me having a table beforehand to load into - this is where the problem arises - I don’t know my table structure before loading the CSV.

Is there a way to dynamically create the table, based on the CSV, on the fly to load the data into?

Thanks
Rob


Comments (2)

吐个泡泡 2024-11-11 22:21:39


CSV parsing is non-trivial (taking into account text qualifiers, values that contain line breaks, qualifier escape mechanisms, etc.). There are several .NET libraries out there that do all this work for you (e.g. http://www.codeproject.com/KB/database/CsvReader.aspx), so I would think it would be easier to use a different technology, e.g. PowerShell or SQL CLR, to make use of an existing library - rather than trying to roll your own CSV parser in T-SQL...
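
To illustrate the idea, here is a rough, untested sketch (mine, not code from the linked article) that uses the TextFieldParser class shipped in Microsoft.VisualBasic.dll to read the header row and build a CREATE TABLE statement you could run before the BULK INSERT. Column types and file/table names here are placeholders:

// Rough sketch: read the CSV header with an existing parser and build a
// CREATE TABLE statement from it. TextFieldParser handles quoted fields and
// embedded commas; add a reference to Microsoft.VisualBasic.dll to use it.
// All columns default to nvarchar(max) here - adjust as needed.
using System;
using System.Linq;
using Microsoft.VisualBasic.FileIO;

class CsvTableSketch
{
    static string BuildCreateTable(string csvPath, string tableName)
    {
        using (var parser = new TextFieldParser(csvPath))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            string[] headers = parser.ReadFields();   // first row = column names
            string columns = string.Join(",\n",
                headers.Select(h => "  [" + h.Trim() + "] nvarchar(max) NULL").ToArray());
            return "CREATE TABLE [dbo].[" + tableName + "] (\n" + columns + "\n)";
        }
    }

    static void Main()
    {
        // Hypothetical file name - replace with your own CSV.
        Console.WriteLine(BuildCreateTable("test1.csv", "test1"));
    }
}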

Huh, just found this nice and simple solution on an old forum post (http://forums.databasejournal.com/showthread.php?t=47966):

select * 
from OpenRowset('MSDASQL', 'Driver={Microsoft Text Driver (*.txt; *.csv)}; DefaultDir=D:\;',
'select * from test1.csv')

Unfortunately, it doesn't work on recent Windows versions where the text driver isn't installed by default...

忘年祭陌 2024-11-11 22:21:39


I have been faced with the same task many times. What I ended up doing was writing a simple C# script for the load. I'll admit that each time I had to change the script a little, because each time the requirements were different, the CSV files had specific peculiarities, etc. This means that my code most likely won't work for you straight away, but I hope it can help you a lot.

The main C# file is Program.cs. Here is its source:

using System;
using System.IO;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Collections;

namespace CsvToSql
{
    class Program
    {
        static string server = @"localhost";
        static string database = @"test";
        static bool hasHeaders = false;
        static string fieldLength = "max";

        static string fieldPattern = "[%fieldName%] [nvarchar](%fieldLength%) NULL,\n";        
        static string createTablePattern = 
@"
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[%tableName%]') AND type in (N'U'))
DROP TABLE [dbo].[%tableName%]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[%tableName%](
%fields%
) ON [PRIMARY]
";
        static string commandScriptPattern = 
@"sqlcmd -S %server% -E -d %database% -i %tableScriptName% 
bcp %database%.dbo.%tableName% in %headrelsessFileName%  -c -t^^ -r \n -T -S %server%
";
        private static void Main(string[] args)
        {
            server = System.Configuration.ConfigurationSettings.AppSettings["server"] ?? server;
            database = System.Configuration.ConfigurationSettings.AppSettings["database"] ?? database;
            hasHeaders = System.Configuration.ConfigurationSettings.AppSettings["hasHeaders"] == "true";
            fieldLength = System.Configuration.ConfigurationSettings.AppSettings["fieldLength"] ?? fieldLength;

            string[] fileNames = Directory.GetFiles(".", "*.csv");
            foreach (string fileName in fileNames)
            {
                Console.WriteLine("Processing {0}", fileName);
                Process(fileName);
            }
            WriteExecuteAllFile(fileNames);
            WriteCleanUpFile(fileNames);
        }

        private static void Process(string fileName)
        {
            string[] fieldNames = ReadHeaders(fileName);
            ProduceTableScript(fileName, fieldNames);
            ProduceCommandScript(fileName);
        }

        private static void WriteExecuteAllFile(string[] fileNames)
        {
            StringBuilder sb = new StringBuilder();
            foreach (string fileName in fileNames)
            {
                sb.Append("call ");
                sb.AppendLine(GetCommandScriptName(fileName));
            }
            SaveStringToFile(sb.ToString(), "_all.cmd");
        }

        private static void WriteCleanUpFile(string[] fileNames)
        {
            StringBuilder sb = new StringBuilder();
            foreach (string fileName in fileNames)
            {
                sb.Append("del ");
                sb.AppendLine(GetCommandScriptName(fileName));
                sb.Append("del ");
                sb.AppendLine(GetHeaderlessFileName(fileName));
                sb.Append("del ");
                sb.AppendLine(GetTableScriptName(fileName));
            }
            sb.AppendLine("del _all.cmd");
            sb.AppendLine("del _cleanup.cmd");
            SaveStringToFile(sb.ToString(), "_cleanup.cmd");
        }

        private static string[] ReadHeaders(string fileName)
        {            
            using (FileStream fs = File.OpenRead(fileName))
            using (StreamReader sr = new StreamReader(fs))
            {
                if (hasHeaders)
                {
                        string[] result = ParseQutationLineToList(sr.ReadLine());
                        ProduceHeaderlessFile(sr, fs.Name);
                        return result;
                }
                else
                {
                    string s = sr.ReadLine();
                    string[] fields = ParseQutationLineToList(s);
                    fs.Seek(0, SeekOrigin.Begin);
                    sr.DiscardBufferedData();
                    string[] result = new string[fields.Length];
                    for (int i = 0; i < fields.Length; i++)
                    {
                        result[i] = "F" + (i + 1).ToString();
                    }
                    ProduceHeaderlessFile(sr, fs.Name);
                    return result;
                }
           }
        }

        private static void ProduceTableScript(string fileName, string[] fieldNames)
        {
            string tableName = GetTableName(fileName);
            string fields = fieldNames.Aggregate("", (s, i) => s + fieldPattern.Replace("%fieldName%", i).Replace("%fieldLength%", fieldLength));
            // Trim the trailing ",\n" left by fieldPattern so the generated CREATE TABLE has no dangling comma
            fields = fields.TrimEnd('\n', ',');
            string table = createTablePattern.Replace("%fields%", fields).Replace("%tableName%", tableName);
            SaveStringToFile(table, GetTableScriptName(fileName));
        }


        private static void ProduceCommandScript(string fileName)
        {
            string content = commandScriptPattern;
            content = content.Replace("%server%", server);
            content = content.Replace("%database%", database);
            content = content.Replace("%tableName%", GetTableName(fileName));
            content = content.Replace("%tableScriptName%", GetTableScriptName(fileName));
            content = content.Replace("%headrelsessFileName%", GetHeaderlessFileName(fileName));
            SaveStringToFile(content, GetCommandScriptName(fileName));
        }

        private static void ProduceHeaderlessFile(StreamReader sr, string basefileName)
        {
            string headerlessFileName = GetHeaderlessFileName(basefileName);
            if (File.Exists(headerlessFileName))
            {
                return;
            }

            int counter = 0;

            using(FileStream fs = File.Open(headerlessFileName, FileMode.Create, FileAccess.Write, FileShare.Read))
            using (StreamWriter sw = new StreamWriter(fs))
            {
                while(!sr.EndOfStream)
                {
                    //sw.WriteLine(sr.ReadLine().Replace("\"", ""));
                    sw.WriteLine(ParseLine(sr.ReadLine()));
                    counter++;
                }
                sw.Flush();
                fs.Flush();
            }

            Console.WriteLine("Written {0} records to {1}", counter, headerlessFileName);
        }

        private static string ParseLine(string s)
        {
            if (s.TrimStart(' ', '\t').StartsWith("\""))
            {
                return ParseQutationLine(s);
            }

            return s.Replace(',', '^');
        }

        // Some fields have the default field terminator (comma) inside them;
        // this is why we have to parse instead of just splitting on commas
        private static string ParseQutationLine(string s)
        {

            string[] fields = ParseQutationLineToList(s);

            StringBuilder sb = new StringBuilder();

            foreach (string field in fields)
            {
                sb.Append(field.Trim('"'));
                sb.Append('^');
                if (field.IndexOf('^') >= 0)
                {
                    throw new ApplicationException("String contains separator character. " + s);
                }
            }

            return sb.ToString().Substring(0, sb.Length - 1);
        }

        private static string[] ParseQutationLineToList(string s)
        {
            JouniHeikniemi.Tools.Strings.CsvReader cr = new JouniHeikniemi.Tools.Strings.CsvReader();
            ArrayList result = new ArrayList();
            cr.ParseCsvFields(result, s);
            return (string[])result.ToArray(typeof(string));
        }

        private static void SaveStringToFile(string s, string fileName)
        {
            using (FileStream fs = File.Open(fileName, FileMode.Create, FileAccess.Write, FileShare.Read))
            using (StreamWriter sw = new StreamWriter(fs))
            {
                sw.Write(s);
                sw.Flush();
                fs.Flush();
            }
        }

        private static string GetTableName(string fileName)
        {
            return "_" + Path.GetFileNameWithoutExtension(fileName).Replace('.', '_');
        }

        private static string GetHeaderlessFileName(string fileName)
        {
            return Path.ChangeExtension(fileName, "inp");
        }

        private static string GetTableScriptName(string fileName)
        {
            return Path.ChangeExtension(fileName, "tbl");
        }

        private static string GetCommandScriptName(string fileName)
        {
            return Path.ChangeExtension(fileName, "cmd");
        }
    }
}

This file uses a library that I found on the internet for parsing CSV files. Note that I have seen valid CSV files that this library failed to parse (a small illustration of one such case follows the listing below). The text of the CsvReader.cs file follows:

using System;
using System.Collections;
using System.IO;
using System.Text;

namespace JouniHeikniemi.Tools.Strings {

  /// <summary>
  /// A data-reader style interface for reading Csv (and otherwise-char-separated) files.
  /// </summary>
  public class CsvReader : IDisposable {

    #region Private variables

    private Stream stream;
    private StreamReader reader;
    private char separator;

    #endregion

    #region Constructors

    public CsvReader() { separator = ','; }

      /// <summary>
    /// Creates a new Csv reader for the given stream.
    /// </summary>
    /// <param name="s">The stream to read the CSV from.</param>
    public CsvReader(Stream s) : this(s, null, ',') { }

    /// <summary>
    /// Creates a new reader for the given stream and separator.
    /// </summary>
    /// <param name="s">The stream to read the separator from.</param>
    /// <param name="separator">The field separator character</param>
    public CsvReader(Stream s, char separator) : this(s, null, separator) { }

    /// <summary>
    /// Creates a new Csv reader for the given stream and encoding.
    /// </summary>
    /// <param name="s">The stream to read the CSV from.</param>
    /// <param name="enc">The encoding used.</param>
    public CsvReader(Stream s, Encoding enc) : this(s, enc, ',') { }

    /// <summary>
    /// Creates a new reader for the given stream, encoding and separator character.
    /// </summary>
    /// <param name="s">The stream to read the data from.</param>
    /// <param name="enc">The encoding used.</param>
    /// <param name="separator">The separator character between the fields</param>
    public CsvReader(Stream s, Encoding enc, char separator) {

      this.separator = separator;
      this.stream = s;
      if (!s.CanRead) {
        throw new CsvReaderException("Could not read the given data stream!");
      }
      reader = (enc != null) ? new StreamReader(s, enc) : new StreamReader(s);
    }

    /// <summary>
    /// Creates a new Csv reader for the given text file path.
    /// </summary>
    /// <param name="filename">The name of the file to be read.</param>
    public CsvReader(string filename) : this(filename, null, ',') { }

    /// <summary>
    /// Creates a new reader for the given text file path and separator character.
    /// </summary>
    /// <param name="filename">The name of the file to be read.</param>
    /// <param name="separator">The field separator character</param>
    public CsvReader(string filename, char separator) : this(filename, null, separator) { }

    /// <summary>
    /// Creates a new Csv reader for the given text file path and encoding.
    /// </summary>
    /// <param name="filename">The name of the file to be read.</param>
    /// <param name="enc">The encoding used.</param>
    public CsvReader(string filename, Encoding enc) 
      : this(filename, enc, ',') { }

    /// <summary>
    /// Creates a new reader for the given text file path, encoding and field separator.
    /// </summary>
    /// <param name="filename">The name of the file to be read.</param>
    /// <param name="enc">The encoding used.</param>
    /// <param name="separator">The field separator character.</param>
    public CsvReader(string filename, Encoding enc, char separator) 
      : this(new FileStream(filename, FileMode.Open), enc, separator) { }

    #endregion

    #region Properties

    /// <summary>
    /// The separator character for the fields. Comma for normal CSV.
    /// </summary>
    public char Separator {
      get { return separator; }
      set { separator = value; }
    }

    #endregion

    #region Parsing

    /// <summary>
    /// Returns the fields for the next row of data (or null if at eof)
    /// </summary>
    /// <returns>A string array of fields or null if at the end of file.</returns>
    public string[] GetCsvLine() {

      string data = reader.ReadLine();
      if (data == null) return null;
      if (data.Length == 0) return new string[0];

      ArrayList result = new ArrayList();

      ParseCsvFields(result, data);

      return (string[])result.ToArray(typeof(string));
    }

    // Parses the fields and pushes the fields into the result arraylist
    public void ParseCsvFields(ArrayList result, string data) {

      int pos = -1;
      while (pos < data.Length)
        result.Add(ParseCsvField(data, ref pos));
    }

    // Parses the field at the given position of the data, modifies pos to match
    // the first unparsed position and returns the parsed field
    private string ParseCsvField(string data, ref int startSeparatorPosition) {

      if (startSeparatorPosition == data.Length-1) {
        startSeparatorPosition++;
        // The last field is empty
        return "";
      }

      int fromPos = startSeparatorPosition + 1;

      // Determine if this is a quoted field
      if (data[fromPos] == '"') {
        // If we're at the end of the string, let's consider this a field that
        // only contains the quote
        if (fromPos == data.Length-1) {
          fromPos++;
          return "\"";
        }

        // Otherwise, return a string of appropriate length with double quotes collapsed
        // Note that FSQ returns data.Length if no single quote was found
        int nextSingleQuote = FindSingleQuote(data, fromPos+1);
        startSeparatorPosition = nextSingleQuote+1;
        return data.Substring(fromPos+1, nextSingleQuote-fromPos-1).Replace("\"\"", "\"");
      }

      // The field ends in the next separator or EOL
      int nextSeparator = data.IndexOf(separator, fromPos);
      if (nextSeparator == -1) {
        startSeparatorPosition = data.Length;
        return data.Substring(fromPos);
      }
      else {
        startSeparatorPosition = nextSeparator;
        return data.Substring(fromPos, nextSeparator - fromPos);
      }
    }

    // Returns the index of the next single quote mark in the string 
    // (starting from startFrom)
    private static int FindSingleQuote(string data, int startFrom) {

      int i = startFrom-1;
      while (++i < data.Length)
        if (data[i] == '"') {
          // If this is a double quote, bypass the chars
          if (i < data.Length-1 && data[i+1] == '"') {
            i++;
            continue;
          }
          else
            return i;
        }
      // If no quote found, return the end value of i (data.Length)
      return i;
    }

    #endregion


    /// <summary>
    /// Disposes the reader. The underlying stream is closed.
    /// </summary>
    public void Dispose() {
      // Closing the reader closes the underlying stream, too
      if (reader != null) reader.Close();
      else if (stream != null)
        stream.Close(); // In case we failed before the reader was constructed
      GC.SuppressFinalize(this);
    }
  }


  /// <summary>
  /// Exception class for CsvReader exceptions.
  /// </summary>
  [Serializable]
  public class CsvReaderException : ApplicationException { 

    /// <summary>
    /// Constructs a new CsvReaderException.
    /// </summary>
    public CsvReaderException() : this("The CSV Reader encountered an error.") { }

    /// <summary>
    /// Constructs a new exception with the given message.
    /// </summary>
    /// <param name="message">The exception message.</param>
    public CsvReaderException(string message) : base(message) { }

    /// <summary>
    /// Constructs a new exception with the given message and the inner exception.
    /// </summary>
    /// <param name="message">The exception message.</param>
    /// <param name="inner">Inner exception that caused this issue.</param>
    public CsvReaderException(string message, Exception inner) : base(message, inner) { }

    /// <summary>
    /// Constructs a new exception with the given serialization information.
    /// </summary>
    /// <param name="info"></param>
    /// <param name="context"></param>
    protected CsvReaderException(System.Runtime.Serialization.SerializationInfo info, 
                                 System.Runtime.Serialization.StreamingContext context) 
      : base(info, context) { }

  }

}
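
For example (this is only my guess at one such case, not something documented for the library): because GetCsvLine() reads one physical line at a time with ReadLine(), a record whose quoted field contains a line break comes back as two broken rows:

// Hedged illustration: GetCsvLine() is ReadLine()-based, so a quoted field
// containing an embedded newline is split into two broken records.
string csv = "id,comment\r\n1,\"line one\r\nline two\"\r\n";
using (var reader = new JouniHeikniemi.Tools.Strings.CsvReader(
        new System.IO.MemoryStream(System.Text.Encoding.UTF8.GetBytes(csv))))
{
    string[] row;
    while ((row = reader.GetCsvLine()) != null)
        System.Console.WriteLine(string.Join(" | ", row));
    // Prints:
    //   id | comment
    //   1 | line one
    //   line two"
    // instead of a single record whose second field spans two lines.
}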

I also have a config file CsvToSql.exe.config:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <appSettings>
    <add key="server" value="localhost"/>
    <add key="database" value="test"/>
    <!-- If your CSV files have headers, set this to true to generate field names from the headers -->
    <!-- Otherwise set it to false to generate names F1, F2, F3, etc. -->
    <add key="hasHeaders" value="false"/>
    <!-- Length of the nvarchar fields created; can be a number or 'max' -->
    <add key="fieldLength" value="500"/>
  </appSettings>
</configuration>

And a script, build.cmd, that compiles the whole lot:

%systemroot%\Microsoft.NET\Framework\v3.5\csc.exe /out:CsvToSql.exe Program.cs CsvReader.cs

This is how I use it:

  • Run build.cmd to compile CsvToSql.exe
  • Edit CsvToSql.exe.config to fit your case
  • Put the CSV files in the same folder as the executable and the config file
  • Run CsvToSql.exe
  • The executable does not connect to the database. Instead it produces a number of files:
    *.tbl files are table definitions, *.inp files are input files for the bcp command-line utility, and *.cmd files run the table creation script and the bcp utility. It also writes _all.cmd, which runs the *.cmd for all tables, and _cleanup.cmd, which deletes all the files that CsvToSql.exe generates (a hypothetical example of the generated files follows this list)
  • Run the _all.cmd file
  • Go to SQL Server and look at what has been produced. Make changes to the script and/or config, rinse and repeat
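
To give a feel for the output, here is roughly what the generated files would look like for a hypothetical sales.csv whose rows have two fields, using the config above (hasHeaders=false, fieldLength=500, server=localhost, database=test); sales.inp would simply contain the CSV rows with the ',' separators replaced by '^':

.\sales.tbl:

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[_sales]') AND type in (N'U'))
DROP TABLE [dbo].[_sales]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[_sales](
[F1] [nvarchar](500) NULL,
[F2] [nvarchar](500) NULL
) ON [PRIMARY]

.\sales.cmd:

sqlcmd -S localhost -E -d test -i .\sales.tbl
bcp test.dbo._sales in .\sales.inp -c -t^^ -r \n -T -S localhost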

There are a lot of assumptions that this script makes, and also a lot of stuff that is hardcoded. This is what I usually change quickly each time I need to load a new set of CSVs into SQL.

Good luck, and if you have any questions, please don't hesitate to ask.
The script requires .NET 3.5.
If there is nothing extra-special about the data I'm loading, I'm usually up and running with this script in 15 minutes. If there are problems, tweaking might take longer.
