Bulk insert with an optional text qualifier

Posted 2024-08-04 01:16:02

I am importing a CSV file into the database using BULK INSERT. It is a comma-delimited CSV file with no text qualifiers on the fields.

But some fields may contain a comma as part of the data, for example the ADDRESS field. Those values are surrounded with double quotes. The double quotes appear only when the field value contains a comma; otherwise the value is not quoted. So in some rows the ADDRESS value is surrounded with double quotes, but in other rows it is not. Is there a way to specify a text qualifier in the BULK INSERT command?

I tried BULK INSERT with the format file option:

BULK INSERT Test_Imported FROM 'C:\test.csv' 
WITH (FIRSTROW=0,FIELDTERMINATOR = ',',ROWTERMINATOR = '\n',FORMATFILE = 'C:\test.Fmt')

But there is no way to declare double quotes as an optional text qualifier in the format file.

PS:
This function is actually part of a bigger module written in C#; the BULK INSERT command is called from C#.

The CSV file arrives by email from another automated system, so I have no control over its format. There are around 150 columns, and each CSV file contains about 12,000 rows on average. I forgot to specify the database: it is SQL Server 2005.

Comments (5)

沉默的熊 2024-08-11 01:16:02

Unfortunately, you'll have to pre-process the file to make it consistent. SQL bulk operations simply split each line on the field delimiter.

Some options:

  • Process the file in C# and change the commas that are not inside quotes to another delimiter, such as a pipe (|); see the sketch below.
  • Split the file in two: lines that contain quotes and lines that do not. This only works if the quotes always appear in the same field.

You say you have no control over the format, but what you have is unusable as it stands...
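
A minimal sketch of the first option (replace commas that are outside double quotes and drop the quotes themselves), assuming the file can be rewritten line by line; the class, method, and path names here are just placeholders:

    // Sketch: rewrite the CSV so that real field separators become pipes and
    // the optional double-quote qualifiers are stripped. BULK INSERT can then
    // be run with FIELDTERMINATOR = '|' against the rewritten file.
    using System.IO;
    using System.Text;

    static class CsvPreprocessor
    {
        public static void ReplaceUnquotedCommas(string inputPath, string outputPath)
        {
            using (StreamReader reader = new StreamReader(inputPath))
            using (StreamWriter writer = new StreamWriter(outputPath))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    StringBuilder sb = new StringBuilder(line.Length);
                    bool insideQuotes = false;
                    foreach (char c in line)
                    {
                        if (c == '"')
                            insideQuotes = !insideQuotes;   // toggle quoted state, drop the quote itself
                        else if (c == ',' && !insideQuotes)
                            sb.Append('|');                 // a real field separator becomes a pipe
                        else
                            sb.Append(c);                   // ordinary data, including commas inside quotes
                    }
                    writer.WriteLine(sb.ToString());
                }
            }
        }
    }

The rewritten file can then be loaded with FIELDTERMINATOR = '|' in the BULK INSERT statement.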

少女情怀诗 2024-08-11 01:16:02

The BULK INSERT statement really sucks here because it doesn't handle optional qualifiers.

The TextFieldParser class (Microsoft.VisualBasic.FileIO.TextFieldParser) can help clean up the file.

I have pasted in a function that uses the TextFieldParser class to clean up a delimited file so that you can use it in a BULK INSERT statement. It is called like this (note the verbatim string literal, so the backslash in the path is not treated as an escape):

    String newDel = CleanDelimitedFile(@"c:\temp.csv", new String[] { "," }, "\t,\t");

Here is the function that cleans up the delimited file.

    /// <summary>
    /// This function opens a delimited file and strips any optional text qualifiers (double quotes)
    /// by re-writing the fields with a new delimiter.
    /// Requires a reference to Microsoft.VisualBasic.dll for the TextFieldParser class.
    /// </summary>
    /// <param name="FileFullPath">Full path of the delimited file</param>
    /// <param name="CurrentDelimiter">What string / character the file uses as the delimiter</param>
    /// <param name="NewDelimiter">What new delimiter string to use</param>
    /// <returns>Returns a String representation of the new delimited file contents</returns>
    private static String CleanDelimitedFile(String FileFullPath, String[] CurrentDelimiter, String NewDelimiter) {

        //-- if the file exists, parse it and rebuild it with the new delimiter
        if (System.IO.File.Exists( FileFullPath )) {
            Microsoft.VisualBasic.FileIO.TextFieldParser csvParser = null;
            System.Text.StringBuilder parseResults = new System.Text.StringBuilder();
            try {
                // new parser
                csvParser = new Microsoft.VisualBasic.FileIO.TextFieldParser(FileFullPath);
                // the delimited file has certain fields enclosed in quotes
                csvParser.HasFieldsEnclosedInQuotes = true;
                // the current delimiter
                csvParser.Delimiters = CurrentDelimiter;
                // iterate through all the lines of the file
                Boolean FirstLine = true;
                while (!csvParser.EndOfData) {
                    if (FirstLine) {
                        FirstLine = false;
                    }
                    else {
                        parseResults.Append("\n");
                    }
                    Boolean FirstField = true;
                    // iterate through each field on the current line
                    foreach (String item in csvParser.ReadFields()) {
                        if (FirstField) {
                            parseResults.Append(item);
                            FirstField = false;
                        }
                        else {
                            parseResults.Append(NewDelimiter + item);
                        }
                    }
                }
                return parseResults.ToString();
            }
            finally {
                if (csvParser != null) {
                    csvParser.Close();
                    csvParser.Dispose();
                }
            }
        }
        return String.Empty;
    }
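
Since CleanDelimitedFile only returns the cleaned contents as a string, one way to finish the job is to write that string to a temporary file and point BULK INSERT at it with the new delimiter. A rough sketch, in which the cleaned-file path, the table name and connectionString are assumptions rather than part of the original answer (remember that BULK INSERT resolves the file path on the SQL Server machine, not the client):

    // Hypothetical caller: clean the file, save the result, then bulk insert it
    // using the tab-comma-tab delimiter produced by CleanDelimitedFile.
    String cleaned = CleanDelimitedFile(@"C:\test.csv", new String[] { "," }, "\t,\t");
    String cleanedPath = @"C:\test_clean.csv";
    System.IO.File.WriteAllText(cleanedPath, cleaned);

    // connectionString is assumed to be defined elsewhere in the module.
    using (System.Data.SqlClient.SqlConnection conn = new System.Data.SqlClient.SqlConnection(connectionString))
    {
        conn.Open();
        // "\t" in the C# literal embeds real tab characters, so the terminator sent to
        // SQL Server is <tab>,<tab>; "\\n" leaves the \n escape for BULK INSERT to interpret.
        String sql = "BULK INSERT Test_Imported FROM '" + cleanedPath + "' " +
                     "WITH (FIELDTERMINATOR = '\t,\t', ROWTERMINATOR = '\\n')";
        using (System.Data.SqlClient.SqlCommand cmd = new System.Data.SqlClient.SqlCommand(sql, conn))
        {
            cmd.ExecuteNonQuery();
        }
    }
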
飞烟轻若梦 2024-08-11 01:16:02

Sadly, SQL 2005 and 2008 import XLS files much more smoothly than CSV files. I've never been anti-Microsoft, but unless all the ANSI standards of database management are dramatically changing and the concept of a text qualifier is being abandoned (which I highly doubt), this is probably a proprietary move by MS. SQL 2000 handled text qualifiers just fine (not sure about the BULK command, as I've always just used the Import Wizards). Imagine my surprise when we migrated to 2005 and I had to rework all of my processes to NOT import flat files but instead import XLS. It only took me 16 hours (yes, TWO work days) to come to that conclusion, and I actually lost sleep that week because I was so frustrated with MS for not allowing the use of text qualifiers (I even went into my boss's office to apologize for spending so much time on what should have been a 10-minute task). Ironically, you can't tell Excel to export anything withOUT including a double-quoted text qualifier (or virtually any other software exporter, for that matter). GRRRRRR.

The most frustrating part of all of this is that the SQL 2005 import wizard has a place to define the text qualifier!

...dare I say I'm starting to understand all the anti-M$ rhetoric after this experience!

叫思念不要吵 2024-08-11 01:16:02

I know this is an old question, but I have a T-SQL method for dealing with intermittent quote qualifiers. It may not be pretty, but it may help someone who finds their way here:

  1. Import the text file with each whole line in a single column (one field).
  2. Use the update statement below to change the commas that are between quotation marks into some identifiable string, in this case *&*.
  3. Use another update statement to strip all quotation marks.
  4. Use bcp to export the data into a new CSV file.
  5. Perform your bulk import into the original table with all the fields from the new CSV file: now there are no quotation marks and the in-field commas are *&* instead, so a simple comma-delimited import will work.
  6. Use another update statement to change *&* back into commas.

UPDATE InitialTable
SET BulkColumn =
    REPLACE(BulkColumn,
        SUBSTRING(BulkColumn,
            CHARINDEX('"', BulkColumn, 0),
            CHARINDEX('"', BulkColumn, CHARINDEX('"', BulkColumn, 0) + 1) - CHARINDEX('"', BulkColumn, 0) + 1),
        REPLACE(
            SUBSTRING(BulkColumn,
                CHARINDEX('"', BulkColumn, 0),
                CHARINDEX('"', BulkColumn, CHARINDEX('"', BulkColumn, 0) + 1) - CHARINDEX('"', BulkColumn, 0) + 1),
            ',',
            '*&*'))
WHERE BulkColumn LIKE '%"%'

薄荷→糖丶微凉 2024-08-11 01:16:02

What worked for me was changing

ROWTERMINATOR = '\n'

to:

ROWTERMINATOR = '0x0a'

When '\n' is specified, BULK INSERT implicitly treats the row terminator as '\r\n', so files whose lines end with a bare line feed (0x0a) need the hex value instead.