使用可选文本限定符批量插入
我正在使用批量插入将 csv 导入到数据库。 这是逗号分隔的 csv 文件。所有字段都没有文本限定符。
但某些字段可能包含逗号作为数据的一部分。 例如,ADDRESS 字段值。这些值用双引号括起来。 仅当字段值中包含逗号时,才会出现这些双引号,否则值不会用双引号引起来。 因此,在某些行中,ADDRESS 值用双引号括起来,但在其他行中则没有。 有没有办法在批量插入命令中指定文本限定符?
我尝试使用格式文件选项进行批量插入。
BULK INSERT Test_Imported FROM 'C:\test.csv'
WITH (FIRSTROW=0,FIELDTERMINATOR = ',',ROWTERMINATOR = '\n',FORMATFILE = 'C:\test.Fmt')
但我无法将双引号作为格式文件中的可选文本限定符。
附: 这个函数实际上是更大模块的一部分,它是用 C# 编写的。 从 C# 调用批量插入命令。
csv 文件是通过电子邮件从另一个自动化系统发送的。我无法控制格式 csv 文件的内容。大约有 150 列。 每个 csv 文件平均包含 12000 行。 忘记指定数据库。它是 SQL Server 2005。
I am importing csv to db using bulk insert.
It is the comma delimited csv file. No text qualifiers for all fields.
But some fields may have comma as part of the data.
for eg, ADDRESS field value. Those values are surronded with double quotes.
Those double quotes appear only if the field value has comma in it otherwise values are not surronded with double quotes.
So in some rows ADDRESS values are surronded with double-quotes, but in other rows they are not.
Is there a way to specify the text-qualifier in the bulk insert command?
I tried bulk insert with format file option.
BULK INSERT Test_Imported FROM 'C:\test.csv'
WITH (FIRSTROW=0,FIELDTERMINATOR = ',',ROWTERMINATOR = '\n',FORMATFILE = 'C:\test.Fmt')
but there is no way i can mention the double quotes as optional text qualifiers in the format file.
PS:
this function is actually a part of the bigger module, which is written in c#.
bulk insert command is called from c#.
The csv file is coming by email from another automated system. i have no control over the format
of the csv file.There are around 150 columns.
In average 12000 rows are coming in each csv file.
Forgot to spcify the DB. It is SQL server 2005.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
不幸的是,您必须预处理文件以使其保持一致。
SQL 批量操作在字段分隔符上拆分字符串。
一些选项:
您说您无法控制格式,但您拥有的内容无法使用时,这才有效......
Unfortunately, you'll have to pre-process the file to make it consistent.
SQL bulk operations split the string on the field delimiter.
Some options:
You say you have no control over the format, but what you have is unusable...
Bulk Insert 语句确实很糟糕,因为它不处理可选限定符。
TextFieldParser 类可以帮助我们清理文件 (Microsoft.VisualBasic.FileIO.TextFieldParser)
我已粘贴到使用 TextFieldParser 的函数中类来清理分隔文件,以便您可以在批量插入语句中使用它。
这是一个可以清理分隔文件的函数。
the Bulk Insert statement really sucks because it doesn't handle optional qualifiers.
The TextFieldParser class can help us clean up the file (Microsoft.VisualBasic.FileIO.TextFieldParser)
I have pasted in a function that uses the TextFieldParser class to clean up a delimited file so you can use it in a Bulk Insert statement.
Here is a function that will clean up your Delimited file.
遗憾的是,SQL 2005 和 2008 导入 XLS 文件比 CSV 文件顺利得多。我从来不反对微软,但除非所有数据库管理的 ANSI 标准都发生了巨大的变化,并且文本限定符的概念被放弃(我对此非常怀疑),否则这可能是微软的专有举措。 SQL 2000 可以很好地处理文本限定符(不确定 BULK 命令,因为我一直只使用导入向导)。想象一下,当我们迁移到 2005 年时,我不得不重新设计所有流程,不导入平面文件,而是导入 XLS,我会感到多么惊讶。我只花了 16 个小时(是的,两个工作日)就得出了这个结论,那周我实际上失眠了,因为我对 MS 不允许使用文本限定符感到非常沮丧(我什至走进我老板的办公室道歉)花了这么多时间在本应 10 分钟的任务上)。具有讽刺意味的是,您无法告诉 Excel 导出任何不包含双引号文本限定符(或几乎任何其他与此相关的软件导出器)的内容。咕噜咕噜。
其中最令人沮丧的部分是 SQL 2005 导入向导有一个地方可以定义文本限定符!
...我敢说,在这次经历之后,我开始理解所有反 M$ 的言论了!
Sadly, SQL 2005 and 2008 import XLS files much more smoothly than CSV files. I've never been anti-Microsoft but unless all the ANSI standards of database management are dramatically changing and the concept of a text qualifier is being abandoned (which I highly doubt), then this is probably a proprietary move by MS. SQL 2000 handled text qualifiers just fine (not sure about the BULK command as I've always just used the Import Wizards). Imagine my surprise when we migrated to 2005 and I had to rework all of my processes to NOT import flat files but instead import XLS. It only took me 16 hours (yes, TWO work days) to come to that conclusion and I actually lost sleep that week because I was so frustrated with MS for not allowing the use of Text Qualifiers (I even went into my bosses office to apologize for spending so much time on what should have been a 10 minute task). Ironically, you can't tell Excel to export anything withOUT including a double-quoted text-qualifier (or virtually any other software exporters for that matter). GRRRRRR.
The most frustrating part of all of this is that the SQL 2005 import wizard has a place to define the text qualifer!
...dare I say I'm starting to understand all the anti-M$ rhetoric after this experience!
我知道这是一个老问题,但我有一个 TSQL 方法来处理间歇性引号分隔符。它可能不太漂亮,但它可能会对在这里找到方法的人有所帮助:
*&*
更新初始表
设置批量列 =
替换(批量列,
子字符串(批量列,
CHARINDEX('"', BulkColumn, 0),
CHARINDEX('"', BulkColumn, CHARINDEX('"', BulkColumn, 0) + 1) - CHARINDEX('"', BulkColumn, 0) + 1),
代替(
子字符串(批量列,
CHARINDEX('"', BulkColumn, 0),
CHARINDEX('"', BulkColumn, CHARINDEX('"', BulkColumn, 0) + 1) - CHARINDEX('"', BulkColumn, 0) + 1),
',',
'*&*'))
WHERE BulkColumn LIKE '%"%'
I know this is an old question, but I have a TSQL method for dealing with intermittent quote delimiters. It may not be pretty, but it may help someone who finds there way here:
*&*
UPDATE InitialTable
SET BulkColumn =
REPLACE(BulkColumn,
SubString(BulkColumn,
CHARINDEX('"', BulkColumn, 0),
CHARINDEX('"', BulkColumn, CHARINDEX('"', BulkColumn, 0) + 1) - CHARINDEX('"', BulkColumn, 0) + 1),
REPLACE(
SubString(BulkColumn,
CHARINDEX('"', BulkColumn, 0),
CHARINDEX('"', BulkColumn, CHARINDEX('"', BulkColumn, 0) + 1) - CHARINDEX('"', BulkColumn, 0) + 1),
',',
'*&*'))
WHERE BulkColumn LIKE '%"%'
对我有用的是改变
为:
What worked for me was changing
To: