与 SQL BCP 作斗争(uniqidentifier、列上的默认值...)
编辑:我唯一悬而未决的问题是c)(文件上的真与假,数据库上的位,我既无法更改文件也无法更改数据库方案,有数百太字节我无法触及)。
系统接收具有某种格式的文件(实际上有数十万个)。事情是: a) 第一种类型是 uniqidentifier(稍后会详细介绍) b) 在数据库上,表的前 4 个值由数据库生成(它们与日期相关),这意味着在文件中找不到这 4 个值(所有其余值 - 并且按顺序 -,即使它是始终将其表示为文本或为空) c) 位值在文件上用 False/True 表示。
因此,1 的问题是在我收到的输入文本文件中,uniqidentifier 使用括号。当我尝试使用 bcp 命令工具生成具有 nul 选项格式的文件时,它会使其成为具有 37 个字符的 sqlchar(这对我来说没有意义,因为它要么是 36,要么是 38)。
行分隔符为“+++\r\n”,列分隔符为“©®©”。
我将如何生成格式文件?我已经被这个问题困扰了一段时间了,我以前从未使用过 bcp,而且我遇到的错误并没有说明太多(“BCP 数据文件中遇到意外的 EOF”)
我是否应该指定格式文件还是只是我想从我得到的文件中读取的文件?
谢谢!
注意:我无法提供 SQL 架构,因为它是为我工作的公司提供的。但它几乎是:smalldate、tinyinttinyinttinyint(这四个是由数据库生成的)、uniqidentifier、chars、chars、更多varchars、一些位、更多varchars、一些nvarchar。除数据库生成的值外,所有值都接受 null。
我当前的问题是跳过前 4 列。
http://msdn.microsoft.com/en- us/library/ms179250(v=SQL.105).aspx
我遵循了该指南,但不知何故它不起作用。这是更改(我只是硬更改列名称以保护项目的隐私,即使这听起来很愚蠢)
这是用 bcp 生成的(格式为 nul -c)-注意我将其作为链接,因为它是没那么短- http://pastebin.com/4UkpPp1n
第二个,应该做同样的事情,但忽略前 4 个列位于下一个粘贴箱中: http://pastebin.com/Lqj6XSbW
但它不起作用。错误是“Error = [Microsoft][SQL Native Client]为 bcp 操作提供的字段数小于服务器上的列数。”,这应该是所有这些的目的。
任何帮助将不胜感激。
EDIT: My only pending issue is c) (True and False on file, bit on database, I can't change neither the file nor the database scheme, there's hundreds of terabytes I can't touch).
The system receives a file (hundreds of thousands of them, actually) with a certain format. Things are:
a) First type is a uniqidentifier (more on this later)
b) On the database, the table's first 4 values are generated by the database (they are related to dates), meaning that those 4 values are not found on the files (all the rest are -and are in order-, even if it's always their representation as text or they are empty)
c) Bit values are represented with a False/True on the file.
So, the issue for 1 is that in the text file I receive as input, the uniqidentifier is using brackets. When I tried to generate the file with the format nul options using the bcp command tool, it would make it a sqlchar with 37 characters (which makes no sense to me, since it would either be 36 or 38).
Row separator is "+++\r\n", column separator is "©®©".
How would I go about generating the format files? I've been stuck with this for some time, I never used bcp before and errors I've got don't really tell much ("Unexpected EOF encountered in BCP data-file")
Am I supposed to specify all the columns in the format file or just the ones I desire to read from the files I get?
Thanks!
NOTE: I can't provide the SQL schema since it's for the company I work for. But it's pretty much: smalldate, tinyint tinyint tinyint (this four are generated by the db), uniqidentifier, chars, chars, more varchars, some bits, more varchars, some nvarchar. ALL values, except for those generated by the db, accept null.
My current problem is with the skipping the first 4 columns.
http://msdn.microsoft.com/en-us/library/ms179250(v=SQL.105).aspx
I followed that guide but somehow it's not working. Here's the changes (I'm just hard-changing column names to keep privacy of the project, even if it sounds stupid)
This is the one generated with bcp (with format nul -c) -note I put it as link 'cause it's not that short-
http://pastebin.com/4UkpPp1n
The second one, which is supposed to do the same but ignoring the first 4 columns is in the next pastebin:
http://pastebin.com/Lqj6XSbW
Yet it is not working. The error is "Error = [Microsoft][SQL Native Client]The number of fields provided for bcp operation is less than the number of columns on the server.", which was supposed to be the purpose of all that.
Any help will be greatly appreciated.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我将创建一个新表,其中 GUID 为
CHAR(38)
。将数据导入到此暂存表中,然后使用CAST(SUBSTRING(GUID, 2, 36) AS UNIQUEIDENTIFIER)
进行转换,以将暂存数据导入到永久表中。此方法也适用于奇数格式的日期、带有货币符号的数字或通常任何类型的格式不良的输入。BCP 格式文件有点棘手,但从根本上来说并不太复杂。如果该部分继续给您带来麻烦,一种选择是将整行导入为单个
VARCHAR(1000)
字段,然后在 SQL 中将其拆分 - 如果您熟悉 SQL 文本处理是。或者,如果您熟悉其他编程语言(例如 Perl 或 C#),则可以创建一个脚本来将输入预处理为更友好的形式(例如制表符分隔)。如果您不熟悉其他编程语言,我建议您选择一种并开始! SQL 是一种很棒的语言,但有时您需要不同的工具;它不适合文本处理。
如果您熟悉 C#,这里是我生成格式文件的代码。没有人会取笑我的 Whitestone 缩进 :P
由于某种原因,我当时没有为数据读取器使用
using
块。I'd create a new table with a
CHAR(38)
for the GUID. Import your data into this staging table, then translate it withCAST(SUBSTRING(GUID, 2, 36) AS UNIQUEIDENTIFIER)
to import the staging data into your permanent table. This approach also works well for dates in odd formats, numbers with currency symbols, or generally any kind of poorly-formatted input.BCP format files are a little touchy, but fundamentally aren't too complicated. If that part continues to give you trouble, one option is to import the whole row as a single
VARCHAR(1000)
field, then split it up within SQL - if you're comfortable with SQL text processing that is.Alternately, if you are familiar with some other programming language, like Perl or C#, you can create a script to pre-process your inputs into a more friendly form, like tab-delimited. If you're not familiar with some other programming language, I suggest you pick one and get started! SQL is a great language, but sometimes you need a different tool; it's not great for text processing.
If you're familiar with C#, here's my code to generate a format file. No one gets to make fun of my Whitestone indentation :P
There was some reason I didn't use a
using
block for the data reader at the time.BCP 似乎不可能将 True 和 False 理解为位值。最好使用 SSIS 或首先替换文本内容(创建视图或类似的东西不是一个好主意,它的开销更大)。
It seems as if it is not possible for BCP to understand True and False as bit values. It's better to either go with SSIS or first replace the contents of the text (not a good idea to create views or anything like that, it is more overhead).