BULK INSERT, SQL Server 2000, Unix linebreaks
I am trying to insert a .csv file with Unix linebreaks into a database. The command I am running is:
BULK INSERT table_name
FROM 'C:\file.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
If I convert the file into Windows format the load works, but I don't want to do this extra step if it can be avoided. Any ideas?
It comes down to this. Unix uses LF (Ctrl-J), MS-DOS/Windows uses CR/LF (Ctrl-M/Ctrl-J).
When you use '\n' on Unix, it gets translated to an LF character. On MS-DOS/Windows it gets translated to CR/LF. When your import runs on the Unix-formatted file, it sees only an LF. Hence, it's often easier to run the file through unix2dos first. But as you said in your original question, you don't want to do this (I'll assume there is a good reason why you can't).
Why can't you do:
Probably because when the SQL code is being parsed, it is not replacing the char(10) with the LF character (because it's already encased in single quotes). Or perhaps it's being interpreted as:
What happens when you echo out the contents of @bulk_cmd?
I would think "ROWTERMINATOR = '\n'" would work. I would suggest opening the file in a tool that shows "hidden characters" to make sure each line is terminated the way you think it is. I use Notepad++ for things like this.
It looks to me like there are two general avenues that can be taken: find some alternate way to read the CSV in the SQL script, or convert the CSV beforehand in any of the numerous ways you can do that (bcp, unix2dos; if it is a one-time kind of thing, you can probably even use your code editor to fix the file for you).
But you will have to have an extra step!
If this SQL is launched from a program, you might want to convert the line endings in that program. In that case, if you decide to code the conversion yourself, here is what you need to watch out for:
1. The line ending might be \n
2. or \r\n
3. or even \r (Mac!)
4. good grief, it could be that some lines have \r\n and others \n, any combination is possible unless you control where the CSV came from
OK, OK. Possibility 4 is farfetched. It happens in email, but that is another story.
One option would be to use bcp, and set up a control file with '\n' as the line break character.
Although you've indicated that you would prefer not to, another option would be to use unix2dos to pre-process the file into one with '\r\n' line breaks.
Finally, you can use the FORMATFILE option on BULK INSERT. This will use a bcp control file to specify the import format.
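As a rough sketch of the FORMATFILE route, assuming a two-column table: the format file path C:\file.fmt, the column names, the field widths, and the collation below are all placeholders, and the layout follows the SQL Server 2000 (version 8.0) non-XML format file. My understanding is that a "\n" terminator inside a format file is taken as a bare LF, but that is worth verifying against your own data.

```sql
-- Hypothetical format file C:\file.fmt (shown here as a comment;
-- column names, widths and collation are placeholders):
--
--   8.0
--   2
--   1  SQLCHAR  0  100  ","   1  col1  SQL_Latin1_General_CP1_CI_AS
--   2  SQLCHAR  0  100  "\n"  2  col2  SQL_Latin1_General_CP1_CI_AS
--
-- The field and row terminators come from the format file,
-- so they are not repeated in the WITH clause.
BULK INSERT table_name
FROM 'C:\file.csv'
WITH (FORMATFILE = 'C:\file.fmt')
```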
It's a bit more complicated than that! When you tell SQL Server ROWTERMINATOR='\n' it interprets this as meaning the default row terminator under Windows which is actually "\r\n" (using C/C++ notation). If your row terminator is really just "\n" you will have to use the dynamic SQL shown above. I have just spent the best part of an hour figuring out why \n doesn't really mean \n when used with BULK INSERT!
I confirm that the syntax
works when used with an EXEC command.
If you have multiple ROWTERMINATOR characters (e.g. a pipe and a unix linefeed) then the syntax for this is:
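A minimal sketch of what a multi-character terminator could look like in dynamic SQL, assuming each row ends with a pipe followed by LF. The table and file names are reused from the question, and the exact statement is an illustration rather than the poster's original code.

```sql
DECLARE @bulk_cmd varchar(1000)

-- Doubled quotes ('') become single quotes inside the string.
-- The row terminator is built as a pipe followed by a real LF;
-- CHAR(124) + CHAR(10) would produce the same two characters.
SET @bulk_cmd = 'BULK INSERT table_name
FROM ''C:\file.csv''
WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = ''|' + CHAR(10) + ''')'

EXEC (@bulk_cmd)
```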
Thanks to all who have answered but I found my preferred solution.
When you tell SQL Server ROWTERMINATOR='\n' it interprets this as meaning the default row terminator under Windows which is actually "\r\n" (using C/C++ notation). If your row terminator is really just "\n" you will have to use the dynamic SQL shown below.
Why you can't say BULK INSERT ...(ROWTERMINATOR = CHAR(10)) is beyond me. It doesn't look like you can evaluate any expressions in the WITH section of the command.
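A minimal sketch of such a dynamic SQL command, reusing table_name and C:\file.csv from the question. The variable name @bulk_cmd follows the earlier reply; the exact statement is an assumption, not a quote.

```sql
DECLARE @bulk_cmd varchar(1000)

-- CHAR(10) is concatenated into the command text as a real LF,
-- so BULK INSERT receives a one-character row terminator
-- instead of the '\n' escape.
SET @bulk_cmd = 'BULK INSERT table_name
FROM ''C:\file.csv''
WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = ''' + CHAR(10) + ''')'

-- PRINT @bulk_cmd   -- uncomment to inspect the generated command
EXEC (@bulk_cmd)
```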
What the above does is create a string of the command and execute that. Neatly sidestepping the need to create an additional file or go through extra steps.
I felt compelled to contribute as I was having the same issue, and I need to read 2 UNIX files from SAP at least a couple of times a day. Therefore, instead of using unix2dos, I needed something with less manual intervention and more automatic via programming.
As noted, Char(10) works within the SQL string. I didn't want to use a SQL string, so I used ''''+Char(10)+'''' directly, but for some reason this didn't compile.
What did work very slickly was: with (ROWTERMINATOR = '0x0a')
Problem solved with Hex!
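Applied to the command from the question, the hex form would look something like this (a sketch using the question's table and file names; 0x0a is the hex code for LF):

```sql
BULK INSERT table_name
FROM 'C:\file.csv'
WITH
(
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '0x0a'  -- LF only, no dynamic SQL needed
)
```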