批量导入文本文件/VB2005/SQL Server 2005
我继承了一个 .NET 应用程序来支持/增强它读取数十万行和数百万行之一的几个文件。
最初的开发人员给我留下了类似 :- 的代码
For Each ModelListRow As String In ModelListDataArray
If ModelListRow.Trim.Length = 0 Or ModelListRow.Contains(",") = False Then
GoTo SKIP_ROW
End If
Dim ModelInfo = ModelListRow.Split(",")
Dim ModelLocation As String = UCase(ModelInfo(0))
Dim ModelCustomer As String = UCase(ModelInfo(1))
Dim ModelNumber As String = UCase(ModelInfo(2))
If ModelLocation = "LOCATION" Or ModelNumber = "MODEL" Then
GoTo SKIP_ROW
End If
Dim MyDataRow As DataRow = dsModels.Tables(0).NewRow
MyDataRow.Item("location") = ModelLocation.Replace(vbCr, "").Replace(vbLf, "").Replace(vbCrLf, "")
MyDataRow.Item("model") = ModelNumber.Replace(vbCr, "").Replace(vbLf, "").Replace(vbCrLf, "")
dsModels.Tables(0).Rows.Add(MyDataRow)
SKIP_ROW:
Next
,导入这些文件需要很长时间(嗯,将近半小时)。
我怀疑有更好的方法可以做到这一点。我正在寻找建议。
提前致谢。
I've inherited a .NET app to support / enhance which reads in a couple of files of high hundreds of thousands of rows, and one of millions of row.
The original developer left me code like :-
For Each ModelListRow As String In ModelListDataArray
If ModelListRow.Trim.Length = 0 Or ModelListRow.Contains(",") = False Then
GoTo SKIP_ROW
End If
Dim ModelInfo = ModelListRow.Split(",")
Dim ModelLocation As String = UCase(ModelInfo(0))
Dim ModelCustomer As String = UCase(ModelInfo(1))
Dim ModelNumber As String = UCase(ModelInfo(2))
If ModelLocation = "LOCATION" Or ModelNumber = "MODEL" Then
GoTo SKIP_ROW
End If
Dim MyDataRow As DataRow = dsModels.Tables(0).NewRow
MyDataRow.Item("location") = ModelLocation.Replace(vbCr, "").Replace(vbLf, "").Replace(vbCrLf, "")
MyDataRow.Item("model") = ModelNumber.Replace(vbCr, "").Replace(vbLf, "").Replace(vbCrLf, "")
dsModels.Tables(0).Rows.Add(MyDataRow)
SKIP_ROW:
Next
and it takes an age (well, nearly half an hour) to import these files.
I suspect there's a MUCH better way to do it. I'm looking for suggestions.
Thanks in advance.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
看一下批量插入。
http://msdn.microsoft.com/en- us/library/ms188365(v=SQL.90).aspx
基本上,您将 SQL Server 指向 CSV 格式的文本文件,它会执行将数据拉入表中的所有逻辑。如果您需要对其进行更多处理,可以将文本文件拉入 SQL Server 中的暂存位置,然后运行存储过程将其处理为您需要的格式。
Take a look at BULK INSERT.
http://msdn.microsoft.com/en-us/library/ms188365(v=SQL.90).aspx
Basically you point SQL Server at a text file in CSV format and it does all the logic of pulling the data into a table. If you need to massage it more than that, you can pull the text file into a staging location in SQL Server, and then run a stored proc to massage it into the format you are looking for.
主要选项(除了从头开始编写自己的代码之外)是:
jkohlhepp 关于将数据拉入暂存表然后清理它的建议是一个很好的建议,也是 ETL 过程中非常常见的模式。但是,如果您的“按摩”在 TSQL 中不容易完成,那么您可能仍然需要一些 .NET 代码,无论是在 SSIS 中还是在 CLR 过程中。
就我个人而言,我会在您的情况下使用 SSIS,因为数据看起来格式不清晰,因此您可能需要一些自定义代码来清理/重新格式化数据到数据库的过程。然而,这确实取决于您最舒服/最高效的方式以及您现有的工具和标准。
The main options (apart from writing your own code from scratch) are:
jkohlhepp's suggestion about pulling data into a staging table then cleaning it is a good one and a very common pattern in ETL processes. But if your "massaging" isn't easy to do in TSQL then you will probably need some .NET code anyway, whether it's in SSIS or in a CLR procedure.
Personally I would use SSIS in your case, because it looks like the data is not cleanly formatted so you will probably need some custom code to clean/re-format the data on its way to the database. However it does depend on what you're most comfortable/productive with and what existing tools and standards you have in place.