批量导入文本文件/VB2005/SQL Server 2005

发布于 2024-11-29 01:07:11 字数 1227 浏览 0 评论 0原文

我继承了一个 .NET 应用程序来支持/增强它读取数十万行和数百万行之一的几个文件。

最初的开发人员给我留下了类似 :- 的代码

                For Each ModelListRow As String In ModelListDataArray
                    If ModelListRow.Trim.Length = 0 Or ModelListRow.Contains(",") = False Then
                        GoTo SKIP_ROW
                    End If

                    Dim ModelInfo = ModelListRow.Split(",")
                    Dim ModelLocation As String = UCase(ModelInfo(0))
                    Dim ModelCustomer As String = UCase(ModelInfo(1))
                    Dim ModelNumber As String = UCase(ModelInfo(2))

                    If ModelLocation = "LOCATION" Or ModelNumber = "MODEL" Then
                        GoTo SKIP_ROW
                    End If

                    Dim MyDataRow As DataRow = dsModels.Tables(0).NewRow
                    MyDataRow.Item("location") = ModelLocation.Replace(vbCr, "").Replace(vbLf, "").Replace(vbCrLf, "")
                    MyDataRow.Item("model") = ModelNumber.Replace(vbCr, "").Replace(vbLf, "").Replace(vbCrLf, "")
                    dsModels.Tables(0).Rows.Add(MyDataRow)
SKIP_ROW:
                Next

,导入这些文件需要很长时间(嗯,将近半小时)。

我怀疑有更好的方法可以做到这一点。我正在寻找建议。

提前致谢。

I've inherited a .NET app to support / enhance which reads in a couple of files of high hundreds of thousands of rows, and one of millions of row.

The original developer left me code like :-

                For Each ModelListRow As String In ModelListDataArray
                    If ModelListRow.Trim.Length = 0 Or ModelListRow.Contains(",") = False Then
                        GoTo SKIP_ROW
                    End If

                    Dim ModelInfo = ModelListRow.Split(",")
                    Dim ModelLocation As String = UCase(ModelInfo(0))
                    Dim ModelCustomer As String = UCase(ModelInfo(1))
                    Dim ModelNumber As String = UCase(ModelInfo(2))

                    If ModelLocation = "LOCATION" Or ModelNumber = "MODEL" Then
                        GoTo SKIP_ROW
                    End If

                    Dim MyDataRow As DataRow = dsModels.Tables(0).NewRow
                    MyDataRow.Item("location") = ModelLocation.Replace(vbCr, "").Replace(vbLf, "").Replace(vbCrLf, "")
                    MyDataRow.Item("model") = ModelNumber.Replace(vbCr, "").Replace(vbLf, "").Replace(vbCrLf, "")
                    dsModels.Tables(0).Rows.Add(MyDataRow)
SKIP_ROW:
                Next

and it takes an age (well, nearly half an hour) to import these files.

I suspect there's a MUCH better way to do it. I'm looking for suggestions.

Thanks in advance.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

转身以后 2024-12-06 01:07:11

看一下批量插入。

http://msdn.microsoft.com/en- us/library/ms188365(v=SQL.90).aspx

基本上,您将 SQL Server 指向 CSV 格式的文本文件,它会执行将数据拉入表中的所有逻辑。如果您需要对其进行更多处理,可以将文本文件拉入 SQL Server 中的暂存位置,然后运行存储过程将其处理为您需要的格式。

Take a look at BULK INSERT.

http://msdn.microsoft.com/en-us/library/ms188365(v=SQL.90).aspx

Basically you point SQL Server at a text file in CSV format and it does all the logic of pulling the data into a table. If you need to massage it more than that, you can pull the text file into a staging location in SQL Server, and then run a stored proc to massage it into the format you are looking for.

久隐师 2024-12-06 01:07:11

主要选项(除了从头开始编写自己的代码之外)是:

  • BULK INSERT 或 bcp.exe,如果您的数据是干净格式的
  • SSIS,如果您需要工作流、数据类型转换、数据清理等,则
  • 它们可以很好地工作。 .NET SqlBulkCopy API

jkohlhepp 关于将数据拉入暂存表然后清理它的建议是一个很好的建议,也是 ETL 过程中非常常见的模式。但是,如果您的“按摩”在 TSQL 中不容易完成,那么您可能仍然需要一些 .NET 代码,无论是在 SSIS 中还是在 CLR 过程中。

就我个人而言,我会在您的情况下使用 SSIS,因为数据看起来格式不清晰,因此您可能需要一些自定义代码来清理/重新格式化数据到数据库的过程。然而,这确实取决于您最舒服/最高效的方式以及您现有的工具和标准。

The main options (apart from writing your own code from scratch) are:

  • BULK INSERT or bcp.exe, which work well if your data is cleanly formatted
  • SSIS, if you need workflow, data type transformations, data cleansing etc.
  • .NET SqlBulkCopy API

jkohlhepp's suggestion about pulling data into a staging table then cleaning it is a good one and a very common pattern in ETL processes. But if your "massaging" isn't easy to do in TSQL then you will probably need some .NET code anyway, whether it's in SSIS or in a CLR procedure.

Personally I would use SSIS in your case, because it looks like the data is not cleanly formatted so you will probably need some custom code to clean/re-format the data on its way to the database. However it does depend on what you're most comfortable/productive with and what existing tools and standards you have in place.

一身仙ぐ女味 2024-12-06 01:07:11
Dim ExcelConnection As New System.Data.OleDb.OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\MyExcelSpreadsheet.xlsx;Extended Properties=""Excel 12.0 Xml;HDR=Yes""")
ExcelConnection.Open()
Dim ExcelConnection As New System.Data.OleDb.OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\MyExcelSpreadsheet.xlsx;Extended Properties=""Excel 12.0 Xml;HDR=Yes""")
ExcelConnection.Open()
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文