使用 SQL 将文本文件导入通用数据库
我目前正在尝试使用 OleDb 将分号分隔的文本文件导入到 C# 中的数据库中,但我不知道类型(SQL Server、Access、Oracle、MySQL、postgreSQL 等)。使用 Jet 文本阅读器将文件作为数据库,然后创建准备好的插入语句,填充字段,然后在最后提交。 虽然这有效,但速度很慢,而且对于数百万行来说,它花费的时间太长。
所以我的问题是:是否有人对如何最好地将文本文件导入通用数据库有任何其他想法,或者对我的方法发表评论以加快导入速度?
我无法使用第 3 方库或软件来执行此操作,因为它是较大项目的一部分
I am currently trying to import a semi-colon delimited text file into a database in c# using OleDb where I don't know the type (SQL Server, Access, Oracle, MySQL, postgreSQL, etc.) Currently I'm reading in the file as a database using the Jet text reader then creating a prepared insert statement, populating the fields, then commiting at the end. While that works, it's slow and for millions of rows, it takes way too long.
So my question: Does anybody have any other thoughts on how to best import a text file to a generic database, or comments on my approaches that will lead to a faster import?
I cannot use 3rd party libraries or software to do this as it is part of a larger project
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
试试这个
http://filehelpers.sourceforge.net
....为什么你想将数据库加载到数据集? 让另一个数据库跟踪唯一性(如果有这样一个词)。 导入时,检查日志数据库中是否存在,如果不存在,则加载到通用数据库中。
等待对此主题的其他回应,我们可能会得到更好的想法。
Try this
http://filehelpers.sourceforge.net
....why would you want to load the db into the dataset? Have another database keep track of the uniqueness (if there is such a word). While importing, check if exists in the logging database, if no, then load to Generic Database.
Wait for some other responses to this thread, we may get a better idea.
不完全优雅,但性能可能更好:
您仍然单独插入每一行时,您不会创建那么多的网络流量。
详细地说,原始方法在客户端上生成语句,然后在客户端上执行它们,从而导致每行的网络流量。我的建议是在服务器上生成语句(在存储过程中)并让它们在服务器上执行,从而不会产生新的网络流量。
“正确”的解决方案是使用特定于数据库的导入工具(例如 SQL) 。 Oracle 的加载器)。性能提升是巨大的(我们在大约 5 分钟内加载了包含 2000 万行的巨大表)。
Not exactly elegant, but performance may be better:
While you are still inserting each line seperately, you wouldn't create quite as much network traffic.
To elaborate, the original method generates the statements on the client and then executes them on the client, resulting in network traffic for each line. My suggestion would be to generate the statements on the server (in a stored procedure) and have them execute on the server, resulting in no new network traffic.
The "correct" solution would be to use a database specific import tool (like SQL Loader for Oracle). The performance gains are enormous. (We are loading huge tables with 20 million lines in about 5 minutes). But of course, that is not very generic.
好吧,我设法将文本文件的行放入数据库数据集中,到目前为止,这种方法似乎更快。 我使用
当然现在它只是让 DataAdapter.Update(Dataset) 工作。 在线查看会很有趣...
Update
此方法不会产生更快的结果,因为 DataAdapter.Update 命令确实会逐行插入。
Well, I managed to get the rows of the text file into the database dataset, and so far this method seems to be quicker. I used
Of course now it's just getting the DataAdapter.Update(Dataset) to work. Looking online that is gonna be fun...
Update
This method does not yield quicker results as the DataAdapter.Update command does do line by line insertions.
批量插入 dbo.ImportTest
来自“C:\ImportData.txt”
AND ( 字段终止符 =',', FIRSTROW = 2 )
BULK INSERT dbo.ImportTest
FROM 'C:\ImportData.txt'
WITH ( FIELDTERMINATOR =',', FIRSTROW = 2 )
最好的选择是购买现成的应用程序来执行此操作。
专业的现成应用程序使用本机驱动程序,并针对它们将遇到的每种类型的数据源进行微调。 这总是在幕后,所以你看不到他们是如何做的。 例如,bulkcopy 用于 SQL Server; Oracle 有一个数据泵。
自行开发的问题在于,您要么花钱微调您的应用程序,使其能够与您可能遇到的每种源类型配合使用,要么通过使用通用 ODBC/ADO/无论什么驱动程序都会对性能造成巨大影响。
归根结底,您最好要么将其从您的产品中剔除,要么只是处理您被迫采取的不可避免的缓慢方法。 在这种情况下,这意味着对所有内容都使用单个插入语句。
那么,你们有多少钱用于开发资源呢?
Your best bet is to buy an off the shelf application for doing this.
Professional Off The Shelf applications use native drivers and fine tune for each type of datasource they will hit against. This is always under the covers so you don't see how they go about it. For example, bulkcopy is used against SQL Server; Oracle has a Data Pump.
The problem with rolling your own is that you can either spend the money to fine tune your application to work with each of the source types you're likely to encounter OR you take a huge performance hit by using the generic ODBC / ADO / Whatever drivers.
At the end of the day, you are better off either leaving this out of your product or just dealing with the inevitable slow approach that you are forced to take. In this case that means using single insert statements for everything.
So, how much money do you have for development resources?