Importing a text file into a generic database using SQL

Published 2024-07-06 23:38:34


I am currently trying to import a semicolon-delimited text file into a database in C# using OleDb, where I don't know the database type (SQL Server, Access, Oracle, MySQL, PostgreSQL, etc.). Currently I'm reading in the file as a database using the Jet text reader, then creating a prepared insert statement, populating the fields, then committing at the end. While that works, it's slow, and for millions of rows it takes way too long.

So my question: Does anybody have any other thoughts on how to best import a text file to a generic database, or comments on my approaches that will lead to a faster import?

I cannot use 3rd-party libraries or software to do this, as it is part of a larger project.
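For reference, the prepared-insert approach described in the question can often be sped up considerably just by reusing one prepared command and committing in chunks rather than once per row or once at the very end. A minimal sketch, assuming a hypothetical two-column target table `ImportTest` and a caller-supplied OleDb connection string (both placeholders, not from the question):

```csharp
using System;
using System.Data.OleDb;
using System.IO;

class Importer
{
    // Pure helper so the parsing step can be tested on its own.
    public static string[] ParseLine(string line) => line.Split(';');

    public static void Import(string connStr, string path)
    {
        using (var conn = new OleDbConnection(connStr))
        {
            conn.Open();
            OleDbTransaction tx = conn.BeginTransaction();
            var cmd = new OleDbCommand(
                "INSERT INTO ImportTest (Col1, Col2) VALUES (?, ?)", conn, tx);
            cmd.Parameters.Add("@p1", OleDbType.VarChar, 255);
            cmd.Parameters.Add("@p2", OleDbType.VarChar, 255);
            cmd.Prepare();   // prepare once, reuse for every row

            int n = 0;
            foreach (string line in File.ReadLines(path))
            {
                string[] fields = ParseLine(line);
                cmd.Parameters[0].Value = fields[0];
                cmd.Parameters[1].Value = fields[1];
                cmd.ExecuteNonQuery();

                // Commit in chunks so the transaction log stays bounded.
                if (++n % 10000 == 0)
                {
                    tx.Commit();
                    tx = conn.BeginTransaction();
                    cmd.Transaction = tx;
                }
            }
            tx.Commit();
        }
    }
}
```

This doesn't remove the per-row round trip, but it avoids per-row commit overhead, which is often the dominant cost with autocommit drivers.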


Answers (5)

可爱暴击 2024-07-13 23:38:34


Try this

http://filehelpers.sourceforge.net

....why would you want to load the db into the dataset? Have another database keep track of the uniqueness (if there is such a word). While importing, check whether the row exists in the logging database; if not, load it into the generic database.

Wait for some other responses to this thread, we may get a better idea.

流心雨 2024-07-13 23:38:34


Not exactly elegant, but performance may be better:

  • load the whole file into a table with just one column "Line" as long text (similar to what you do now locally)
  • use stored procedures to split the fields apart and create the inserts
  • execute the inserts on the server

While you are still inserting each line separately, you wouldn't create quite as much network traffic.

To elaborate, the original method generates the statements on the client and then executes them on the client, resulting in network traffic for each line. My suggestion would be to generate the statements on the server (in a stored procedure) and have them execute on the server, resulting in no new network traffic.

The "correct" solution would be to use a database specific import tool (like SQL Loader for Oracle). The performance gains are enormous. (We are loading huge tables with 20 million lines in about 5 minutes). But of course, that is not very generic.
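The staging-table idea above could be sketched roughly like this, assuming a one-column table `StagingLines(Line)` and a server-side procedure `SplitStagingLines` already exist — both names are placeholders, not part of the answer:

```csharp
using System.Data.OleDb;
using System.IO;

class StagingImport
{
    public const string InsertSql = "INSERT INTO StagingLines (Line) VALUES (?)";

    public static void Load(OleDbConnection conn, string path)
    {
        var insert = new OleDbCommand(InsertSql, conn);
        insert.Parameters.Add("@line", OleDbType.LongVarChar);
        insert.Prepare();

        // Ship each raw line to the server untouched; no client-side parsing.
        foreach (string line in File.ReadLines(path))
        {
            insert.Parameters[0].Value = line;
            insert.ExecuteNonQuery();
        }

        // Splitting and the real inserts now run server-side, so no further
        // per-row network traffic is generated. The call syntax for invoking
        // the procedure varies by backend; "EXEC" is the SQL Server form.
        new OleDbCommand("EXEC SplitStagingLines", conn).ExecuteNonQuery();
    }
}
```

Note the stored-procedure language (T-SQL, PL/SQL, PL/pgSQL, ...) still differs per backend, so this moves work to the server but is not fully generic either.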

扮仙女 2024-07-13 23:38:34


Well, I managed to get the rows of the text file into the database dataset, and so far this method seems to be quicker. I used

Dataset.Tables[x].ImportRow(DataRow)

Of course now it's just getting the DataAdapter.Update(Dataset) to work. Looking online that is gonna be fun...

Update

This method does not yield quicker results, as the DataAdapter.Update command does its insertions line by line.
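For what it's worth, that row-by-row behaviour is inherent to the provider: DbDataAdapter can batch updates, but only providers such as SqlClient implement UpdateBatchSize, and the generic OleDbDataAdapter does not, which matches the result above. A hedged sketch of what batching looks like when the backend is known to be SQL Server (the table name `ImportTest` is a placeholder):

```csharp
using System.Data;
using System.Data.SqlClient;

class DataSetImport
{
    public const int BatchSize = 500;

    public static void Flush(string connStr, DataSet ds)
    {
        using (var conn = new SqlConnection(connStr))
        {
            var adapter = new SqlDataAdapter("SELECT * FROM ImportTest", conn);
            var builder = new SqlCommandBuilder(adapter);
            adapter.InsertCommand = builder.GetInsertCommand();
            // Batching requires the insert command not to fetch per-row results.
            adapter.InsertCommand.UpdatedRowSource = UpdateRowSource.None;
            adapter.UpdateBatchSize = BatchSize; // N inserts per round trip
            adapter.Update(ds, "ImportTest");
        }
    }
}
```

So the DataSet route only pays off if you can detect the backend and switch to its batching-capable adapter; over plain OleDb it stays one statement per row.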

疏忽 2024-07-13 23:38:34


BULK INSERT dbo.ImportTest
FROM 'C:\ImportData.txt'
WITH ( FIELDTERMINATOR =',', FIRSTROW = 2 )
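A caveat on the statement above: BULK INSERT is SQL Server-specific, so in a "generic" importer it could only be issued after detecting the backend, and the file path is read by the *server*, not the client. Also, the sample uses ',' as the terminator while the question's file is semicolon-delimited. A small sketch that builds the statement with the terminator as a parameter (table and path are the answer's illustrative values):

```csharp
using System;

class BulkSqlBuilder
{
    // Builds a SQL Server BULK INSERT statement; table, path, and
    // terminator all come from the caller.
    public static string Build(string table, string path, char terminator)
    {
        return "BULK INSERT " + table +
               " FROM '" + path + "'" +
               " WITH ( FIELDTERMINATOR ='" + terminator + "', FIRSTROW = 2 )";
    }
}
```

The resulting string would then be executed as an ordinary command (e.g. via OleDbCommand.ExecuteNonQuery) once the backend is confirmed to be SQL Server.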

剩余の解释 2024-07-13 23:38:34


Your best bet is to buy an off the shelf application for doing this.

Professional off-the-shelf applications use native drivers and fine-tune for each type of data source they will hit. This is always under the covers, so you don't see how they go about it. For example, bulk copy is used against SQL Server; Oracle has a Data Pump.

The problem with rolling your own is that you either spend the money to fine-tune your application to work with each of the source types you're likely to encounter, or you take a huge performance hit by using the generic ODBC/ADO/whatever drivers.

At the end of the day, you are better off either leaving this out of your product or just dealing with the inevitable slow approach that you are forced to take. In this case that means using single insert statements for everything.

So, how much money do you have for development resources?
