Importing a text file into SQL Server using Python

Posted on 2024-08-02 07:59:42

I am writing a Python script that will be doing some processing on text files. As part of that process, I need to import each line of the tab-separated file into a local MS SQL Server (2008) table. I am using pyodbc and I know how to do this. However, I have a question about the best way to execute it.

I will be looping through the file, calling cursor.execute(myInsertSQL) for each line. Does anyone see any problem with waiting to commit until all records have been processed (i.e. calling commit() after the loop rather than inside the loop after each individual execute)? The reason I ask is that some files will have upwards of 5000 lines. I didn't know whether trying to "save them up" and commit all 5000 at once would cause problems.
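
For reference, a minimal sketch of the pattern described above, with a single commit() after the loop. The connection string, file name, table, and column names are made-up placeholders, not details from the post:

    import pyodbc

    # Connection string, file name, table and column names are placeholders.
    conn = pyodbc.connect(
        "DRIVER={SQL Server};SERVER=localhost;DATABASE=MyDb;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()

    with open("data.txt", "r", encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            # One parameterized execute per line; nothing is committed yet.
            cursor.execute(
                "INSERT INTO MyTable (col1, col2, col3) VALUES (?, ?, ?)",
                fields[0], fields[1], fields[2],
            )

    # Single commit after the loop: all rows are committed in one transaction.
    conn.commit()
    conn.close()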

I am fairly new to Python, so I don't know all of these issues yet.

Thanks.

Comments (2)

執念 2024-08-09 07:59:42

If I understand what you are doing, Python is not going to be a problem. Executing a statement inside a transaction does not create cumulative state in Python; any cumulative state builds up only on the database server itself.

When you commit, you will need to make sure the commit actually succeeded, since a large batch commit may conflict with intervening changes in the database. If the commit fails, you will have to re-run the batch.
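
A small sketch of that check, assuming the per-line inserts are wrapped in a hypothetical insert_all_lines(cursor) helper; on a failed commit the transaction is rolled back and the whole batch is run again:

    import pyodbc

    def run_batch(conn, insert_all_lines):
        """Execute every insert, then commit; retry the whole batch once on failure."""
        for _ in range(2):
            cursor = conn.cursor()
            try:
                insert_all_lines(cursor)  # calls cursor.execute() once per line
                conn.commit()             # the entire batch is committed here
                return
            except pyodbc.Error:
                conn.rollback()           # discard the partial batch and try again
        raise RuntimeError("Batch failed even after a retry")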

That's the only problem that I am aware of with large batches and Python/ODBC (and it's not even really a Python problem, since you would have that problem regardless).

Now, if you were creating all the SQL in memory and then looping through the in-memory representation, that might make more sense. Still, 5000 lines of text on a modern machine is really not that big of a deal. If you start needing to process two orders of magnitude more, you might need to rethink your process.
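
A sketch of one way to do that with pyodbc: collect every row in memory first, then pass the whole list to executemany() and commit once. The connection string, file, table, and columns are again placeholders:

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={SQL Server};SERVER=localhost;DATABASE=MyDb;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()

    # Build the whole parameter list in memory first...
    with open("data.txt", "r", encoding="utf-8") as f:
        rows = [line.rstrip("\n").split("\t") for line in f]

    # ...then hand it to the driver in a single call and commit once.
    cursor.executemany(
        "INSERT INTO MyTable (col1, col2, col3) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    conn.close()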

坏尐絯℡ 2024-08-09 07:59:42

Create a file and use BULK INSERT. It will be faster.
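
A sketch of that approach driven from pyodbc, assuming the tab-delimited file is written to a path the SQL Server service account can read; the path, table, and connection details are placeholders:

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={SQL Server};SERVER=localhost;DATABASE=MyDb;Trusted_Connection=yes;",
        autocommit=True,
    )
    cursor = conn.cursor()

    # BULK INSERT runs inside SQL Server, so the path must be one the
    # SQL Server service account can read (the path here is a placeholder).
    cursor.execute(r"""
        BULK INSERT MyTable
        FROM 'C:\data\data.txt'
        WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n')
    """)
    conn.close()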
