How do I iterate over a MySQL table using Python?

Posted on 2024-09-08 02:02:08


I have a Python script which uses the MySQLdb interface to load various CSV files into MySQL tables.

In my code, I use Python's standard csv library to read the CSV, then I insert each field into the table, one at a time, using an INSERT query. I do this rather than using LOAD DATA so that I can convert null values and do other minor clean-ups on a per-field basis.
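For illustration, the loading step described above might look roughly like the sketch below. The table name `readings`, the set of null-like markers, and the presence of a header row are all assumptions, not details from the actual script. `insert_rows` only relies on the DB-API `executemany` call with `%s` placeholders, which is the placeholder style MySQLdb uses.

```python
import csv
import io

# Assumed spellings of "no value" in the CSV files (hypothetical).
NULL_MARKERS = {"", "NULL", "null", "N/A"}


def clean_field(raw):
    """Return None for null-like text, otherwise the stripped string."""
    text = raw.strip()
    return None if text in NULL_MARKERS else text


def read_rows(fileobj):
    """Yield one tuple of cleaned fields per CSV record, skipping the header."""
    reader = csv.reader(fileobj)
    next(reader, None)  # assumes the first line is a header row
    for record in reader:
        yield tuple(clean_field(field) for field in record)


def insert_rows(cursor, rows):
    """Insert cleaned rows through a DB-API cursor (e.g. one from MySQLdb).

    `readings` is a hypothetical table name; None is sent as SQL NULL.
    """
    cursor.executemany(
        "INSERT INTO readings (id_number, iteration, `date`, `value`) "
        "VALUES (%s, %s, %s, %s)",
        list(rows),
    )


# Small in-memory CSV standing in for a real file.
sample = io.StringIO(
    "id_number,iteration,date,value\n"
    "102,1,2010-01-01,63\n"
    "102,2,2010-01-02,\n"
    "102,3,2010-01-03,65\n"
)
cleaned = list(read_rows(sample))
```

With a real connection, `insert_rows(conn.cursor(), cleaned)` followed by `conn.commit()` would write the rows; the empty `value` field in the second record arrives in the table as NULL.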

Example table format:

`id_number` | `iteration` | `date`     | `value`
102         | 1           | 2010-01-01 | 63
102         | 2           | 2010-01-02 | NULL
102         | 3           | 2010-01-03 | 65

The null value in the second iteration of id_number = 102 represents a case where value hasn't changed from the previous day, i.e. value remains 63.
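The rule above amounts to carrying the last known value forward within each id_number. A minimal Python sketch of that rule, assuming the rows for each id_number arrive in iteration order and that a null is represented as `None` (the function name `forward_fill` is made up for this example):

```python
def forward_fill(rows):
    """Replace None values with the most recent non-null value for the same id_number.

    rows: iterable of (id_number, iteration, date, value) tuples, with each
    id_number's rows in iteration order (an assumption of this sketch).
    """
    last_seen = {}  # id_number -> most recent non-null value
    filled = []
    for id_number, iteration, date, value in rows:
        if value is None:
            # Stays None when no earlier value exists for this id_number.
            value = last_seen.get(id_number)
        else:
            last_seen[id_number] = value
        filled.append((id_number, iteration, date, value))
    return filled


rows = [
    (102, 1, "2010-01-01", 63),
    (102, 2, "2010-01-02", None),
    (102, 3, "2010-01-03", 65),
]
filled = forward_fill(rows)
```

Here the second row of `filled` carries the value 63 from the first row, while the other two rows are unchanged.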

Basically, I need to convert these null values to their correct values. I can imagine 4 ways of doing this:

1. Once everything is inserted into the table, run a MySQL query that does the iterating and replacing all by itself.
2. Once everything is inserted into the table, run a MySQL query to send some data back to Python, process in Python then run a MySQL query to update the correct values.
3. Do the processing in Python on a per-field basis before each insert.
4. Insert into a temporary table and use SQL to insert into the main table.
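As a point of comparison, the second approach in the list (read the rows back, work out the replacements in Python, then send UPDATE statements) could be sketched as below. This is only one possible shape, not a recommendation. The table name `readings` is hypothetical, and the demo runs against an in-memory SQLite database purely so that it is self-contained; with MySQLdb the same function would be called with the default `%s` placeholder.

```python
import sqlite3  # stand-in database for the demo only; the real target is MySQL


def fill_nulls(conn, placeholder="%s"):
    """Fill NULL values with the previous non-null value for the same id_number.

    conn: a DB-API connection. placeholder: "%s" for MySQLdb, "?" for sqlite3.
    Returns the number of rows scheduled for update.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT id_number, iteration, value FROM readings "
        "ORDER BY id_number, iteration"
    )
    last_seen = {}  # id_number -> most recent non-null value
    updates = []
    for id_number, iteration, value in cur.fetchall():
        if value is None:
            if id_number in last_seen:
                updates.append((last_seen[id_number], id_number, iteration))
        else:
            last_seen[id_number] = value
    cur.executemany(
        "UPDATE readings SET value = {0} "
        "WHERE id_number = {0} AND iteration = {0}".format(placeholder),
        updates,
    )
    conn.commit()
    return len(updates)


# Demo with the example rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings "
    "(id_number INTEGER, iteration INTEGER, date TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        (102, 1, "2010-01-01", 63),
        (102, 2, "2010-01-02", None),
        (102, 3, "2010-01-03", 65),
    ],
)
changed = fill_nulls(conn, placeholder="?")
values = [row[0] for row in conn.execute(
    "SELECT value FROM readings ORDER BY id_number, iteration"
)]
```

The sketch assumes that (id_number, iteration) identifies a row uniquely; if it does not, the UPDATE would need a different key.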

I could probably work out how to do #2, and maybe #3, but I have no idea how to do #1 or #4, which I think are the best methods because they require no fundamental changes to the Python code.

My question is: A) which of the above methods is "best" and "cleanest"? (Speed is not really an issue.) B) How would I achieve #1 or #4?

Thanks in advance :)
