PL/SQL rownum update
I am working on a database with a couple of tables:
districts table
    PK district_id
student_data table
    PK study_id
    FK district_id
ga_data table
    PK study_id
    district_id
The ga_data table is data that I am adding in. Both the student_data and ga_data tables have 1.3 million records. The study_ids are one-to-one between the two tables, but the ga_data.district_ids are NULL and need to be updated. I am having trouble with the following PL/SQL:
update ga_data
set district_id = (select district_id from student_data
where student_data.study_id = ga_data.study_id)
where ga_data.district_id is null and rownum < 100;
I need to do it incrementally, so that's why I need rownum. But am I using it correctly? After running the query a bunch of times, it has only updated about 8,000 of the 1.3 million records (it should be about 1.1 million updates, since some of the district_ids are null in student_data). Thanks!
3 Answers
ROWNUM just chops off the query after the first n rows. You have some rows in STUDENT_DATA which have a NULL DISTRICT_ID. So after a number of runs your query is liable to get stuck in a rut, returning the same 100 GA_DATA records, all of which match one of those pesky STUDENT_DATA rows.
So you need some mechanism for ensuring that you are working your way progressively through the GA_DATA table. A flag column would be one solution. Partitioning the query so it hits a different set of STUDY_IDs is another.
It's not clear why you have to do this in batches of 100, but perhaps the easiest way of doing this would be to use BULK PROCESSING (at least in Oracle: this PL/SQL syntax won't work in MySQL).
Here is some test data:
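For illustration, test data along these lines would do (a sketch: the table names come from the question, the sample values are invented):

create table student_data (
    study_id    number primary key,
    district_id number
);

create table ga_data (
    study_id    number primary key,
    district_id number
);

-- student 3 has a NULL district_id: one of the "pesky" rows
insert into student_data values (1, 10);
insert into student_data values (2, 20);
insert into student_data values (3, null);

-- ga_data starts out with every district_id NULL
insert into ga_data values (1, null);
insert into ga_data values (2, null);
insert into ga_data values (3, null);
commit;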
This anonymous block uses the Bulk processing LIMIT clause to batch the result set into chunks of 100 rows.
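A sketch of such a block, against the tables above:

declare
    -- drive off the rows that actually need (and can receive) a fix
    cursor c_fix is
        select ga.study_id, sd.district_id
        from   ga_data ga
               join student_data sd on sd.study_id = ga.study_id
        where  ga.district_id is null
        and    sd.district_id is not null;
    type t_fix is table of c_fix%rowtype;
    l_rows t_fix;
begin
    open c_fix;
    loop
        -- LIMIT batches the result set into chunks of 100 rows
        fetch c_fix bulk collect into l_rows limit 100;
        exit when l_rows.count = 0;
        forall i in 1 .. l_rows.count
            update ga_data
            set    district_id = l_rows(i).district_id
            where  study_id    = l_rows(i).study_id;
        commit;   -- one commit per batch keeps each transaction small
    end loop;
    close c_fix;
end;
/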
Note that this construct allows us to do additional processing on the selected rows before issuing the update. This is handy if we need to apply complicated fixes programmatically.
As you can see, the data in GA_DATA now matches that in STUDENT_DATA.
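For instance, a count of mismatches (again a sketch, matching the tables above) should come back zero once the block has run:

select count(*)
from   ga_data ga
       join student_data sd on sd.study_id = ga.study_id
where  sd.district_id is not null
and    (ga.district_id is null or ga.district_id <> sd.district_id);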
It is kind of an odd requirement to only update 100 rows at a time. Why is that?
Anyway, since district_id in student_data can be null, you might be updating the same 100 rows over and over again.
If you extend your query to make sure a non-null district_id exists, you might end up where you want to be:
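Presumably something like this, with an EXISTS predicate ruling out the NULL matches (a reconstruction, not the original statement):

update ga_data
set    district_id = (select district_id
                      from   student_data
                      where  student_data.study_id = ga_data.study_id)
where  ga_data.district_id is null
and    exists (select 1
               from   student_data
               where  student_data.study_id = ga_data.study_id
               and    student_data.district_id is not null)
and    rownum < 100;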
If this is a one-time conversion, you should consider a completely different approach: recreate the table as the join of your two tables. I promise you will laugh out loud when you realise how fast it is compared to all kinds of funny 100-rows-at-a-time updates.
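A sketch of that approach, assuming ga_data has no other columns that need keeping:

create table ga_data_new as
select ga.study_id,
       sd.district_id
from   ga_data ga
       left join student_data sd on sd.study_id = ga.study_id;

-- then swap the new table in for the old one:
-- drop table ga_data;
-- rename ga_data_new to ga_data;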
Or, if it isn't a one-time conversion, or you can't re-create/drop tables, or you just feel like spending a few extra hours on data loading:
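For example, a single MERGE (a sketch, not the original statement) handles the whole load in one pass:

merge into ga_data ga
using (select study_id, district_id
       from   student_data
       where  district_id is not null) sd
on    (ga.study_id = sd.study_id)
when matched then update
    set   ga.district_id = sd.district_id
    where ga.district_id is null;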
The last statement can also be rewritten as an updatable view, but I personally never use them.
Dropping/disabling the indexes and constraints on ga_data.district_id before running the merge and recreating them afterwards will improve performance.