What is the best way to handle Django migration data with MySQL and over 500K records?
- A migration handles creating the two new fields action_duplicate and status_duplicate.
- The second migration copies the data from the action and status fields to the two newly created fields:
def remove_foreign_keys_from_user_request(apps, schema_editor):
    # Look up the historical model for this migration state.
    UserRequest = apps.get_model("users", "UserRequest")
    for request_initiated in UserRequest.objects.all().select_related("action", "status"):
        request_initiated.action_duplicate = request_initiated.action.name
        request_initiated.status_duplicate = request_initiated.status.name
        request_initiated.save()
- The third migration is supposed to remove/delete the old fields action and status.
- The fourth migration should rename the new duplicate fields to the old, deleted field names (sketched below).

The solution here is to remove the dependency on the status and action models, to avoid unnecessary database queries, since the status in particular will only ever be pending or completed.
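For reference, the third and fourth migrations are plain schema operations. A minimal sketch follows; the app label users matches the question, but the migration names in dependencies are hypothetical, and each Migration class would live in its own file:

# Third migration: drop the old foreign key fields.
from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [("users", "0003_copy_action_and_status")]  # hypothetical name
    operations = [
        migrations.RemoveField(model_name="userrequest", name="action"),
        migrations.RemoveField(model_name="userrequest", name="status"),
    ]

# Fourth migration (separate file): rename the duplicates to the old names.
class Migration(migrations.Migration):
    dependencies = [("users", "0004_remove_action_and_status")]  # hypothetical name
    operations = [
        migrations.RenameField(model_name="userrequest", old_name="action_duplicate", new_name="action"),
        migrations.RenameField(model_name="userrequest", old_name="status_duplicate", new_name="status"),
    ]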
My question is about the second migration. The number of records is between 300k and 600k, so I need to know a more efficient way to do this that doesn't take up all the available memory.

Note: the database is MySQL.
A trimmed-down version of the UserRequest model:
import uuid

from django.db import models

# User, Action, ProcessingStatus and the choice constants are defined
# elsewhere; they are omitted from this trimmed-down version.
class UserRequest(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    reference = models.CharField(max_length=50, null=True, blank=True)
    requester = models.ForeignKey(User, blank=True, null=True, on_delete=models.CASCADE)
    action = models.ForeignKey(Action, on_delete=models.CASCADE)
    action_duplicate = models.CharField(
        max_length=50, choices=((ACTION_A, ACTION_A), (ACTION_B, ACTION_B)), default=ACTION_A
    )
    status = models.ForeignKey(ProcessingStatus, on_delete=models.CASCADE)
    status_duplicate = models.CharField(
        max_length=50,
        choices=((PENDING, PENDING), (PROCESSED, PROCESSED)),
        default=PENDING,
    )
1 Answer
You can work with a Subquery expression [Django-doc] and do the update in bulk:
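A minimal sketch of such a data migration, assuming Action and ProcessingStatus live in the same users app and both have a name field, as in the question (the migration name in dependencies is hypothetical):

from django.db import migrations
from django.db.models import OuterRef, Subquery

def copy_action_and_status(apps, schema_editor):
    UserRequest = apps.get_model("users", "UserRequest")
    Action = apps.get_model("users", "Action")
    ProcessingStatus = apps.get_model("users", "ProcessingStatus")
    # A single UPDATE statement: the database copies the names itself,
    # so no model instances are loaded into Python memory.
    UserRequest.objects.update(
        action_duplicate=Subquery(
            Action.objects.filter(pk=OuterRef("action_id")).values("name")[:1]
        ),
        status_duplicate=Subquery(
            ProcessingStatus.objects.filter(pk=OuterRef("status_id")).values("name")[:1]
        ),
    )

class Migration(migrations.Migration):
    dependencies = [("users", "0002_add_duplicate_fields")]  # hypothetical name
    operations = [
        migrations.RunPython(copy_action_and_status, migrations.RunPython.noop),
    ]

Since the subqueries read from the Action and ProcessingStatus tables rather than from UserRequest itself, this also stays clear of MySQL's restriction on updating a table that is selected from in its own subquery.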
That being said, it looks like what you are doing is actually the opposite of database normalization [wiki]: usually, if there is duplicated data, you make an extra model with one Action/Status record per value, which prevents storing the same action_duplicate/status_duplicate value many times in the database. Duplicating the values will make the database larger and harder to maintain.
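As an alternative not covered by the answer above: if you need Python-side logic per row (which a Subquery cannot express), a batched loop with iterator() and bulk_update() is a common way to keep memory bounded; a sketch under the same assumptions as the question:

def copy_in_batches(apps, schema_editor):
    UserRequest = apps.get_model("users", "UserRequest")
    batch_size = 2000
    batch = []
    # iterator() avoids filling the queryset cache with all ~600k rows;
    # bulk_update() writes each batch back without one UPDATE per row.
    queryset = UserRequest.objects.select_related("action", "status")
    for request in queryset.iterator(chunk_size=batch_size):
        request.action_duplicate = request.action.name
        request.status_duplicate = request.status.name
        batch.append(request)
        if len(batch) >= batch_size:
            UserRequest.objects.bulk_update(batch, ["action_duplicate", "status_duplicate"])
            batch = []
    if batch:
        UserRequest.objects.bulk_update(batch, ["action_duplicate", "status_duplicate"])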