使用MySQL超过500K记录来处理Django迁移数据的最佳方法是什么

发布于 2025-01-21 04:12:43 字数 1662 浏览 1 评论 0原文

  • 一个迁移处理创建两个新字段action_duplatestatus_duplate
  • 第二个迁移复制了action> actionstatus status 字段的数据,在两个新创建的字段中,
def remove_foreign_keys_from_user_request(apps, schema_editor):
    UserRequests = apps.get_model("users", "UserRequest")

    for request_initiated in UserRequest.objects.all().select_related("action", "status"):
        request_initiated.action_duplicate = request_initiated.action.name
        request_initiated.status_duplicate = request_initiated.status.name
        request_initiated.save()
  • 第三个迁移假定要删除/删除旧字段action状态
  • 第四迁移应将新的重复字段重命名为旧删除字段 这里的解决方案是删除对状态和操作的依赖性,以避免不必要的数据库查询,因为该状态特别是未决 and 完成

我的问题是第二迁移。记录的数量在300k至600k的记录之间,因此我需要知道一种更有效的方法来执行此操作,以免占用所有可用的内存。 注意:数据库是mysql

userRequest模型的修剪版本

class UserRequest(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    reference = models.CharField(max_length=50, null=True, blank=True)
    requester = models.ForeignKey(User, blank=True, null=True, on_delete=models.CASCADE)
    action = models.ForeignKey(Action, on_delete=models.CASCADE)
    action_duplicate = models.CharField(
        max_length=50, choices=((ACTION_A, ACTION_A), (ACTION_B, ACTION_B)), default=ACTION_A
    )
    status = models.ForeignKey(ProcessingStatus, on_delete=models.CASCADE)
    status_duplicate = models.CharField(
        max_length=50,
        choices=((PENDING, PENDING), (PROCESSED, PROCESSED)),
        default=PENDING,
    )
  • A migration handles creating two new fields action_duplicate and status_duplicate
  • The second migration copies the data from the action and status fields, to the two newly created fields
def remove_foreign_keys_from_user_request(apps, schema_editor):
    UserRequests = apps.get_model("users", "UserRequest")

    for request_initiated in UserRequest.objects.all().select_related("action", "status"):
        request_initiated.action_duplicate = request_initiated.action.name
        request_initiated.status_duplicate = request_initiated.status.name
        request_initiated.save()
  • The third migration is suppose to remove/delete the old fields action and status
  • The fourth migration should rename the new duplicate fields to the old deleted fields
    The solution here is to remove the dependency on the status and action, to avoid unnecessary data base query, since the status especially will only be pending and completed

My question is for the second migration. The number of records are between 300k to 600k records so I need to know a more efficient way to do this so it doesn't take up all the memory available.
Note: The Database is MySQL.

A trimmed-down version of the UserRequest model

class UserRequest(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    reference = models.CharField(max_length=50, null=True, blank=True)
    requester = models.ForeignKey(User, blank=True, null=True, on_delete=models.CASCADE)
    action = models.ForeignKey(Action, on_delete=models.CASCADE)
    action_duplicate = models.CharField(
        max_length=50, choices=((ACTION_A, ACTION_A), (ACTION_B, ACTION_B)), default=ACTION_A
    )
    status = models.ForeignKey(ProcessingStatus, on_delete=models.CASCADE)
    status_duplicate = models.CharField(
        max_length=50,
        choices=((PENDING, PENDING), (PROCESSED, PROCESSED)),
        default=PENDING,
    )

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

慢慢从新开始 2025-01-28 04:12:43

您可以使用a /code> expression  [django-doc] ,然后进行批量更新:

def remove_foreign_keys_from_user_request(apps, schema_editor):
    UserRequests = apps.get_model('users', 'UserRequests')
    Action = apps.get_user('users', 'Action')
    Status = apps.get_user('users', 'ProcessingStatus')
    UserRequests.objects.update(
        action_duplicate=Subquery(
            Action.objects.filter(
                pk=OuterRef('action_id')
            ).values('name')[:1]
        ),
        status_duplicate=Subquery(
            Status.objects.filter(
                pk=OuterRef('status_id')
            ).values('name')[:1]
        )
    )

话虽如此,看来您正在做的是实际上是 :通常,如果有重复的数据,则您可以制作一个额外的模型,在其中制作一个action/status每个值,从而防止使其具有相同的action_duplate的值/status_duplate在数据库中多次:这将使数据库更大,并且很难维护。


注意:通常给出django模型a singular 名称,因此userRequest 而不是 userRequests


You can work with a Subquery expression [Django-doc], and do the update in bulk:

def remove_foreign_keys_from_user_request(apps, schema_editor):
    UserRequests = apps.get_model('users', 'UserRequests')
    Action = apps.get_user('users', 'Action')
    Status = apps.get_user('users', 'ProcessingStatus')
    UserRequests.objects.update(
        action_duplicate=Subquery(
            Action.objects.filter(
                pk=OuterRef('action_id')
            ).values('name')[:1]
        ),
        status_duplicate=Subquery(
            Status.objects.filter(
                pk=OuterRef('status_id')
            ).values('name')[:1]
        )
    )

That being said, it looks that what you are doing is actually the opposite of database normalization [wiki]: usually if there is duplicated data, you make an extra model where you make one Action/Status per value, and thus prevent having the same value for action_duplicate/status_duplicate multiple times in the database: this will make the database larger, and harder to maintain.


Note: normally a Django model is given a singular name, so UserRequest instead of UserRequests.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文