Automatically merging arbitrary Django models


I have two Django-ORM managed databases that I'd like to merge. Both have a very similar schema, and both have the standard auth_users table, along with a few other shared tables that reference each other as well as auth_users, which I'd like to merge into a single database automatically.

Understandably, this could be very non-trivial depending upon the foreign-key relationships, and what constitutes a "unique" record in each table.

Does anyone know if there exists a tool to do this merge operation?

If nothing like this currently exists, I was considering writing my own management command, based on the standard loaddata command. Essentially, you'd use the standard dumpdata command to export tables from a source database, and then use a modified version of loaddata to "merge" them into the destination database.
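For reference, a minimal sketch of the export/import round trip this idea builds on, assuming both databases are configured in settings.DATABASES (the aliases "src" and "dst" and the app label "myapp" are made up here) and a reasonably recent Django where dumpdata accepts --output and --database:

from django.core.management import call_command

# Export the shared tables from source database B (the hypothetical "src" alias)...
call_command('dumpdata', 'auth.User', 'myapp',
             database='src', output='b_dump.json', indent=2)

# ...then the modified loaddata would merge the fixture into destination A;
# the standard command shown here is only the starting point.
call_command('loaddata', 'b_dump.json', database='dst')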

For example, if I have databases A and B, and I want to merge database B into database A, then I'd want to follow a procedure according to the pseudo-code:

merge_database_dst = A
merge_database_src = B

# maps (table, ID in B) -> ID in A; shared across tables so child tables
# can translate FKs to parents that were merged earlier
src_id_to_dst_id = {}
for table in sorted(merge_database_dst.get_redundant_tables(merge_database_src), key=acyclic_dependency):
    key = table.get_unique_column_key()  # e.g. ('username',) for auth_users
    for record_src in merge_database_src.table.objects.all():
        src_key_value = record_src.get_key_value(key)  # dict of key column -> value
        try:
            # an equivalent record already exists in A: just remember its ID
            record_dst = merge_database_dst.table.objects.get(**src_key_value)
        except merge_database_dst.table.DoesNotExist:
            # no match: copy the record into A, translating its foreign keys
            # and letting A assign a fresh primary key
            field_values = {
                f.attname: convert_fk(f, getattr(record_src, f.attname))
                for f in record_src._meta.fields
                if not f.primary_key
            }
            record_dst = merge_database_dst.table(**field_values)
            record_dst.save()
        src_id_to_dst_id[(table, record_src.id)] = record_dst.id

The convert_fk() function would use the src_id_to_dst_id index to convert foreign key references in the source table to the equivalent IDs in the destination table.
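A minimal sketch of what convert_fk() could look like, assuming src_id_to_dst_id is in scope, parent tables are always merged first, and field.related_model stands in for whatever object the loop above uses to identify the parent table:

def convert_fk(field, src_value):
    """Translate a value copied from B into one that is valid in A.

    Non-relational values pass through unchanged. FK values are looked up
    in src_id_to_dst_id; a KeyError here means the referenced parent row
    was never merged, which is exactly what a dry run should surface.
    """
    if src_value is None or not field.is_relation:
        return src_value
    return src_id_to_dst_id[(field.related_model, src_value)]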

To summarize, the algorithm would iterate over the tables to be merged in order of dependency, with parents iterated over first. So if we wanted to merge tables auth_users and mycustomprofile, which depends on auth_users, we'd iterate ['auth_users','mycustomprofile'].
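The ordering itself could be derived from the models' foreign keys; a naive sketch (self-referencing and circular relations are not handled here and would need manual ordering):

def models_in_dependency_order(models):
    """Order model classes so every FK parent precedes the models that reference it."""
    remaining = list(models)
    ordered = []
    while remaining:
        # a model is "ready" once none of its FK parents are still waiting
        ready = [
            m for m in remaining
            if not any(
                f.is_relation and f.related_model is not m and f.related_model in remaining
                for f in m._meta.fields
            )
        ]
        if not ready:
            raise ValueError("circular foreign-key dependency between models")
        for m in ready:
            ordered.append(m)
            remaining.remove(m)
    return ordered

# e.g. models_in_dependency_order([User, MyCustomProfile]) -> [User, MyCustomProfile]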

Each merged table would need some sort of indicator documenting the combination of columns that denotes a universally unique record (i.e. the "key"). For auth_users, that might be the "username" and/or "email" column.
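One simple way to record that indicator is a per-model mapping of "natural key" columns; a hypothetical example (the model labels and column choices are made up):

# which columns identify "the same" record across databases A and B
MERGE_KEYS = {
    'auth.user': ('username',),           # or ('email',), depending on policy
    'myapp.mycustomprofile': ('user',),   # one profile per user
}

Django's serialization framework has a related concept of natural keys (dumpdata --natural-foreign/--natural-primary in newer versions, backed by natural_key()/get_by_natural_key() on the model and manager), which could cover part of this; auth.User's natural key is already its username.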

If the value of the key in database B already exists in A, then the record is not imported from B, but the ID of the existing record in A is recorded.

If the value of the key in database B does not exist in A, then the record is imported from B, and the ID of the new record is recorded.

Using the previously recorded IDs, a mapping is created that explains how to translate a foreign-key reference to that specific record in B into a reference to the new or pre-existing record in A. When future records are merged into A, this mapping would be used to convert their foreign keys.

I could still envision some cases where an imported record references a table not included in the dumpdata, which might cause the entire import to fail, therefore some sort of "dryrun" option would be needed to simulate the import to ensure all FK references can be translated.
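One way to get that dry-run behaviour is to run the whole merge inside a transaction and roll it back unless a real import was requested; a rough sketch, where perform_merge() is a hypothetical stand-in for the merging loop above:

from django.db import transaction

class DryRunRollback(Exception):
    """Raised purely to abort the transaction after a simulated merge."""

def merge(src_alias, dst_alias, dry_run=True):
    try:
        with transaction.atomic(using=dst_alias):
            perform_merge(src_alias, dst_alias)  # hypothetical: the loop sketched above
            if dry_run:
                raise DryRunRollback
    except DryRunRollback:
        print("Dry run complete: all FK references resolved; nothing was saved.")

Any untranslatable FK reference would raise inside perform_merge() and propagate normally, so a dry run that finishes cleanly gives reasonable confidence that the real merge will too.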

Does this seem like a practical approach? Is there a better way?

EDIT: This isn't exactly what I'm looking for, but I thought others might find it interesting. The Turbion project has a mechanism for copying changes between equivalent records in different Django models within the same database. It works by defining a translation layer (i.e. merging.ModelLayer) between two Django models, so, say if you update the "www" field in user [email protected]'s profile, it'll automatically update the "url" field in user [email protected]'s otherprofile.

The functionality I'm looking for is a bit different, in that I want to merge an entire (or partial) database snapshot at infrequent intervals, sort of the way the loaddata management command does.


冧九 2024-12-21 08:07:13


Wow. This is going to be a complex job regardless. That said:

If I understand the needs of your project correctly, this is something that can be done using a data migration in South. Even so, I'd be lying if I said it was going to be a joke.

My recommendation is -- and this is mostly a parrot of an assumption in your question, but I want to make it clear -- that you have one "master" table that is the base, and which has records from the other table added to it. So, table A keeps all of its existing records, and only gets additions from B. B feeds additions into A, and once done, B is deleted.

I'm hesitant to write you sample code because your actual job will be so much more complex than this, but I will anyway to try and point you in the right direction. Consider something like...

import datetime
from south.db import db
from south.v2 import DataMigration
from django.db import models

class Migration(DataMigration):
    def forwards(self, orm):
        for b in orm.B.objects.all():
            # sanity check: does this item get copied into A at all?
            if orm.A.objects.filter(username=b.username).exists():
                continue

            # make an A record with the properties of my B record
            a = orm.A(
                first_name=b.first_name,
                last_name=b.last_name,
                email_address=b.email_address,
                [...]
            )

            # save the new A record, and delete the B record
            a.save()
            b.delete()

    def backwards(self, orm):
        # backwards method, if you write one
        pass

This would end up migrating all of the Bs not in A to A, and leave you a table of Bs that are expected duplicates, which you could then check by some other means before deleting.

Like I said, this sample isn't meant to be complete. If you decide to go this route, spend time in the South documentation, and particularly make sure you look at data migrations.

That's my 2¢. Hope it helps.
