Bulk import with MPTT, Django, and CSV

Posted on 2025-01-22 18:55:08

I have the following code that reads data from a very large CSV file, imports it into a Django model, and converts it into nested categories (an MPTT model).

import csv

with open(path, "rt") as f:
    reader = csv.reader(f, dialect="excel")
    next(reader)  # skip the header row
    for row in reader:
        if int(row[2]) == 0:
            # Root category: there is no parent to look up.
            obj = Category(
                cat_id=int(row[0]),
                cat_name=row[1],
                cat_parent_id=int(row[2]),
                cat_level=int(row[3]),
                cat_link=row[4],
            )
            obj.save()
        else:
            # Despite its name, parent_id holds the parent Category instance.
            parent_id = Category.objects.get(cat_id=int(row[2]))
            obj = Category(
                cat_id=int(row[0]),
                cat_name=row[1],
                cat_parent_id=int(row[2]),
                cat_level=int(row[3]),
                cat_link=row[4],
                parent=parent_id,
            )
            obj.save()

It takes over an hour to run this import. Is there a more efficient way to do this? I tried bulk_create with a list comprehension, but found that I need

parent_id = Category.objects.get(cat_id=int(row[2]))

to get MPTT to nest correctly. Also, I think I have to save each insertion for MPTT to update properly. I'm pretty sure the parent_id query is what makes the operation so slow, since it hits the database once for every row in the CSV (over 27k rows).
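
The per-row lookup described above could, in principle, be replaced by an in-memory dictionary keyed by cat_id. The following is only a minimal sketch of that idea in plain Python with no database involved: `Node` is a hypothetical stand-in for the Category model, `load_nodes` is an illustrative helper, and the sample data is made up. It assumes every parent row appears before its children in the file, which the question does not state.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    """Hypothetical stand-in for the Category model (no database involved)."""
    cat_id: int
    cat_name: str
    cat_parent_id: int
    parent: Optional["Node"] = None


def load_nodes(csv_text: str) -> Dict[int, Node]:
    """Read rows and resolve each parent from a dict instead of a query."""
    by_id: Dict[int, Node] = {}
    reader = csv.reader(io.StringIO(csv_text), dialect="excel")
    next(reader)  # skip the header row
    for row in reader:
        cat_id, name, parent_id = int(row[0]), row[1], int(row[2])
        # A parent id of 0 marks a root; otherwise look the parent up in memory.
        # Assumption: parents appear before their children in the file.
        parent = by_id[parent_id] if parent_id != 0 else None
        by_id[cat_id] = Node(cat_id, name, parent_id, parent)
    return by_id


# Made-up sample data for illustration only.
sample = "cat_id,cat_name,cat_parent_id\n1,Books,0\n2,Fiction,1\n3,Sci-Fi,2\n"
nodes = load_nodes(sample)
print(nodes[3].parent.cat_name)  # prints "Fiction"
```

In the Django loop, the same pattern would mean storing each saved object in the dictionary and reading the parent from it, so each row costs one insert rather than one select plus one insert. Whether that alone brings the run time down depends on how much of the hour is spent on MPTT's own tree updates during each save.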

The upside is this operation will only need to be done sporadically, but this dataset isn't so large that it should take over an hour to run.

Edit

Category Model

from django.db import models
from mptt.models import MPTTModel, TreeForeignKey


class Category(MPTTModel):
    cat_id = models.IntegerField(null=False, name="cat_id")
    cat_name = models.CharField(max_length=100,
                                null=False,
                                name="cat_name",
                                default="")
    cat_parent_id = models.IntegerField(null=False, name="cat_parent_id")
    cat_level = models.IntegerField(null=False, name="cat_level")
    cat_link = models.URLField(null=True, name="cat_link")
    # Self-referential link that MPTT uses to build the tree.
    parent = TreeForeignKey("self",
                            on_delete=models.CASCADE,
                            null=True,
                            blank=True,
                            related_name="children")

    def __str__(self):
        return f"{self.cat_name}"

    class MPTTMeta:
        order_insertion_by = ["cat_name"]

Edit 2

cat_parent_id = int(row[2])
obj = Category(
    cat_id=int(row[0]),
    cat_name=row[1],
    cat_parent_id=int(row[2]),
    cat_level=int(row[3]),
    cat_link=row[4],
    parent={cat_parent_id}_id,  # not valid Python as written
)
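
Edit 2 appears to aim at assigning the parent by id without a lookup. Any batched approach along those lines needs parents to exist before their children. One way to arrange that, sketched here in plain Python, is to group rows by cat_level and process the groups from shallowest to deepest. This assumes cat_level grows with depth, which the question does not confirm. `Row` and `batches_by_level` are hypothetical names, and the sample rows are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Illustrative row shape: (cat_id, cat_name, cat_parent_id, cat_level)
Row = Tuple[int, str, int, int]


def batches_by_level(rows: List[Row]) -> List[List[Row]]:
    """Group rows by cat_level and return the groups from shallowest to deepest."""
    grouped: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        grouped[row[3]].append(row)
    # Sorting the level keys guarantees a parent level precedes its child level.
    return [grouped[level] for level in sorted(grouped)]


# Made-up sample rows, deliberately out of order.
rows = [
    (3, "Sci-Fi", 2, 3),
    (1, "Books", 0, 1),
    (2, "Fiction", 1, 2),
    (4, "Music", 0, 1),
]
batches = batches_by_level(rows)
for batch in batches:
    print([r[0] for r in batch])
# prints [1, 4], then [2], then [3]
```

Each batch could then be handed to a single bulk insert. Note that Django's bulk_create does not call save(), so MPTT's tree fields would not be maintained along the way; django-mptt's manager provides rebuild() to recompute them afterwards, though how that fits this model has not been verified here.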
