MySQL to MongoDB data migration

Posted on 2025-02-12 13:54:50

We know that MongoDB has two ways of modeling relationships between relations/entities, namely embedding and referencing (see difference here). Let's say we have a user database in MySQL with two tables named user and address. An embedded MongoDB document might look like this:

{
  "_id": 1,
  "name": "Ashley Peacock",
  "addresses": [
    {
      "address_line_1": "10 Downing Street",
      "address_line_2": "Westminster",
      "city": "London",
      "postal_code": "SW1A 2AA"
    },
    {
      "address_line_1": "221B Baker Street",
      "address_line_2": "Marylebone",
      "city": "London",
      "postal_code": "NW1 6XE"
    }
  ]
}

Whereas in a referenced relation, the two SQL tables become two collections in MongoDB, which can be migrated by this approach using pymongo.

How can we directly migrate MySQL data as embedded documents using Python?

Insights about pseudocode and the performance of the algorithm would be very useful. One idea that comes to mind is creating views by performing joins in MySQL. But in that case we would not have the child documents nested inside the parent document.

热鲨 2025-02-19 13:54:51

Denormalization

First, for canonical reference, the question of "embedded" vs. "referenced" data is called denormalization. Mongo has a guide describing when you should denormalize. Knowing when and how to denormalize is a very common hang-up when moving from SQL to NoSQL, and getting it wrong can erase any performance benefits you might be looking for. I'll assume you've got this figured out, since you seem set on using an embedded approach.

MySQL to Mongo

Mongo has a great Python tutorial you may want to refer to. First join your user and address tables. It will look something like this:

| _id    | name           | address_line_1        | address_line_2 | ... 
| 1      | Ashley Peacock | 10 Downing Street ... | ...
| 1      | Ashley Peacock | 221B Baker Street ... | ...
| 2      | Bob Jensen     | 343 Main Street ...   | ...
| 2      | Bob Jensen     | 1223 Some Ave ...     | ...
...
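
As a rough sketch of the join step, the query below produces rows in the shape shown above. The schema is an assumption, not something stated in the question: user(id, name) and address(id, user_id, ...) with user_id as the foreign key. To keep the example runnable without a MySQL server, it uses Python's built-in sqlite3 as a stand-in; with a MySQL driver the same SELECT should work, but the connection and cursor calls will differ.

```python
import sqlite3

# Assumed schema: user(id, name) and address(id, user_id, ...).
# Ordering by user id keeps each user's rows adjacent.
JOIN_QUERY = """
SELECT u.id AS _id, u.name,
       a.address_line_1, a.address_line_2, a.city, a.postal_code
FROM user AS u
JOIN address AS a ON a.user_id = u.id
ORDER BY u.id, a.id
"""


def fetch_joined_rows(conn):
    """Return the joined rows as a list of dicts keyed by column name."""
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(JOIN_QUERY)]


# In-memory stand-in database holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE address (
    id INTEGER PRIMARY KEY, user_id INTEGER,
    address_line_1 TEXT, address_line_2 TEXT, city TEXT, postal_code TEXT
);
INSERT INTO user VALUES (1, 'Ashley Peacock');
INSERT INTO address VALUES
    (1, 1, '10 Downing Street', 'Westminster', 'London', 'SW1A 2AA'),
    (2, 1, '221B Baker Street', 'Marylebone', 'London', 'NW1 6XE');
""")
rows = fetch_joined_rows(conn)
```

An inner join drops users that have no address; a LEFT JOIN would keep them, at the cost of rows whose address columns are NULL.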

Then iterate over the rows to build your documents and pass them to pymongo's update_one. Using upsert=True with update_one inserts a new document if no matching one is found in the database, or updates the existing document if one is found. Using $push appends the address data to the array field addresses in the document. With this setup, update_one creates each user the first time its _id is seen and appends the addresses from later rows with the same _id. Note that $push does not deduplicate, so rerunning the migration against an already populated collection would append the same addresses again; $addToSet avoids that. See the docs for more details:

from pymongo import MongoClient

client = MongoClient(port=27017)
db = client.my_db

sql_data = []  # should have your SQL table data (the joined rows)
# depending on how you got this into python, you will index with a
# field name or a number, e.g. row["id"] or row[0]

for row in sql_data:
    # Build the embedded address sub-document from the joined row.
    address = {
        "address_line_1": row["address_line_1"],
        "address_line_2": row["address_line_2"],
        "city": row["city"],
        "postal_code": row["postal_code"],
    }
    # An update document may contain only update operators; a plain
    # "name" field next to "$push" is rejected, so it goes under "$set".
    db.users.update_one(
        {"_id": row["_id"]},
        {"$set": {"name": row["name"]}, "$push": {"addresses": address}},
        upsert=True,
    )
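
On the performance question: the loop above issues one round trip per joined row. A possible alternative, sketched here under the assumption that the rows arrive sorted by _id (for example via ORDER BY in the join), is to group the rows in Python first and write whole documents. The helper name build_user_documents and the second sample user are hypothetical and only there for illustration.

```python
from itertools import groupby
from operator import itemgetter

ADDRESS_FIELDS = ("address_line_1", "address_line_2", "city", "postal_code")


def build_user_documents(rows):
    """Yield one embedded document per user.

    Assumes rows are dicts sorted by "_id"; groupby only merges
    adjacent rows, so unsorted input would split a user in two.
    """
    for user_id, group in groupby(rows, key=itemgetter("_id")):
        group = list(group)
        yield {
            "_id": user_id,
            "name": group[0]["name"],
            "addresses": [
                {field: row[field] for field in ADDRESS_FIELDS}
                for row in group
            ],
        }


# Sample joined rows; the second user is made up for the example.
sample_rows = [
    {"_id": 1, "name": "Ashley Peacock", "address_line_1": "10 Downing Street",
     "address_line_2": "Westminster", "city": "London", "postal_code": "SW1A 2AA"},
    {"_id": 1, "name": "Ashley Peacock", "address_line_1": "221B Baker Street",
     "address_line_2": "Marylebone", "city": "London", "postal_code": "NW1 6XE"},
    {"_id": 2, "name": "Example User", "address_line_1": "1 Example Road",
     "address_line_2": "", "city": "Exampletown", "postal_code": "EX1 1EX"},
]

documents = list(build_user_documents(sample_rows))
```

The resulting list could then be written in batches with pymongo's insert_many, e.g. db.users.insert_many(documents), which sends many documents per round trip. Because the function is a generator, large tables can be streamed in chunks rather than held in memory all at once.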
