使用 sqlalchemy 在数据库之间高效复制数据

发布于 2025-01-10 00:51:08 字数 2823 浏览 0 评论 0原文

我试图将我用 sqlalchemy 定义的 postgresql + PostGIS 数据库镜像到 sqlite (spatialite)文件数据库。 session.merge() 方法似乎适用于将从第一个会话查询的实例添加到另一个会话,但它无法扩展到近一百万行。请参阅下面的示例,该示例将数据从内存中的 SQLite 数据库复制到另一个内存数据库,以便于重现。我正在寻找一种方法(可能与我现在正在做的完全不同)来有效地将所有数据从一个数据库移动到另一个数据库。

    from sqlalchemy import create_engine
    from sqlalchemy import Table, Column, Integer, ForeignKey, String
    from sqlalchemy.orm import declarative_base, sessionmaker
    from sqlalchemy.orm import relationship, joinedload
    
    
    engine_0 = create_engine('sqlite:///:memory:', echo=True)
    engine_1 = create_engine('sqlite:///:memory:', echo=True)
    Base = declarative_base()
    Session0 = sessionmaker(bind=engine_0)
    Session1 = sessionmaker(bind=engine_1)
    
    # Define ORM models
    association_table = Table('association', Base.metadata,
        Column('parent_id', ForeignKey('parent.id'), primary_key=True),
        Column('child_id', ForeignKey('child.id'), primary_key=True)
    )
    
    class Parent(Base):
        __tablename__ = 'parent'
        id = Column(Integer, primary_key=True)
        name = Column(String)
        children = relationship(
            "Child",
            secondary=association_table,
            back_populates="parents")
    
    class Child(Base):
        __tablename__ = 'child'
        id = Column(Integer, primary_key=True)
        name = Column(String)
        parents = relationship(
            "Parent",
            secondary=association_table,
            back_populates="children")
    
    # Create schema
    Base.metadata.create_all(engine_0)
    Base.metadata.create_all(engine_1)
    
    # Create some example instances
    # Children
    bart = Child(name='Bart')
    lisa = Child(name='Lisa')
    maggie = Child(name='Maggie')
    milhouse = Child(name='Milhouse')
    # Parents
    homer = Parent(name='Homer',
                   children=[bart, lisa, maggie])
    marge = Parent(name='Marge',
                   children=[bart, lisa, maggie])
    flanders = Parent(name='Ned')
    kirk = Parent(name='Kirk', children=[milhouse])
    
    # Insert data into first database
    session_0 = Session0()
    session_0.add_all([homer, marge, flanders, kirk])
    session_0.commit()
    
    # Query the data and insert it into the second database
    all_obj = session_0.query(Parent).options(joinedload('*')).all()
    session_0.expunge_all()
    session_1 = Session1()
    for obj in all_obj:
        session_1.merge(obj)
    session_1.commit()
    
    # MAke sure that 4 instance of child are present in the second database
    print(session_1.query(Child).all())

我尝试过(不成功)的一种替代方法是使用 sqlalchemy.orm.make_transient() 函数使父对象瞬态,并使用 session.add_all() 而不是 < code>session.merge() 将对象插入到第二个会话中。但是,这不会传播到关系,并且只有父对象变为瞬态。

I'm trying to mirror a postgresql + PostGIS database that I defined with sqlalchemy to a sqlite (spatialite) file database. The session.merge() method appears to work for adding the instances queried from the first session to the other session, but it does not scale for nearly a million rows. See the example below that copies data from an in-memory sqlite database to another memory database for the sake of easy reproducibility. I'm looking for an approach (potentially completely different from what I'm doing now) to efficiently move all the data from one database to another.

    from sqlalchemy import create_engine
    from sqlalchemy import Table, Column, Integer, ForeignKey, String
    from sqlalchemy.orm import declarative_base, sessionmaker
    from sqlalchemy.orm import relationship, joinedload
    
    
    engine_0 = create_engine('sqlite:///:memory:', echo=True)
    engine_1 = create_engine('sqlite:///:memory:', echo=True)
    Base = declarative_base()
    Session0 = sessionmaker(bind=engine_0)
    Session1 = sessionmaker(bind=engine_1)
    
    # Define ORM models
    association_table = Table('association', Base.metadata,
        Column('parent_id', ForeignKey('parent.id'), primary_key=True),
        Column('child_id', ForeignKey('child.id'), primary_key=True)
    )
    
    class Parent(Base):
        __tablename__ = 'parent'
        id = Column(Integer, primary_key=True)
        name = Column(String)
        children = relationship(
            "Child",
            secondary=association_table,
            back_populates="parents")
    
    class Child(Base):
        __tablename__ = 'child'
        id = Column(Integer, primary_key=True)
        name = Column(String)
        parents = relationship(
            "Parent",
            secondary=association_table,
            back_populates="children")
    
    # Create schema
    Base.metadata.create_all(engine_0)
    Base.metadata.create_all(engine_1)
    
    # Create some example instances
    # Children
    bart = Child(name='Bart')
    lisa = Child(name='Lisa')
    maggie = Child(name='Maggie')
    milhouse = Child(name='Milhouse')
    # Parents
    homer = Parent(name='Homer',
                   children=[bart, lisa, maggie])
    marge = Parent(name='Marge',
                   children=[bart, lisa, maggie])
    flanders = Parent(name='Ned')
    kirk = Parent(name='Kirk', children=[milhouse])
    
    # Insert data into first database
    session_0 = Session0()
    session_0.add_all([homer, marge, flanders, kirk])
    session_0.commit()
    
    # Query the data and insert it into the second database
    all_obj = session_0.query(Parent).options(joinedload('*')).all()
    session_0.expunge_all()
    session_1 = Session1()
    for obj in all_obj:
        session_1.merge(obj)
    session_1.commit()
    
    # MAke sure that 4 instance of child are present in the second database
    print(session_1.query(Child).all())

One alternative approach I have tried (unsuccessfully) is to make the parent objects transient using the sqlalchemy.orm.make_transient() function and use session.add_all() instead of session.merge() to insert the objects into the second session. However, this does not propagate to the relationships and only Parent objects are made transient.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

倾其所爱 2025-01-17 00:51:08

我认为这不适用于不同的数据库,因为 SQLAlchemy 为每个数据库存储元数据。您的玩具示例之所以有效,是因为它们都是相同的。

I dont think this would work with different databases as SQLAlchemy stores meta data for each database. Your toy example works because they are both the same.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文