使用 sqlalchemy 在数据库之间高效复制数据

发布于 2025-01-10 00:51:08 字数 2823 浏览 0 评论 0原文

我试图将我用 sqlalchemy 定义的 postgresql + PostGIS 数据库镜像到 sqlite （spatialite）文件数据库。 session.merge() 方法似乎适用于将从第一个会话查询的实例添加到另一个会话，但它无法扩展到近一百万行。请参阅下面的示例，该示例将数据从内存中的 SQLite 数据库复制到另一个内存数据库，以便于重现。我正在寻找一种方法（可能与我现在正在做的完全不同）来有效地将所有数据从一个数据库移动到另一个数据库。

    from sqlalchemy import create_engine
    from sqlalchemy import Table, Column, Integer, ForeignKey, String
    from sqlalchemy.orm import declarative_base, sessionmaker
    from sqlalchemy.orm import relationship, joinedload
    
    
    engine_0 = create_engine('sqlite:///:memory:', echo=True)
    engine_1 = create_engine('sqlite:///:memory:', echo=True)
    Base = declarative_base()
    Session0 = sessionmaker(bind=engine_0)
    Session1 = sessionmaker(bind=engine_1)
    
    # Define ORM models
    association_table = Table('association', Base.metadata,
        Column('parent_id', ForeignKey('parent.id'), primary_key=True),
        Column('child_id', ForeignKey('child.id'), primary_key=True)
    )
    
    class Parent(Base):
        __tablename__ = 'parent'
        id = Column(Integer, primary_key=True)
        name = Column(String)
        children = relationship(
            "Child",
            secondary=association_table,
            back_populates="parents")
    
    class Child(Base):
        __tablename__ = 'child'
        id = Column(Integer, primary_key=True)
        name = Column(String)
        parents = relationship(
            "Parent",
            secondary=association_table,
            back_populates="children")
    
    # Create schema
    Base.metadata.create_all(engine_0)
    Base.metadata.create_all(engine_1)
    
    # Create some example instances
    # Children
    bart = Child(name='Bart')
    lisa = Child(name='Lisa')
    maggie = Child(name='Maggie')
    milhouse = Child(name='Milhouse')
    # Parents
    homer = Parent(name='Homer',
                   children=[bart, lisa, maggie])
    marge = Parent(name='Marge',
                   children=[bart, lisa, maggie])
    flanders = Parent(name='Ned')
    kirk = Parent(name='Kirk', children=[milhouse])
    
    # Insert data into first database
    session_0 = Session0()
    session_0.add_all([homer, marge, flanders, kirk])
    session_0.commit()
    
    # Query the data and insert it into the second database
    all_obj = session_0.query(Parent).options(joinedload('*')).all()
    session_0.expunge_all()
    session_1 = Session1()
    for obj in all_obj:
        session_1.merge(obj)
    session_1.commit()
    
    # MAke sure that 4 instance of child are present in the second database
    print(session_1.query(Child).all())

我尝试过（不成功）的一种替代方法是使用 sqlalchemy.orm.make_transient() 函数使父对象瞬态，并使用 session.add_all() 而不是 < code>session.merge() 将对象插入到第二个会话中。但是，这不会传播到关系，并且只有父对象变为瞬态。

原文

I'm trying to mirror a postgresql + PostGIS database that I defined with sqlalchemy to a sqlite (spatialite) file database. The session.merge() method appears to work for adding the instances queried from the first session to the other session, but it does not scale for nearly a million rows. See the example below that copies data from an in-memory sqlite database to another memory database for the sake of easy reproducibility. I'm looking for an approach (potentially completely different from what I'm doing now) to efficiently move all the data from one database to another.

    from sqlalchemy import create_engine
    from sqlalchemy import Table, Column, Integer, ForeignKey, String
    from sqlalchemy.orm import declarative_base, sessionmaker
    from sqlalchemy.orm import relationship, joinedload
    
    
    engine_0 = create_engine('sqlite:///:memory:', echo=True)
    engine_1 = create_engine('sqlite:///:memory:', echo=True)
    Base = declarative_base()
    Session0 = sessionmaker(bind=engine_0)
    Session1 = sessionmaker(bind=engine_1)
    
    # Define ORM models
    association_table = Table('association', Base.metadata,
        Column('parent_id', ForeignKey('parent.id'), primary_key=True),
        Column('child_id', ForeignKey('child.id'), primary_key=True)
    )
    
    class Parent(Base):
        __tablename__ = 'parent'
        id = Column(Integer, primary_key=True)
        name = Column(String)
        children = relationship(
            "Child",
            secondary=association_table,
            back_populates="parents")
    
    class Child(Base):
        __tablename__ = 'child'
        id = Column(Integer, primary_key=True)
        name = Column(String)
        parents = relationship(
            "Parent",
            secondary=association_table,
            back_populates="children")
    
    # Create schema
    Base.metadata.create_all(engine_0)
    Base.metadata.create_all(engine_1)
    
    # Create some example instances
    # Children
    bart = Child(name='Bart')
    lisa = Child(name='Lisa')
    maggie = Child(name='Maggie')
    milhouse = Child(name='Milhouse')
    # Parents
    homer = Parent(name='Homer',
                   children=[bart, lisa, maggie])
    marge = Parent(name='Marge',
                   children=[bart, lisa, maggie])
    flanders = Parent(name='Ned')
    kirk = Parent(name='Kirk', children=[milhouse])
    
    # Insert data into first database
    session_0 = Session0()
    session_0.add_all([homer, marge, flanders, kirk])
    session_0.commit()
    
    # Query the data and insert it into the second database
    all_obj = session_0.query(Parent).options(joinedload('*')).all()
    session_0.expunge_all()
    session_1 = Session1()
    for obj in all_obj:
        session_1.merge(obj)
    session_1.commit()
    
    # MAke sure that 4 instance of child are present in the second database
    print(session_1.query(Child).all())

One alternative approach I have tried (unsuccessfully) is to make the parent objects transient using the sqlalchemy.orm.make_transient() function and use session.add_all() instead of session.merge() to insert the objects into the second session. However, this does not propagate to the relationships and only Parent objects are made transient.

分享到QQ

分享到微博