使用 sqlalchemy 在数据库之间高效复制数据
我试图将我用 sqlalchemy 定义的 postgresql + PostGIS 数据库镜像到 sqlite (spatialite)文件数据库。 session.merge()
方法似乎适用于将从第一个会话查询的实例添加到另一个会话,但它无法扩展到近一百万行。请参阅下面的示例,该示例将数据从内存中的 SQLite 数据库复制到另一个内存数据库,以便于重现。我正在寻找一种方法(可能与我现在正在做的完全不同)来有效地将所有数据从一个数据库移动到另一个数据库。
from sqlalchemy import create_engine
from sqlalchemy import Table, Column, Integer, ForeignKey, String
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.orm import relationship, joinedload
engine_0 = create_engine('sqlite:///:memory:', echo=True)
engine_1 = create_engine('sqlite:///:memory:', echo=True)
Base = declarative_base()
Session0 = sessionmaker(bind=engine_0)
Session1 = sessionmaker(bind=engine_1)
# Define ORM models
association_table = Table('association', Base.metadata,
Column('parent_id', ForeignKey('parent.id'), primary_key=True),
Column('child_id', ForeignKey('child.id'), primary_key=True)
)
class Parent(Base):
__tablename__ = 'parent'
id = Column(Integer, primary_key=True)
name = Column(String)
children = relationship(
"Child",
secondary=association_table,
back_populates="parents")
class Child(Base):
__tablename__ = 'child'
id = Column(Integer, primary_key=True)
name = Column(String)
parents = relationship(
"Parent",
secondary=association_table,
back_populates="children")
# Create schema
Base.metadata.create_all(engine_0)
Base.metadata.create_all(engine_1)
# Create some example instances
# Children
bart = Child(name='Bart')
lisa = Child(name='Lisa')
maggie = Child(name='Maggie')
milhouse = Child(name='Milhouse')
# Parents
homer = Parent(name='Homer',
children=[bart, lisa, maggie])
marge = Parent(name='Marge',
children=[bart, lisa, maggie])
flanders = Parent(name='Ned')
kirk = Parent(name='Kirk', children=[milhouse])
# Insert data into first database
session_0 = Session0()
session_0.add_all([homer, marge, flanders, kirk])
session_0.commit()
# Query the data and insert it into the second database
all_obj = session_0.query(Parent).options(joinedload('*')).all()
session_0.expunge_all()
session_1 = Session1()
for obj in all_obj:
session_1.merge(obj)
session_1.commit()
# MAke sure that 4 instance of child are present in the second database
print(session_1.query(Child).all())
我尝试过(不成功)的一种替代方法是使用 sqlalchemy.orm.make_transient() 函数使父对象瞬态,并使用 session.add_all() 而不是 < code>session.merge() 将对象插入到第二个会话中。但是,这不会传播到关系,并且只有父对象变为瞬态。
I'm trying to mirror a postgresql + PostGIS database that I defined with sqlalchemy to a sqlite (spatialite) file database. The session.merge()
method appears to work for adding the instances queried from the first session to the other session, but it does not scale for nearly a million rows. See the example below that copies data from an in-memory sqlite database to another memory database for the sake of easy reproducibility. I'm looking for an approach (potentially completely different from what I'm doing now) to efficiently move all the data from one database to another.
from sqlalchemy import create_engine
from sqlalchemy import Table, Column, Integer, ForeignKey, String
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.orm import relationship, joinedload
engine_0 = create_engine('sqlite:///:memory:', echo=True)
engine_1 = create_engine('sqlite:///:memory:', echo=True)
Base = declarative_base()
Session0 = sessionmaker(bind=engine_0)
Session1 = sessionmaker(bind=engine_1)
# Define ORM models
association_table = Table('association', Base.metadata,
Column('parent_id', ForeignKey('parent.id'), primary_key=True),
Column('child_id', ForeignKey('child.id'), primary_key=True)
)
class Parent(Base):
__tablename__ = 'parent'
id = Column(Integer, primary_key=True)
name = Column(String)
children = relationship(
"Child",
secondary=association_table,
back_populates="parents")
class Child(Base):
__tablename__ = 'child'
id = Column(Integer, primary_key=True)
name = Column(String)
parents = relationship(
"Parent",
secondary=association_table,
back_populates="children")
# Create schema
Base.metadata.create_all(engine_0)
Base.metadata.create_all(engine_1)
# Create some example instances
# Children
bart = Child(name='Bart')
lisa = Child(name='Lisa')
maggie = Child(name='Maggie')
milhouse = Child(name='Milhouse')
# Parents
homer = Parent(name='Homer',
children=[bart, lisa, maggie])
marge = Parent(name='Marge',
children=[bart, lisa, maggie])
flanders = Parent(name='Ned')
kirk = Parent(name='Kirk', children=[milhouse])
# Insert data into first database
session_0 = Session0()
session_0.add_all([homer, marge, flanders, kirk])
session_0.commit()
# Query the data and insert it into the second database
all_obj = session_0.query(Parent).options(joinedload('*')).all()
session_0.expunge_all()
session_1 = Session1()
for obj in all_obj:
session_1.merge(obj)
session_1.commit()
# MAke sure that 4 instance of child are present in the second database
print(session_1.query(Child).all())
One alternative approach I have tried (unsuccessfully) is to make the parent objects transient using the sqlalchemy.orm.make_transient()
function and use session.add_all()
instead of session.merge()
to insert the objects into the second session. However, this does not propagate to the relationships and only Parent objects are made transient.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我认为这不适用于不同的数据库,因为 SQLAlchemy 为每个数据库存储元数据。您的玩具示例之所以有效,是因为它们都是相同的。
I dont think this would work with different databases as SQLAlchemy stores meta data for each database. Your toy example works because they are both the same.