SQLAlchemy 中的星型模式

发布于 2024-08-05 12:44:01 字数 137 浏览 6 评论 0原文

我有一个星型架构数据库,我想在 SQLAlchemy 中表示它。现在我面临的问题是如何以最好的方式做到这一点。现在我有很多具有自定义连接条件的属性,因为数据存储在不同的表中。 如果可以重复使用不同事实表的维度,那就太好了,但我还没有弄清楚如何很好地做到这一点。

I have a star-schema architectured database that I want to represent in SQLAlchemy. Now I have the problem on how this can be done in the best possible way. Right now I have a lot of properties with custom join conditions, because the data is stored in different tables.
It would be nice if it would be possible to re-use the dimensions for different fact tablesw but I haven't figured out how that can be done nicely.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

没有伤那来痛 2024-08-12 12:44:01

星型模式中的典型事实表包含对所有维度表的外键引用,因此通常不需要自定义连接条件 - 它们是根据外键引用自动确定的。

例如,具有两个事实表的星型模式如下所示:

Base = declarative_meta()

class Store(Base):
    __tablename__ = 'store'

    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50), nullable=False)

class Product(Base):
    __tablename__ = 'product'

    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50), nullable=False)

class FactOne(Base):
    __tablename__ = 'sales_fact_one'

    store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
    product_id = Column('product_id', Integer, ForeignKey('product.id'), primary_key=True)
    units_sold = Column('units_sold', Integer, nullable=False)

    store = relation(Store)
    product = relation(Product)

class FactTwo(Base):
    __tablename__ = 'sales_fact_two'

    store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
    product_id = Column('product_id', Integer, ForeignKey('product.id'), primary_key=True)
    units_sold = Column('units_sold', Integer, nullable=False)

    store = relation(Store)
    product = relation(Product)

但假设您在任何情况下都希望减少样板文件。我会创建维度类本地的生成器,这些生成器在事实表上配置自己:

class Store(Base):
    __tablename__ = 'store'

    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50), nullable=False)

    @classmethod
    def add_dimension(cls, target):
        target.store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
        target.store = relation(cls)

在这种情况下,用法如下:

class FactOne(Base):
    ...

Store.add_dimension(FactOne)

但是,这有一个问题。假设您要添加的维度列是主键列,则映射器配置将会失败,因为类需要在设置映射之前设置其主键。 ,假设我们使用声明式(您将在下面看到它有很好的效果),为了使这种方法起作用,我们必须使用 instrument_declarative() 函数而不是标准元类:

meta = MetaData()
registry = {}
def register_cls(*cls):
    for c in cls:
        instrument_declarative(c, registry, meta)

因此 那么我们会做一些类似的事情:

class Store(object):
    # ...

class FactOne(object):
    __tablename__ = 'sales_fact_one'

Store.add_dimension(FactOne)

register_cls(Store, FactOne)

如果您确实有充分的理由自定义连接条件,只要有一些创建这些条件的模式,您就可以使用 add_dimension():

class Store(object):
    ...

    @classmethod
    def add_dimension(cls, target):
        target.store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
        target.store = relation(cls, primaryjoin=target.store_id==cls.id)

但是如果您使用的是 2.6,最后一件很酷的事情是将 add_dimension 变成一个类装饰器。这是一个所有内容都已清理干净的示例:

from sqlalchemy import *
from sqlalchemy.ext.declarative import instrument_declarative
from sqlalchemy.orm import *

class BaseMeta(type):
    classes = set()
    def __init__(cls, classname, bases, dict_):
        klass = type.__init__(cls, classname, bases, dict_)
        if 'metadata' not in dict_:
            BaseMeta.classes.add(cls)
        return klass

class Base(object):
    __metaclass__ = BaseMeta
    metadata = MetaData()
    def __init__(self, **kw):
        for k in kw:
            setattr(self, k, kw[k])

    @classmethod
    def configure(cls, *klasses):
        registry = {}
        for c in BaseMeta.classes:
            instrument_declarative(c, registry, cls.metadata)

class Store(Base):
    __tablename__ = 'store'

    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50), nullable=False)

    @classmethod
    def dimension(cls, target):
        target.store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
        target.store = relation(cls)
        return target

class Product(Base):
    __tablename__ = 'product'

    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50), nullable=False)

    @classmethod
    def dimension(cls, target):
        target.product_id = Column('product_id', Integer, ForeignKey('product.id'), primary_key=True)
        target.product = relation(cls)
        return target

@Store.dimension
@Product.dimension
class FactOne(Base):
    __tablename__ = 'sales_fact_one'

    units_sold = Column('units_sold', Integer, nullable=False)

@Store.dimension
@Product.dimension
class FactTwo(Base):
    __tablename__ = 'sales_fact_two'

    units_sold = Column('units_sold', Integer, nullable=False)

Base.configure()

if __name__ == '__main__':
    engine = create_engine('sqlite://', echo=True)
    Base.metadata.create_all(engine)

    sess = sessionmaker(engine)()

    sess.add(FactOne(store=Store(name='s1'), product=Product(name='p1'), units_sold=27))
    sess.commit()

A typical fact table in a star schema contains foreign key references to all dimension tables, so usually there wouldn't be any need for custom join conditions - they are determined automatically from foreign key references.

For example a star schema with two fact tables would look like:

Base = declarative_meta()

class Store(Base):
    __tablename__ = 'store'

    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50), nullable=False)

class Product(Base):
    __tablename__ = 'product'

    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50), nullable=False)

class FactOne(Base):
    __tablename__ = 'sales_fact_one'

    store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
    product_id = Column('product_id', Integer, ForeignKey('product.id'), primary_key=True)
    units_sold = Column('units_sold', Integer, nullable=False)

    store = relation(Store)
    product = relation(Product)

class FactTwo(Base):
    __tablename__ = 'sales_fact_two'

    store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
    product_id = Column('product_id', Integer, ForeignKey('product.id'), primary_key=True)
    units_sold = Column('units_sold', Integer, nullable=False)

    store = relation(Store)
    product = relation(Product)

But suppose you want to reduce the boilerplate in any case. I'd create generators local to the dimension classes which configure themselves on a fact table:

class Store(Base):
    __tablename__ = 'store'

    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50), nullable=False)

    @classmethod
    def add_dimension(cls, target):
        target.store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
        target.store = relation(cls)

in which case usage would be like:

class FactOne(Base):
    ...

Store.add_dimension(FactOne)

But, there's a problem with that. Assuming the dimension columns you're adding are primary key columns, the mapper configuration is going to fail since a class needs to have its primary keys set up before the mapping is set up. So assuming we're using declarative (which you'll see below has a nice effect), to make this approach work we'd have to use the instrument_declarative() function instead of the standard metaclass:

meta = MetaData()
registry = {}
def register_cls(*cls):
    for c in cls:
        instrument_declarative(c, registry, meta)

So then we'd do something along the lines of:

class Store(object):
    # ...

class FactOne(object):
    __tablename__ = 'sales_fact_one'

Store.add_dimension(FactOne)

register_cls(Store, FactOne)

If you actually have a good reason for custom join conditions, as long as there's some pattern to how those conditions are created, you can generate that with your add_dimension():

class Store(object):
    ...

    @classmethod
    def add_dimension(cls, target):
        target.store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
        target.store = relation(cls, primaryjoin=target.store_id==cls.id)

But the final cool thing if you're on 2.6, is to turn add_dimension into a class decorator. Here's an example with everything cleaned up:

from sqlalchemy import *
from sqlalchemy.ext.declarative import instrument_declarative
from sqlalchemy.orm import *

class BaseMeta(type):
    classes = set()
    def __init__(cls, classname, bases, dict_):
        klass = type.__init__(cls, classname, bases, dict_)
        if 'metadata' not in dict_:
            BaseMeta.classes.add(cls)
        return klass

class Base(object):
    __metaclass__ = BaseMeta
    metadata = MetaData()
    def __init__(self, **kw):
        for k in kw:
            setattr(self, k, kw[k])

    @classmethod
    def configure(cls, *klasses):
        registry = {}
        for c in BaseMeta.classes:
            instrument_declarative(c, registry, cls.metadata)

class Store(Base):
    __tablename__ = 'store'

    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50), nullable=False)

    @classmethod
    def dimension(cls, target):
        target.store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
        target.store = relation(cls)
        return target

class Product(Base):
    __tablename__ = 'product'

    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50), nullable=False)

    @classmethod
    def dimension(cls, target):
        target.product_id = Column('product_id', Integer, ForeignKey('product.id'), primary_key=True)
        target.product = relation(cls)
        return target

@Store.dimension
@Product.dimension
class FactOne(Base):
    __tablename__ = 'sales_fact_one'

    units_sold = Column('units_sold', Integer, nullable=False)

@Store.dimension
@Product.dimension
class FactTwo(Base):
    __tablename__ = 'sales_fact_two'

    units_sold = Column('units_sold', Integer, nullable=False)

Base.configure()

if __name__ == '__main__':
    engine = create_engine('sqlite://', echo=True)
    Base.metadata.create_all(engine)

    sess = sessionmaker(engine)()

    sess.add(FactOne(store=Store(name='s1'), product=Product(name='p1'), units_sold=27))
    sess.commit()
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文