Completing an object with its relations and avoiding unnecessary queries in sqlalchemy

Posted on 2024-11-03 08:27:52 | 3602 characters | 3 views | 0 comments

I have a database structure; most of it is irrelevant here, so I'll describe only the relevant pieces. Let's take the Item object as an example:

items_table = Table("invtypes", gdata_meta,
                    Column("typeID", Integer, primary_key = True),
                    Column("typeName", String, index=True),
                    Column("marketGroupID", Integer, ForeignKey("invmarketgroups.marketGroupID")),
                    Column("groupID", Integer, ForeignKey("invgroups.groupID"), index=True))

mapper(Item, items_table,
       properties = {"group" : relation(Group, backref = "items"),
                     "_Item__attributes" : relation(Attribute, collection_class = attribute_mapped_collection('name')),
                     "effects" : relation(Effect, collection_class = attribute_mapped_collection('name')),
                     "metaGroup" : relation(MetaType,
                                            primaryjoin = metatypes_table.c.typeID == items_table.c.typeID,
                                            uselist = False),
                     "ID" : synonym("typeID"),
                     "name" : synonym("typeName")})

I want to achieve some performance improvements in the sqlalchemy/database layer, and have a couple of ideas:
1) Requesting the same item twice:

item = session.query(Item).get(11184)
item = None  # reference to item is lost, object is garbage collected
item = session.query(Item).get(11184)

Each request generates and issues an SQL query. To avoid that, I use 2 custom maps for item objects:

itemMapId = {}
itemMapName = {}

@cachedQuery(1, "lookfor")
def getItem(lookfor, eager=None):
    if isinstance(lookfor, (int, float)):
        id = int(lookfor)
        if eager is None and id in itemMapId:
            item = itemMapId[id]
        else:
            item = session.query(Item).options(*processEager(eager)).get(id)
            itemMapId[item.ID] = item
            itemMapName[item.name] = item
    elif isinstance(lookfor, basestring):
        if eager is None and lookfor in itemMapName:
            item = itemMapName[lookfor]
        else:
            # Items have unique names, so we can fetch just first result w/o ensuring its uniqueness
            item = session.query(Item).options(*processEager(eager)).filter(Item.name == lookfor).first()
            itemMapId[item.ID] = item
            itemMapName[item.name] = item
    return item

I believe sqlalchemy does similar object tracking, at least by primary key (item.ID). If it does, I can drop both maps (although dropping the name map will require minor modifications to the application that uses these queries), so that I use stock methods instead of duplicating functionality. The actual question is: if sqlalchemy has such functionality, how do I access it?

2) Eager loading of relationships often saves a lot of requests to the database. Say I'll definitely need the following set of properties of item=Item():

item.group (Group object, according to groupID of our item)
item.group.items (fetch all items from items list of our group)
item.group.items.metaGroup (metaGroup object/relation for every item in the list)

If I have an item ID and no item is loaded yet, I can request it from the database, eagerly loading everything I need: sqlalchemy will join the group, its items and the corresponding metaGroups within a single query. If I accessed them with default lazy loading, sqlalchemy would need to issue 1 query to grab the item + 1 to get the group + 1*#items for all items in the list + 1*#items to get the metaGroup of each item, which is wasteful.

2.1) But what if I already have an Item object fetched, and some of the properties I want to load are already loaded? As far as I understand, when I re-fetch an object from the database, its already loaded relations do not become unloaded. Am I correct?

2.2) If I have an Item object fetched and want to access its group, I can just call getGroup with item.groupID, applying any eager options I need ("items" and "items.metaGroup"). It should properly load the group and its requested relations without touching the item itself. Will sqlalchemy properly map this fetched group to item.group, so that accessing item.group won't fetch anything from the underlying database?

2.3) Suppose I have the following things fetched from the database: the original item, item.group, and some portion of the items from the item.group.items list, some of which may have metaGroup loaded. What would be the best strategy for completing the data structure so that it matches the eager list above: re-fetch the group with an ("items", "items.metaGroup") eager load, or check each item from the items list individually and load the item or its metaGroup if it is not loaded? It seems to depend on the situation, because if everything was already loaded some time ago, issuing such a heavy query is pointless. Does sqlalchemy provide a way to track whether some object relation is loaded, with the ability to look deeper than just one level?

As an illustration of 2.3: I can fetch the group with ID 83, eagerly fetching "items" and "items.metaGroup". Is there a way to determine from an item (which has a groupID of 83) whether it has "group", "group.items" and "group.items.metaGroup" loaded or not, using sqlalchemy tools (in this case all of them should be loaded)?

烟凡古楼 2024-11-10 08:27:52

To force loading of lazy attributes, just access them. This is the simplest way and it works fine for relations, but it is not as efficient for Columns (you will get a separate SQL query for each column in the same table). You can get a list of all unloaded properties (both relations and columns) from sqlalchemy.orm.attributes.instance_state(obj).unloaded.

You don't use deferred columns in your example, but I'll describe them here for completeness. The typical scenario for handling deferred columns is the following:

Decorate selected columns with deferred(). Combine them into one or several groups by using the group parameter of deferred().
Use the undefer() and undefer_group() options in a query when desired.
Accessing a deferred column that belongs to a group will load all columns in that group.

Unfortunately this doesn't work in reverse: you can combine columns into groups without deferring their loading by default, using column_property(Column(…), group=…), but the defer() option won't affect them (it works for Columns only, not column properties, at least in 0.6.7).

To force loading of deferred column properties, session.refresh(obj, attribute_names=…), suggested by Nathan Villaescusa, is probably the best solution. The only disadvantage I see is that it expires the attributes first, so you have to ensure that no already loaded attributes are among those passed as the attribute_names argument (e.g. by intersecting with state.unloaded).
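The intersection idea can be sketched roughly as follows. This is only an illustration under stated assumptions: RecordingSession is a hypothetical stand-in that records calls instead of talking to a database, refresh_missing is a made-up helper name, and the attribute names are invented. With a real session, the unloaded argument would come from instance_state(obj).unloaded.

```python
class RecordingSession:
    """Hypothetical stand-in for a Session; it only records refresh() calls."""

    def __init__(self):
        self.calls = []

    def refresh(self, obj, attribute_names=None):
        self.calls.append((obj, attribute_names))


def refresh_missing(session, obj, wanted, unloaded):
    """Refresh only those wanted attributes that are still unloaded."""
    names = sorted(set(wanted) & set(unloaded))
    if names:  # nothing missing: skip the database round trip
        session.refresh(obj, attribute_names=names)
    return names


session = RecordingSession()
item = object()

# "description" is already loaded, so only the two unloaded names are refreshed.
refreshed = refresh_missing(
    session, item,
    wanted=["description", "icon", "notes"],
    unloaded={"icon", "notes", "group"},
)

# Everything wanted is already loaded: refresh() is not called at all.
skipped = refresh_missing(session, item, wanted=["description"], unloaded={"group"})
```

The guard on an empty name list is a design choice of this sketch: it avoids passing an empty attribute_names to refresh(), so a call that has nothing to load never reaches the session.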

Update

1) SQLAlchemy does track loaded objects. That's how the ORM works: there must be only one object in the session for each identity. Its internal cache is weak by default (use weak_identity_map=False to change this), so the object is expunged from the cache as soon as there is no reference to it in your code. SQLAlchemy won't issue an SQL request for query.get(pk) when the object is already in the session. But this works for the get() method only, so query.filter_by(id=pk).first() will issue an SQL request and refresh the object in the session with the loaded data.
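To make the "weak cache" behaviour concrete, here is a toy identity map built on the standard library's weakref module. It is not SQLAlchemy's implementation; WeakIdentityMap, its loader argument and the loads counter are assumptions made purely for illustration. It mirrors the get / drop reference / get sequence from the question.

```python
import gc
import weakref


class Item:
    """Stand-in for a mapped class; only the primary key matters here."""

    def __init__(self, item_id):
        self.id = item_id


class WeakIdentityMap:
    """Toy identity map: at most one live object per primary key."""

    def __init__(self, loader):
        self._objects = weakref.WeakValueDictionary()
        self._loader = loader  # called only on a cache miss
        self.loads = 0         # counts simulated database round trips

    def get(self, pk):
        obj = self._objects.get(pk)
        if obj is None:
            self.loads += 1
            obj = self._loader(pk)
            self._objects[pk] = obj
        return obj


identity_map = WeakIdentityMap(Item)

first = identity_map.get(11184)
second = identity_map.get(11184)  # served from the map, no second load
same_object = first is second
loads_while_referenced = identity_map.loads

del first, second                 # drop every strong reference
gc.collect()                      # make collection explicit on any interpreter
third = identity_map.get(11184)   # the weak entry is gone, so this loads again
loads_after_release = identity_map.loads
```

As long as the application holds a reference, repeated lookups by key return the same object without a new load; once every reference is dropped, the entry disappears and the next lookup has to load again, which is the behaviour the question observed.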

2) Eager loading of relations will lead to fewer requests, but it's not always faster. You have to check this for your database and data.

2.1) Refetching data from the database won't unload objects bound via relations.

2.2) item.group is loaded using the query.get() method, so it won't lead to an SQL request if the object is already in the session.

2.3) Yes, it depends on the situation. In most cases the best option is to hope SQLAlchemy will use the right strategy :). For an already loaded relation you can check whether the related objects' relations are loaded via state.unloaded, and so on recursively to any depth. But when a relation is not loaded yet, you can't know whether the related objects and their relations are already loaded: even when the relation is not yet loaded, the related object[s] might already be in the session (just imagine you request the first item, load its group and then request another item that has the same group). For your particular example I see no problem in just checking state.unloaded recursively.
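A recursive check along a dotted path such as "group.items.metaGroup" might look like the sketch below. The traversal is kept independent of the ORM: is_path_loaded and the unloaded_of callback are hypothetical names, and Stub is a plain object used only to exercise the logic. With mapped objects the callback could be lambda o: instance_state(o).unloaded.

```python
def is_path_loaded(obj, path, unloaded_of):
    """Return True if every attribute along a dotted path is already loaded.

    unloaded_of(obj) must return the set of attribute names not yet loaded.
    """
    head, _, rest = path.partition(".")
    if head in unloaded_of(obj):
        return False
    if not rest:
        return True
    value = getattr(obj, head)  # head is loaded, so this access is cheap
    if value is None:
        return True             # nothing further to inspect
    if isinstance(value, dict):
        related = list(value.values())
    elif isinstance(value, (list, tuple, set)):
        related = list(value)
    else:
        related = [value]
    return all(is_path_loaded(child, rest, unloaded_of) for child in related)


class Stub:
    """Plain object with an explicit set of 'unloaded' attribute names."""

    def __init__(self, unloaded=(), **attrs):
        self.unloaded = set(unloaded)
        self.__dict__.update(attrs)


meta = Stub()
loaded_item = Stub(metaGroup=meta)
partial_item = Stub(unloaded={"metaGroup"})
group = Stub(items=[loaded_item, partial_item])
item = Stub(group=group)


def get_unloaded(o):
    return o.unloaded


group_loaded = is_path_loaded(item, "group", get_unloaded)
items_loaded = is_path_loaded(item, "group.items", get_unloaded)
deep_loaded = is_path_loaded(item, "group.items.metaGroup", get_unloaded)
```

In this example the first two checks succeed, while the deepest one fails because one item in the list still has metaGroup unloaded. The function stops at the first unloaded step, so it never touches an attribute that would trigger a lazy load.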

趁微风不噪 2024-11-10 08:27:52

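1)
From the Session documentation:

"[The Session] is somewhat used as a cache, in that it implements the identity map pattern, and stores objects keyed to their primary key. However, it doesn't do any kind of query caching. ... It's only when you say query.get({some primary key}) that the Session doesn't have to issue a query."

2.1) You are correct, relationships are not modified when you refresh an object.

2.2) Yes, the group will be in the identity map.

2.3) I believe your best bet is to try to reload the entire group.items in a single query. In my experience it is usually much quicker to issue one large request than several smaller ones. The only time it would make sense to reload just a specific group.item is when exactly one of them needs to be loaded. Even then you are choosing between one large query and one small one, so you don't actually reduce the number of queries.

I have not tried it, but I believe you should be able to use the sqlalchemy.orm.util.identity_key method to determine whether an object is in sqlalchemy's identity map. I would be interested to find out what calling identity_key(Group, 83) returns.

Initial question)
If I understand correctly, you have an object fetched from the database with some of its relationships eagerly loaded, and you would like to fetch the rest of the relationships with a single query? I believe you may be able to use the Session.refresh() method, passing in the names of the relationships that you want to load.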