Efficient incremental implementation of a poset

Posted on 2024-11-10 01:29:28

I'm using SQLAlchemy to implement a structure that has the mathematical properties of a partially ordered set (poset), and I need to be able to add and remove edges one at a time.

In my current best design I use two adjacency lists. One is the assignment list (approximately the edges of the Hasse diagram), since I need to preserve which pairs of nodes were explicitly set as ordered. The other is the transitive closure of the first, so that I can efficiently query whether one node is ordered with respect to another. Right now, I recompute the transitive closure each time an edge is added to or removed from the assignment adjacency list.

It looks something like this:

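from sqlalchemy import Column, ForeignKey, Integer, Table
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()
metadata = Base.metadata

# Explicitly assigned orderings (roughly the Hasse diagram edges).
assignment = Table('assignment', metadata,
    Column('parent', Integer, ForeignKey('node.id')),
    Column('child', Integer, ForeignKey('node.id')))

# Transitive closure of `assignment`.
closure = Table('closure', metadata,
    Column('ancestor', Integer, ForeignKey('node.id')),
    Column('descendent', Integer, ForeignKey('node.id')))

class Node(Base):
    __tablename__ = 'node'
    id = Column(Integer, primary_key=True)

    # The class is referred to by name because `Node` is not bound yet
    # inside its own body.  primaryjoin matches this node as the child,
    # so the related nodes are its parents.
    parents = relationship('Node', secondary=assignment,
        backref='children',
        primaryjoin=id == assignment.c.child,
        secondaryjoin=id == assignment.c.parent)

    # Same orientation: this node is the descendent, the targets are
    # its ancestors.
    ancestors = relationship('Node', secondary=closure,
        backref='descendents',
        primaryjoin=id == closure.c.descendent,
        secondaryjoin=id == closure.c.ancestor,
        viewonly=True)

    @classmethod
    def recompute_ancestry(cls, conn):
        # Throw the old closure away and rebuild it from the assignments.
        conn.execute(closure.delete())
        adjacent_values = conn.execute(assignment.select()).fetchall()
        rows = floyd_warshall(adjacent_values)
        if rows:  # an empty parameter list must not reach insert()
            conn.execute(closure.insert(), rows)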

where floyd_warshall() is an implementation of the algorithm by the same name.
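
The question does not show floyd_warshall() itself. A minimal sketch of what such a helper could look like, assuming it receives (parent, child) pairs and returns row dictionaries keyed by the closure table's column names (the signature and the return shape are assumptions, not the asker's code):

```python
def floyd_warshall(edges):
    """Return transitive-closure rows for an iterable of (parent, child) pairs."""
    reach = set((parent, child) for parent, child in edges)
    nodes = {node for pair in reach for node in pair}
    # Warshall's scheme: the intermediate node is the outermost loop.
    for mid in nodes:
        into = [a for a in nodes if (a, mid) in reach]   # nodes that reach mid
        out = [d for d in nodes if (mid, d) in reach]    # nodes mid reaches
        for a in into:
            for d in out:
                reach.add((a, d))
    return [{'ancestor': a, 'descendent': d} for a, d in sorted(reach)]
```

Every call walks the whole graph, which is why recomputing after each single-edge change tends to feel wasteful.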

This leads to two problems. The first is that it doesn't seem very efficient, but I'm not sure what sort of algorithm I could use instead.

The second is more about the practicality of having to explicitly call Node.recompute_ancestry() each time an assignment occurs, and only after the assignments are flushed into the session and with the proper connections. If I want to see the changes reflected in the ORM, I'd have to flush the session again. It would be much easier, I think, if I could express the recompute-ancestry operation in terms of the ORM.

_失温 2024-11-17 01:29:28

Well, I went and worked out the solution to my own problem. The rough idea is to apply the Floyd-Warshall algorithm to the intersection of the descendents of the ancestors of the parent node with the ancestors of the descendents of the child node, but to apply the output only to the union of the parent's ancestors and the child's descendents. I spent so much time on it that I ended up posting the process on my blog, but here is the code.

from sqlalchemy import Column, ForeignKey, Integer, Table
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

association_table = Table('edges', Base.metadata,
    Column('predecessor', Integer, 
           ForeignKey('nodes.id'), primary_key=True),
    Column('successor', Integer, 
           ForeignKey('nodes.id'), primary_key=True))

path_table = Table('paths', Base.metadata,
    Column('predecessor', Integer, 
           ForeignKey('nodes.id'), primary_key=True),
    Column('successor', Integer, 
           ForeignKey('nodes.id'), primary_key=True))

class Node(Base):
    __tablename__ = 'nodes'
    id = Column(Integer, primary_key=True)
    # extra columns

    def __repr__(self):
        return '<Node #%r>' % (self.id,)

    successors = relationship('Node', backref='predecessors',
        secondary=association_table,
        primaryjoin=id == association_table.c.predecessor,
        secondaryjoin=id == association_table.c.successor)

    before = relationship('Node', backref='after',
        secondary=path_table,
        primaryjoin=id == path_table.c.predecessor,
        secondaryjoin=id == path_table.c.successor)

    def __lt__(self, other):
        return other in self.before

    def add_successor(self, other):
        if other in self.successors:
            return
        self.successors.append(other)
        # Every node at or above self must now come before every node at
        # or below other, including pairs that involve neither endpoint.
        ancestors = [self] + list(self.after)
        descendents = [other] + list(other.before)
        for ancestor in ancestors:
            for descendent in descendents:
                if descendent not in ancestor.before:
                    ancestor.before.append(descendent)

    def del_successor(self, other):
        if not self < other:
            # nodes are not connected, do nothing!
            return
        if other not in self.successors:
            # nodes aren't adjacent, but this *could*
            # be a warning...
            return

        self.successors.remove(other)

        # Build up the sets of nodes that will be affected by the removal
        # we just did.
        ancestors = set(other.after)
        descendents = set(self.before)

        # we also need to build up a list of nodes that will determine
        # where the paths may be.  basically, we're looking for every 
        # node that is both before some node in the descendents and
        # ALSO after the ancestors.  Such nodes might not be comparable
        # to self or other, but may still be part of a path between
        # the nodes in ancestors and the nodes in descendents.
        ancestors_descendents = set()
        for ancestor in ancestors:
            ancestors_descendents.add(ancestor)
            for descendent in ancestor.before:
                ancestors_descendents.add(descendent)

        descendents_ancestors = set()
        for descendent in descendents:
            descendents_ancestors.add(descendent)
            for ancestor in descendent.after:
                descendents_ancestors.add(ancestor)
        search_set = ancestors_descendents & descendents_ancestors

        known_good = set() # This is the 'paths' from the 
                           # original algorithm.  

        # as before, we need to initialize it with the paths we 
        # know are good.  this is just the successor edges in
        # the search set.
        for predecessor in search_set:
            for successor in search_set:
                if successor in predecessor.successors:
                    known_good.add((predecessor, successor))

        # Run Warshall's transitive-closure step over the search set.
        # The intermediate node has to be the outermost loop; otherwise
        # paths with more than one intermediate node can be missed and
        # still-valid links would be removed below.
        for intermediate in search_set:
            for predecessor in search_set:
                if (predecessor, intermediate) not in known_good:
                    continue
                for successor in search_set:
                    if (intermediate, successor) in known_good:
                        known_good.add((predecessor, successor))


        # sift through the bad nodes and update the links
        for ancestor in ancestors:
            for descendent in descendents:
                if descendent in ancestor.before \
                        and (ancestor, descendent) not in known_good:
                    ancestor.before.remove(descendent)
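
The deletion idea can be tried without a database. The following is a rough sketch only, assuming the explicit edges and the closure are kept as plain Python sets of (predecessor, successor) pairs; `remove_edge` and its arguments are hypothetical names, not part of the code above:

```python
def remove_edge(edges, closure, parent, child):
    """Drop edge (parent, child) and repair `closure` in place."""
    edges.discard((parent, child))
    # Pairs (a, d) with a <= parent and child <= d may have lost their path.
    ancestors = {a for a, d in closure if d == parent} | {parent}
    descendents = {d for a, d in closure if a == child} | {child}
    # Nodes that can lie on a surviving path between the two groups.
    below = ancestors | {d for a, d in closure if a in ancestors}
    above = descendents | {a for a, d in closure if d in descendents}
    search = below & above
    # Transitive closure of the remaining edges, restricted to `search`.
    good = {(a, d) for a, d in edges if a in search and d in search}
    for mid in search:
        into = [a for a in search if (a, mid) in good]
        out = [d for d in search if (mid, d) in good]
        good.update((a, d) for a in into for d in out)
    # Keep only the affected pairs that are still connected.
    for a in ancestors:
        for d in descendents:
            if (a, d) not in good:
                closure.discard((a, d))
```

Only the search set is walked, so the cost depends on the part of the order around the removed edge rather than on the whole graph.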

甜心小果奶 2024-11-17 01:29:28

Update the closure as you insert, and do so in terms of the ORM:

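def add_assignment(parent, child):
    """Add a parent-child relationship between two nodes."""
    # Uses the question's spelling `descendents`; assumes the closure
    # relationships are writable (not viewonly=True).
    uppers = [parent] + list(parent.ancestors)
    lowers = [child] + list(child.descendents)
    # Every node at or above parent now precedes every node at or below child.
    for upper in uppers:
        upper.descendents.extend(
            [n for n in lowers if n not in upper.descendents])
    parent.children.append(child)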
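
The same insertion rule, sketched on plain sets so it can be checked in isolation. This assumes edges and closure are stored as sets of (ancestor, descendant) pairs; `add_edge` is a hypothetical helper, and the antisymmetry check is an addition that the snippet above does not perform:

```python
def add_edge(edges, closure, parent, child):
    """Record edge (parent, child) and extend `closure` in place."""
    # A poset is antisymmetric: refuse an edge that would close a cycle.
    if parent == child or (child, parent) in closure:
        raise ValueError("edge would break antisymmetry")
    edges.add((parent, child))
    uppers = {a for a, d in closure if d == parent} | {parent}
    lowers = {d for a, d in closure if a == child} | {child}
    # Connect everything at or above parent to everything at or below child.
    closure.update((a, d) for a in uppers for d in lowers)
```

No search is needed on insertion, because every new path has to pass through the new edge.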

If you need to delete assignments, this is faster in pure SQL:

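from sqlalchemy import and_
from sqlalchemy.orm import object_session

def del_assignment(parent, child):
    session = object_session(parent)
    parent.children.remove(child)
    head = [parent.id] + [node.id for node in parent.ancestors]
    tail = [child.id] + [node.id for node in child.descendents]
    session.flush()
    # The condition belongs in .where(), not in execute()'s parameters.
    session.execute(closure.delete().where(and_(
          closure.c.ancestor.in_(head),
          closure.c.descendent.in_(tail))))
    # Caveat: this drops every (head, tail) pair, including pairs that are
    # still connected through another path; those have to be re-derived.
    session.expire_all()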
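
Deleting every (head, tail) pair also removes pairs that remain connected through some other route, for example in a diamond-shaped order. One possible way to restore them, sketched here on plain data with a hypothetical `rederive` helper, is a depth-first search over the remaining explicit edges from each head node:

```python
def rederive(edges, head, tail):
    """Return the pairs in head x tail that are still connected via `edges`."""
    children = {}
    for a, d in edges:
        children.setdefault(a, set()).add(d)
    kept = set()
    for start in head:
        seen, stack = set(), [start]
        while stack:  # iterative depth-first search from `start`
            node = stack.pop()
            for nxt in children.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        kept.update((start, d) for d in tail if d in seen)
    return kept
```

The returned pairs would then be inserted back into the closure table after the bulk delete.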
