用循环腌制图表

发布于 2025-01-08 07:42:25 字数 1788 浏览 0 评论 0原文

我在 python 中有一个自定义节点类,它内置于一个图形(它是一个字典)中。由于创建它们需要一段时间,因此我想对它们进行腌制,这样我就不必每次运行代码时都重新构建它们。

不幸的是,因为该图有循环,所以 cPickle 达到了最大递归深度:

运行时错误:酸洗对象时超出最大递归深度

这是我的节点对象:

class Node:
    def __init__(self, name):
        self.name = name
        self.uid = 0
        self.parents = set()
        self.children = set()

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, that):
        return self.name == that.name

    def __str__(self):
        return "\n".join(["Name: " + self.name,
                          "\tChildren:" + ", ".join([c.name for c in self.children]),
                          "\tParents:" + ", ".join([p.name for p in self.parents])
                          ]
                         )

这就是我构建图形的方式:

def buildGraph(input):
    graph = {}
    idToNode = {}

    for line in input:
        ## Input from text line by line looks like
        ## source.node -> target.node
        source, arr, target = line.split()
        if source in graph:
            nsource = graph[source]
        else:
            nsource = Node(source)
            nsource.uid = len(graph)
            graph[source] = nsource
            idToNode[nsource.uid] = nsource

        if target in graph:
            ntarget = graph[target]
        else:
            ntarget = Node(target)
            ntarget.uid = len(graph)
            graph[target] = ntarget
            idToNode[ntarget.uid] = ntarget

        nsource.children.add(ntarget)
        ntarget.parents.add(nsource)
    return graph

然后在我的主目录中,我有

    graph = buildGraph(input_file)
    bo = cPickle.dumps(graph)

第二行是我得到递归深度错误的地方。

除了改变Node的结构之外还有什么解决办法吗?

I have a custom node class in python that is built into a graph (which is a dictionary). Since these take a while to create, I'd like to pickle them so that I don't have to reconstruct them everytime I run my code.

Unfortunately, because this graph has cycles, cPickle hits the maximum recursion depth:

RuntimeError: maximum recursion depth exceeded while pickling an object

This is my node object:

class Node:
    def __init__(self, name):
        self.name = name
        self.uid = 0
        self.parents = set()
        self.children = set()

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, that):
        return self.name == that.name

    def __str__(self):
        return "\n".join(["Name: " + self.name,
                          "\tChildren:" + ", ".join([c.name for c in self.children]),
                          "\tParents:" + ", ".join([p.name for p in self.parents])
                          ]
                         )

This is how I build my graph:

def buildGraph(input):
    graph = {}
    idToNode = {}

    for line in input:
        ## Input from text line by line looks like
        ## source.node -> target.node
        source, arr, target = line.split()
        if source in graph:
            nsource = graph[source]
        else:
            nsource = Node(source)
            nsource.uid = len(graph)
            graph[source] = nsource
            idToNode[nsource.uid] = nsource

        if target in graph:
            ntarget = graph[target]
        else:
            ntarget = Node(target)
            ntarget.uid = len(graph)
            graph[target] = ntarget
            idToNode[ntarget.uid] = ntarget

        nsource.children.add(ntarget)
        ntarget.parents.add(nsource)
    return graph

Then in my main, I have

    graph = buildGraph(input_file)
    bo = cPickle.dumps(graph)

and the second line is where I get my recursion depth error.

Are there any solutions outside of changing the structure of Node?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

我三岁 2025-01-15 07:42:25

您需要为 pickle 准备对象:如果您有一个循环,则需要打破循环并以其他形式存储此信息。

Pickle 使用方法 __getstate__ 来准备要 pickle 的对象(之前调用),并使用 __setstate__ 来初始化对象。

class SomethingPickled(object):
    ## Compress and uncycle data before pickle.
    def __getstate__(self):
        # deep copy object
        state = self.__dict__.copy()
        # break cycles
        state['uncycled'] = self.yourUncycleMethod(state['cycled'])
        del state['cycle']
        # send to pickle
        return state

    ## Expand data before unpickle.
    def __setstate__(self, state):
        # restore cycles
        state['cycle'] = self.yourCycleMethod(state['uncycled'])
        del state['uncycle']
        self.__dict__.update(state)

我相信您会找到如何打破和加入循环的想法:)

You need to prepare the object for pickle: if you have a cycle you need to break cycles and store this information in some other form.

Pickle use methods __getstate__ to prepare object to pickle (it call before) and __setstate__ to initialize object.

class SomethingPickled(object):
    ## Compress and uncycle data before pickle.
    def __getstate__(self):
        # deep copy object
        state = self.__dict__.copy()
        # break cycles
        state['uncycled'] = self.yourUncycleMethod(state['cycled'])
        del state['cycle']
        # send to pickle
        return state

    ## Expand data before unpickle.
    def __setstate__(self, state):
        # restore cycles
        state['cycle'] = self.yourCycleMethod(state['uncycled'])
        del state['uncycle']
        self.__dict__.update(state)

I am sure than you will find idea how to break and join cycles :)

街角卖回忆 2025-01-15 07:42:25

我不认为你的图是循环的这一事实是问题——pickle(和cPickle)应该很好地处理循环数据结构。我尝试了以下操作(使用您对 Node 的定义),它工作得很好:

>>> n1 = Node('a')
>>> n2 = Node('b')
>>> n1.parents.add(n2)
>>> n2.parents.add(n1)
>>> n2.children.add(n1)
>>> n1.children.add(n1)

>>> import cPickle as pickle
>>> pickle.dumps(n1)

事实上,即使有大周期,我也没有遇到问题。例如,这对我来说效果很好:(

>>> def node_cycle(n):
...     start_node = prev_node = Node('node0')
...     for i in range(n):
...         node = Node('node%d' % (i+1))
...         node.parents.add(prev_node)
...         prev_node.children.add(node)
...         prev_node = node
...     start_node.parents.add(node)
...     node.children.add(start_node)

>>> cycle = node_cycle(100000) # cycle of 100k nodes
>>> pickle.dumps(cycle)

这一切都在 Python 2.7.1 上进行了测试)

尽管如此,pickle 可能最终会出现非常深的递归还有其他原因,具体取决于数据结构的形状。如果这是真正的问题,那么您也许可以通过以下方式解决它:

>>> import sys
>>> sys.setrecursionlimit(10000)

I don't think that the fact that your graph is cyclic is the problem -- pickle (and cPickle) should handle cyclic data structures just fine. I tried the following (with your definition of Node) and it worked just fine:

>>> n1 = Node('a')
>>> n2 = Node('b')
>>> n1.parents.add(n2)
>>> n2.parents.add(n1)
>>> n2.children.add(n1)
>>> n1.children.add(n1)

>>> import cPickle as pickle
>>> pickle.dumps(n1)

Indeed, even with large cycles I didn't run into a problem. E.g., this works fine for me:

>>> def node_cycle(n):
...     start_node = prev_node = Node('node0')
...     for i in range(n):
...         node = Node('node%d' % (i+1))
...         node.parents.add(prev_node)
...         prev_node.children.add(node)
...         prev_node = node
...     start_node.parents.add(node)
...     node.children.add(start_node)

>>> cycle = node_cycle(100000) # cycle of 100k nodes
>>> pickle.dumps(cycle)

(This was all tested on Python 2.7.1)

There are other reasons why pickle might end up with very deep recursion though, depending on the shape of your data structure. If this is the real problem, then you might be able to fix it with something like this:

>>> import sys
>>> sys.setrecursionlimit(10000)
执笔绘流年 2025-01-15 07:42:25

在这里,这个修改后的节点类仅将对象名称作为节点中的字符串保存,并在您检索节点的“子”或“父”属性时为您提供完整的“节点”对象集。

内部没有循环 - 因此它应该避免无限循环陷阱。您可以实现其他辅助方法来根据需要简化导航。

class Node(object):
    all_nodes = {}
    def __new__(cls, name):
        self = object.__new__(cls)
        cls.all_nodes[name] = self
        return self

    def __getstate__(self):
        self.all_nodes = self.__class__.all_nodes
        return self.__dict__

    def __setstate__(self, dct):
        self.__class__.all_nodes = dct["all_nodes"]
        del dct["all_nodes"]
        self.__dict__ = dct

    def __init__(self, name):
        #self.all_nodes = self.__class__.all_nodes
        self.name = name
        self.uid = 0
        self._parents = set()
        self._children = set()

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, that):
        return self.name == that.name

    def __repr__(self):
        return "\n" + "\n".join(["Name: " + self.name,
                          "\tChildren:" + ", ".join([c.name for c in self.children]),
                          "\tParents:" + ", ".join([p.name for p in self.parents])
                          ]
                         )
    def get_relations(self, which):
        names = getattr(self, which)
        return set(self.__class__.all_nodes[name] for name in names)
    @property
    def children(self):
        return self.get_relations("_children")
    @property
    def parents(self):
        return self.get_relations("_parents")

    def __contains__(self, item):
        return item.name in self._children

    def add(self, child):
        self._children.add(child.name)
        child._parents.add(self.name)
    connect_child = add



#example and testing:

from cPickle import loads, dumps

n1 = Node("n1")
n2 = Node("n2")
n3 = Node("n3")

n1.add(n2)
n2.add(n3)
n3.add(n1)

print n1, n2, n3


p1 = dumps(n1)
Node.all_nodes.clear()
p2 = loads(p1)

print p2
print p2.children
print p2.children.pop().children

print Node.all_nodes

缺点是它维护一个名为“all_nodes”的类字典,其中引用了所有实际创建的节点。 (Pickle 足够聪明,对于给定的图只腌制一次该字典,因为它被所有 Node 对象引用)。
类范围的“all_nodes”引用的问题是,如果您需要pickle和unpickle不同的图集9假设您确实使用一组节点创建图g1,在另一次运行中,使用另一组节点创建图g2,并且然后,如果您取消选取 g1,然后取消选取 g2,则取消选取 g2 将覆盖 g1 的节点引用)。如果你需要这个工作,请在评论中询问,我可以想出一些东西 - 我能想到的更简单的事情是有一个“图形”类,它将保存所有节点的字典(而不是将其放在 Node 类中) )

Here, this modified node class holds only the names of the objects as strings in a node, and gives you a set with full "Node" objects when you retrieve either the "children" or the "parents" attribute of a node.

Internally there are no cycles - so it should avoid the infinity loop trap.You can implement additional auxiliar methods to ease navigation as you want.

class Node(object):
    all_nodes = {}
    def __new__(cls, name):
        self = object.__new__(cls)
        cls.all_nodes[name] = self
        return self

    def __getstate__(self):
        self.all_nodes = self.__class__.all_nodes
        return self.__dict__

    def __setstate__(self, dct):
        self.__class__.all_nodes = dct["all_nodes"]
        del dct["all_nodes"]
        self.__dict__ = dct

    def __init__(self, name):
        #self.all_nodes = self.__class__.all_nodes
        self.name = name
        self.uid = 0
        self._parents = set()
        self._children = set()

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, that):
        return self.name == that.name

    def __repr__(self):
        return "\n" + "\n".join(["Name: " + self.name,
                          "\tChildren:" + ", ".join([c.name for c in self.children]),
                          "\tParents:" + ", ".join([p.name for p in self.parents])
                          ]
                         )
    def get_relations(self, which):
        names = getattr(self, which)
        return set(self.__class__.all_nodes[name] for name in names)
    @property
    def children(self):
        return self.get_relations("_children")
    @property
    def parents(self):
        return self.get_relations("_parents")

    def __contains__(self, item):
        return item.name in self._children

    def add(self, child):
        self._children.add(child.name)
        child._parents.add(self.name)
    connect_child = add



#example and testing:

from cPickle import loads, dumps

n1 = Node("n1")
n2 = Node("n2")
n3 = Node("n3")

n1.add(n2)
n2.add(n3)
n3.add(n1)

print n1, n2, n3


p1 = dumps(n1)
Node.all_nodes.clear()
p2 = loads(p1)

print p2
print p2.children
print p2.children.pop().children

print Node.all_nodes

The drawback is that it maintains a class dictionary named "all_nodes" where there are references to all actually created nodes. (Pickle is smart enough to only pickle this dictionary once for a given graph, since it is referenced by all Node objects) .
The problem with the class wide "all_nodes" reference is if you need to pickle and unpickle different sets of graphs 9let say you do create graphs g1 with a set of nodes, in another run, create a graph g2 with another set of nodes, and then if you unpickle g1, and later g2, the unpickling of g2 will override the node references for g1). If you need this to work, ask in a comment and I could come up with something - the easiser thing I can think off is having a "graph" class that will hold a dictionary to all the nodes (insteadof having it in the Node class)

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文